Two useful links:
http://blog.csdn.net/azhao_dn/article/details/7070327
http://blog.csdn.net/xhh198781/article/details/7573842
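In this project I set up four queues to schedule requests coming from Oozie statistics jobs, physical model building, and the ETL service; each queue is given a capacity of 25 (percent).
1. Modify mapred-site.xml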
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,general,etl,day</value>
</property>
2. Create capacity-scheduler.xml
<?xml version="1.0"?> <!-- This is the configuration file for the resource manager in Hadoop. --> <!-- You can configure various scheduling parameters related to queues. --> <!-- The properties for a queue follow a naming convention,such as, --> <!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. --> <configuration> <property> <name>mapred.capacity-scheduler.maximum-system-jobs</name> <value>3000</value> <description>Maximum number of jobs in the system which can be initialized, concurrently, by the CapacityScheduler. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.capacity</name> <value>20</value> <description>Percentage of the number of slots in the cluster that are to be available for jobs in this queue. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name> <value>-1</value> <description> maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster. This provides a means to limit how much excess capacity a queue can use. By default, there is no limit. The maximum-capacity of a queue can only be greater than or equal to its minimum capacity. Default value of -1 implies a queue can use complete capacity of the cluster. This property could be to curtail certain jobs which are long running in nature from occupying more than a certain percentage of the cluster, which in the absence of pre-emption, could lead to capacity guarantees of other queues being affected. One important thing to note is that maximum-capacity is a percentage , so based on the cluster's capacity the max capacity would change. So if large no of nodes or racks get added to the cluster , max Capacity in absolute terms would increase accordingly. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.supports-priority</name> <value>false</value> <description>If true, priorities of jobs will be taken into account in scheduling decisions. 
</description> </property> <property> <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name> <value>100</value> <description> Each queue enforces a limit on the percentage of resources allocated to a user at any given time, if there is competition for them. This user limit can vary between a minimum and maximum value. The former depends on the number of users who have submitted jobs, and the latter is set to this property value. For example, suppose the value of this property is 25. If two users have submitted jobs to a queue, no single user can use more than 50% of the queue resources. If a third user submits a job, no single user can use more than 33% of the queue resources. With 4 or more users, no user can use more than 25% of the queue's resources. A value of 100 implies no user limits are imposed. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name> <value>1</value> <description>The multiple of the queue capacity which can be configured to allow a single user to acquire more slots. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name> <value>200000</value> <description>The maximum number of tasks, across all jobs in the queue, which can be initialized concurrently. Once the queue's jobs exceed this limit they will be queued on disk. </description> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name> <value>100000</value> <description>The maximum number of tasks per-user, across all of the user's jobs in the queue, which can be initialized concurrently. Once the user's jobs exceed this limit they will be queued on disk. 
</description> </property> <property> <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name> <value>10</value> <description>The multiple of (maximum-system-jobs * queue-capacity) used to determine the number of jobs which are accepted by the scheduler. </description> </property> <!-- The default configuration settings for the capacity task scheduler --> <!-- The default values would be applied to all the queues which don't have --> <!-- the appropriate property for the particular queue --> <property> <name>mapred.capacity-scheduler.default-supports-priority</name> <value>false</value> <description>If true, priorities of jobs will be taken into account in scheduling decisions by default in a job queue. </description> </property> <property> <name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name> <value>100</value> <description>The percentage of the resources limited to a particular user for the job queue at any given point of time by default. </description> </property> <property> <name>mapred.capacity-scheduler.default-user-limit-factor</name> <value>1</value> <description>The default multiple of queue-capacity which is used to determine the amount of slots a single user can consume concurrently. </description> </property> <property> <name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name> <value>200000</value> <description>The default maximum number of tasks, across all jobs in the queue, which can be initialized concurrently. Once the queue's jobs exceed this limit they will be queued on disk. </description> </property> <property> <name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name> <value>100000</value> <description>The default maximum number of tasks per-user, across all of the user's jobs in the queue, which can be initialized concurrently. Once the user's jobs exceed this limit they will be queued on disk. 
</description> </property> <property> <name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name> <value>10</value> <description>The default multiple of (maximum-system-jobs * queue-capacity) used to determine the number of jobs which are accepted by the scheduler. </description> </property> <!-- Capacity scheduler Job Initialization configuration parameters --> <property> <name>mapred.capacity-scheduler.init-poll-interval</name> <value>5000</value> <description>The amount of time in milliseconds which is used to poll the job queues for jobs to initialize. </description> </property> <property> <name>mapred.capacity-scheduler.init-worker-threads</name> <value>5</value> <description>Number of worker threads which would be used by Initialization poller to initialize jobs in a set of queues. If number mentioned in property is equal to number of job queues then a single thread would initialize jobs in a queue. If lesser then a thread would get a set of queues assigned. If the number is greater then number of threads would be equal to number of job queues. 
</description> </property> <!-- default --> <property> <name>mapred.capacity-scheduler.queue.default.capacity</name> <value>25</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name> <value>80</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.supports-priority</name> <value>false</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name> <value>20</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name> <value>10</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name> <value>200000</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name> <value>100000</value> </property> <property> <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name> <value>100</value> </property> <!-- etl --> <property> <name>mapred.capacity-scheduler.queue.etl.capacity</name> <value>25</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.maximum-capacity</name> <value>80</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.supports-priority</name> <value>false</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.minimum-user-limit-percent</name> <value>20</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.user-limit-factor</name> <value>10</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.maximum-initialized-active-tasks</name> <value>200000</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.maximum-initialized-active-tasks-per-user</name> <value>100000</value> </property> <property> <name>mapred.capacity-scheduler.queue.etl.init-accept-jobs-factor</name> <value>100</value> </property> <!-- day --> <property> 
<name>mapred.capacity-scheduler.queue.day.capacity</name> <value>25</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.maximum-capacity</name> <value>80</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.supports-priority</name> <value>false</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.minimum-user-limit-percent</name> <value>20</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.user-limit-factor</name> <value>10</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.maximum-initialized-active-tasks</name> <value>200000</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.maximum-initialized-active-tasks-per-user</name> <value>100000</value> </property> <property> <name>mapred.capacity-scheduler.queue.day.init-accept-jobs-factor</name> <value>100</value> </property> <!-- general --> <property> <name>mapred.capacity-scheduler.queue.general.capacity</name> <value>25</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.maximum-capacity</name> <value>80</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.supports-priority</name> <value>false</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.minimum-user-limit-percent</name> <value>20</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.user-limit-factor</name> <value>10</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.maximum-initialized-active-tasks</name> <value>200000</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.maximum-initialized-active-tasks-per-user</name> <value>100000</value> </property> <property> <name>mapred.capacity-scheduler.queue.general.init-accept-jobs-factor</name> <value>100</value> </property> </configuration>
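The minimum-user-limit-percent description in the file above gives a worked example (with a value of 25: two users get at most 50% each, three get 33%, four or more get 25%). A minimal sketch of that arithmetic, assuming the limit is simply an equal split among active users with the configured value as a floor; the function name is hypothetical, and the sketch deliberately ignores user-limit-factor and maximum-capacity:

```python
def per_user_share_percent(active_users: int, minimum_user_limit_percent: int) -> float:
    """Upper bound (in percent of the queue) for one user when users compete.

    Illustrative only: models the example in the property description,
    not the scheduler's actual implementation.
    """
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    if not 1 <= minimum_user_limit_percent <= 100:
        raise ValueError("minimum_user_limit_percent must be between 1 and 100")
    # Equal split among the active users, but never below the configured floor.
    return max(100.0 / active_users, float(minimum_user_limit_percent))


if __name__ == "__main__":
    # The queues in this post use minimum-user-limit-percent = 20.
    for users in (1, 2, 3, 5, 8):
        share = per_user_share_percent(users, 20)
        print(f"{users} active user(s): at most {share:.0f}% of the queue each")
```

With the value 20 used for the four queues here, the floor only starts to matter once five or more users are competing in the same queue.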
3. Copy hadoop-capacity-scheduler-0.20.203.0.jar into the lib directory of the Hadoop installation on the node that runs the JobTracker.
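Because the queue name appears in every property name, a single misspelling silently configures a queue that does not exist. The following is a rough sanity check, not part of Hadoop: it compares the queues named in capacity-scheduler.xml with mapred.queue.names and adds up the capacities. The function names and the sample XML are made up for illustration, and the check assumes that capacities should not exceed 100 percent in total and that a missing mapred.queue.names means a single queue called default.

```python
import re
import xml.etree.ElementTree as ET

# Matches per-queue properties: mapred.capacity-scheduler.queue.<queue-name>.<property-name>
QUEUE_PROP = re.compile(r"^mapred\.capacity-scheduler\.queue\.([^.]+)\.(.+)$")


def read_properties(xml_text: str) -> dict:
    """Return {name: value} for every <property> in a Hadoop-style configuration."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            # In this dict, a later duplicate overwrites an earlier one.
            props[name.strip()] = value.strip()
    return props


def check_queues(mapred_site_xml: str, capacity_scheduler_xml: str) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    names = read_properties(mapred_site_xml).get("mapred.queue.names", "default")
    declared = {q.strip() for q in names.split(",") if q.strip()}
    unknown = set()
    capacities = {}
    for name, value in read_properties(capacity_scheduler_xml).items():
        match = QUEUE_PROP.match(name)
        if match is None:
            continue  # global scheduler property, not tied to a queue
        queue, key = match.groups()
        if queue not in declared:
            unknown.add(queue)
        elif key == "capacity":
            capacities[queue] = float(value)
    problems = [
        f"queue '{q}' is configured but not listed in mapred.queue.names"
        for q in sorted(unknown)
    ]
    total = sum(capacities.values())
    if total > 100:
        problems.append(f"capacities add up to {total:g}%, which is more than 100%")
    return problems


# Hypothetical sample input: 'elt' is a deliberate misspelling of 'etl'.
MAPRED_SITE = """
<configuration>
  <property><name>mapred.queue.names</name><value>default,etl</value></property>
</configuration>
"""
CAPACITY = """
<configuration>
  <property><name>mapred.capacity-scheduler.queue.default.capacity</name><value>50</value></property>
  <property><name>mapred.capacity-scheduler.queue.etl.capacity</name><value>50</value></property>
  <property><name>mapred.capacity-scheduler.queue.elt.capacity</name><value>50</value></property>
</configuration>
"""

if __name__ == "__main__":
    for problem in check_queues(MAPRED_SITE, CAPACITY):
        print(problem)
```

To run it against the real files, read mapred-site.xml and capacity-scheduler.xml from the Hadoop conf directory and pass their contents to check_queues.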