Two good links on this topic:
http://blog.csdn.net/azhao_dn/article/details/7070327
http://blog.csdn.net/xhh198781/article/details/7573842
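In this project we set up four queues in total (default, general, etl, day). They schedule requests coming from Oozie statistics jobs, physical-model building, and the ETL service. Each queue gets a capacity of 25 (percent of the cluster's slots).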
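Capacity is a percentage of the cluster's slots, so what each queue is guaranteed in absolute terms depends on cluster size. Below is a minimal Python sketch of that arithmetic. It assumes a hypothetical cluster of 200 slots, and the helper name `guaranteed_slots` is made up for illustration.

```python
# Absolute slot guarantee per queue under the CapacityScheduler.
# Capacities mirror this setup (four queues at 25% each); TOTAL_SLOTS is a
# hypothetical cluster size used only for illustration.

QUEUE_CAPACITY_PERCENT = {"default": 25, "general": 25, "etl": 25, "day": 25}
TOTAL_SLOTS = 200  # hypothetical, e.g. 50 TaskTrackers x 4 slots each


def guaranteed_slots(capacities, total_slots):
    """Map each queue to the slots its capacity guarantees (rounded down).

    Raises ValueError when the plan does not add up to 100%, because this
    setup intends the queues to partition the whole cluster.
    """
    total_percent = sum(capacities.values())
    if total_percent != 100:
        raise ValueError(f"queue capacities sum to {total_percent}%, expected 100%")
    return {queue: total_slots * pct // 100 for queue, pct in capacities.items()}


if __name__ == "__main__":
    for queue, slots in guaranteed_slots(QUEUE_CAPACITY_PERCENT, TOTAL_SLOTS).items():
        print(f"{queue}: {slots} slots")  # 50 slots each for 200 total slots
```

The guarantee is a floor, not a ceiling. According to the `maximum-capacity` description in capacity-scheduler.xml, a queue can also use idle capacity beyond its share, up to its maximum-capacity.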
1. Modify mapred-site.xml
<property>
  <name>mapred.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,general,etl,day</value>
</property>
2. Create capacity-scheduler.xml
<?xml version="1.0"?>
<!-- This is the configuration file for the resource manager in Hadoop. -->
<!-- You can configure various scheduling parameters related to queues. -->
<!-- The properties for a queue follow a naming convention, such as, -->
<!-- mapred.capacity-scheduler.queue.<queue-name>.property-name. -->
<configuration>
  <property>
    <name>mapred.capacity-scheduler.maximum-system-jobs</name>
    <value>3000</value>
    <description>Maximum number of jobs in the system which can be initialized,
    concurrently, by the CapacityScheduler.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>20</value>
    <description>Percentage of the number of slots in the cluster that are
    to be available for jobs in this queue.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
    <value>-1</value>
    <description>
    maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster.
    This provides a means to limit how much excess capacity a queue can use. By default, there is no limit.
    The maximum-capacity of a queue can only be greater than or equal to its minimum capacity.
    Default value of -1 implies a queue can use complete capacity of the cluster.
    This property could be used to curtail certain jobs which are long running in nature from occupying
    more than a certain percentage of the cluster, which in the absence of pre-emption, could lead to
    capacity guarantees of other queues being affected.
    One important thing to note is that maximum-capacity is a percentage, so based on the cluster's
    capacity the max capacity would change. So if a large number of nodes or racks get added to the
    cluster, max capacity in absolute terms would increase accordingly.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.supports-priority</name>
    <value>false</value>
    <description>If true, priorities of jobs will be taken into
    account in scheduling decisions.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
    <value>100</value>
    <description>Each queue enforces a limit on the percentage of resources
    allocated to a user at any given time, if there is competition for them.
    This user limit can vary between a minimum and maximum value. The former
    depends on the number of users who have submitted jobs, and the latter is
    set to this property value. For example, suppose the value of this
    property is 25. If two users have submitted jobs to a queue, no single
    user can use more than 50% of the queue resources. If a third user submits
    a job, no single user can use more than 33% of the queue resources. With 4
    or more users, no user can use more than 25% of the queue's resources. A
    value of 100 implies no user limits are imposed.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
    <value>1</value>
    <description>The multiple of the queue capacity which can be configured to
    allow a single user to acquire more slots.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
    <value>200000</value>
    <description>The maximum number of tasks, across all jobs in the queue,
    which can be initialized concurrently. Once the queue's jobs exceed this
    limit they will be queued on disk.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
    <description>The maximum number of tasks per-user, across all of the
    user's jobs in the queue, which can be initialized concurrently. Once the
    user's jobs exceed this limit they will be queued on disk.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
    <value>10</value>
    <description>The multiple of (maximum-system-jobs * queue-capacity) used to
    determine the number of jobs which are accepted by the scheduler.
    </description>
  </property>
  <!-- The default configuration settings for the capacity task scheduler -->
  <!-- The default values would be applied to all the queues which don't have -->
  <!-- the appropriate property for the particular queue -->
  <property>
    <name>mapred.capacity-scheduler.default-supports-priority</name>
    <value>false</value>
    <description>If true, priorities of jobs will be taken into
    account in scheduling decisions by default in a job queue.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-minimum-user-limit-percent</name>
    <value>100</value>
    <description>The percentage of the resources limited to a particular user
    for the job queue at any given point of time by default.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-user-limit-factor</name>
    <value>1</value>
    <description>The default multiple of queue-capacity which is used to
    determine the amount of slots a single user can consume concurrently.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-maximum-active-tasks-per-queue</name>
    <value>200000</value>
    <description>The default maximum number of tasks, across all jobs in the
    queue, which can be initialized concurrently. Once the queue's jobs exceed
    this limit they will be queued on disk.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-maximum-active-tasks-per-user</name>
    <value>100000</value>
    <description>The default maximum number of tasks per-user, across all of
    the user's jobs in the queue, which can be initialized concurrently. Once
    the user's jobs exceed this limit they will be queued on disk.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.default-init-accept-jobs-factor</name>
    <value>10</value>
    <description>The default multiple of (maximum-system-jobs * queue-capacity)
    used to determine the number of jobs which are accepted by the scheduler.
    </description>
  </property>
  <!-- Capacity scheduler Job Initialization configuration parameters -->
  <property>
    <name>mapred.capacity-scheduler.init-poll-interval</name>
    <value>5000</value>
    <description>The amount of time in milliseconds which is used to poll
    the job queues for jobs to initialize.
    </description>
  </property>
  <property>
    <name>mapred.capacity-scheduler.init-worker-threads</name>
    <value>5</value>
    <description>Number of worker threads which would be used by the
    Initialization poller to initialize jobs in a set of queues.
    If the number mentioned in this property is equal to the number of job
    queues, then a single thread would initialize jobs in a queue. If it is
    fewer, then a thread would get a set of queues assigned. If the number is
    greater, then the number of threads would be equal to the number of job
    queues.
    </description>
  </property>
  <!-- default: these entries override the template values for queue "default" above (the last definition wins) -->
  <property>
    <name>mapred.capacity-scheduler.queue.default.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.user-limit-factor</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks</name>
    <value>200000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.default.init-accept-jobs-factor</name>
    <value>100</value>
  </property>
  <!-- etl -->
  <property>
    <name>mapred.capacity-scheduler.queue.etl.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.user-limit-factor</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.maximum-initialized-active-tasks</name>
    <value>200000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.etl.init-accept-jobs-factor</name>
    <value>100</value>
  </property>
  <!-- day -->
  <property>
    <name>mapred.capacity-scheduler.queue.day.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.user-limit-factor</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.maximum-initialized-active-tasks</name>
    <value>200000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.day.init-accept-jobs-factor</name>
    <value>100</value>
  </property>
  <!-- general -->
  <property>
    <name>mapred.capacity-scheduler.queue.general.capacity</name>
    <value>25</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.maximum-capacity</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.supports-priority</name>
    <value>false</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.minimum-user-limit-percent</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.user-limit-factor</name>
    <value>10</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.maximum-initialized-active-tasks</name>
    <value>200000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.maximum-initialized-active-tasks-per-user</name>
    <value>100000</value>
  </property>
  <property>
    <name>mapred.capacity-scheduler.queue.general.init-accept-jobs-factor</name>
    <value>100</value>
  </property>
</configuration>
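Every queue in this setup uses the same per-queue values: capacity 25, maximum-capacity 80, minimum-user-limit-percent 20, user-limit-factor 10 and init-accept-jobs-factor 100, with maximum-system-jobs at 3000. Read literally, the property descriptions turn these values into concrete limits. The Python sketch below works through that arithmetic. The function names are hypothetical, and the formulas are an interpretation of the descriptions rather than the scheduler's source.

```python
# Effective limits implied by the per-queue settings, obtained by reading the
# property descriptions in capacity-scheduler.xml literally. A planning aid
# only; this is not the CapacityTaskScheduler's own code.


def user_limit_percent(active_users, minimum_user_limit_percent):
    """Largest share (% of the queue) one user may hold while users compete.

    Per the minimum-user-limit-percent description: the limit is 100/n for n
    competing users, but never below the configured value; 100 means no limit.
    """
    if active_users < 1:
        raise ValueError("need at least one active user")
    return max(100.0 / active_users, float(minimum_user_limit_percent))


def single_user_cap_percent(capacity, user_limit_factor, maximum_capacity):
    """Largest share (% of the cluster) one user may reach in a queue:
    capacity * user-limit-factor, bounded by maximum-capacity
    (-1 means no limit, i.e. the whole cluster)."""
    ceiling = 100.0 if maximum_capacity == -1 else float(maximum_capacity)
    return min(capacity * user_limit_factor, ceiling)


def accepted_jobs(init_accept_jobs_factor, maximum_system_jobs, capacity):
    """Jobs a queue accepts: factor * (maximum-system-jobs * queue-capacity)."""
    return init_accept_jobs_factor * maximum_system_jobs * capacity // 100


if __name__ == "__main__":
    # Values used for each of the four queues in this setup.
    print(f"{user_limit_percent(3, 20):.1f}")    # 33.3 (three users competing)
    print(user_limit_percent(8, 20))              # 20.0 (floor reached)
    print(single_user_cap_percent(25, 10, 80))    # 80.0 (250 capped at 80)
    print(accepted_jobs(100, 3000, 25))           # 75000
```

With user-limit-factor 10, capacity × factor is 25 × 10 = 250, which is more than the whole cluster. In this setup, maximum-capacity (80%) is therefore the bound that actually applies to a single user.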
3. Copy hadoop-capacity-scheduler-0.20.203.0.jar into the lib directory of the Hadoop installation on the node that runs the JobTracker.
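Before (re)starting the JobTracker, a small check can catch the mistakes that are easy to make in these files:

- the scheduler class is not set;
- a queue in `mapred.queue.names` has no capacity;
- the capacities do not add up to 100;
- per-queue properties are written under a misspelled queue name such as `defualt`, so they do not apply to any declared queue.

The Python script below is a minimal sketch of such a check. `check_queues` and `load_properties` are hypothetical helpers, not Hadoop tools. The rule that capacities must total exactly 100 reflects the intent of this setup. As in Hadoop's `Configuration`, a later definition of the same property overrides an earlier one.

```python
# Sanity check for mapred-site.xml and capacity-scheduler.xml before
# (re)starting the JobTracker. Assumes standard Hadoop <configuration> files
# and, as in this setup, that queue capacities are meant to total exactly 100.
import re
import sys
import xml.etree.ElementTree as ET

SCHEDULER = "org.apache.hadoop.mapred.CapacityTaskScheduler"
QUEUE_PREFIX = "mapred.capacity-scheduler.queue."
QUEUE_PROPERTY = re.compile(re.escape(QUEUE_PREFIX) + r"([^.]+)\.")


def load_properties(xml_text):
    """Return {name: value} for every <property>; a later duplicate wins."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name:
            props[name] = (prop.findtext("value") or "").strip()
    return props


def check_queues(mapred_site_xml, capacity_scheduler_xml):
    """Return a list of problems; an empty list means the check passed."""
    site = load_properties(mapred_site_xml)
    sched = load_properties(capacity_scheduler_xml)
    problems = []

    if site.get("mapred.jobtracker.taskScheduler") != SCHEDULER:
        problems.append("mapred.jobtracker.taskScheduler is not " + SCHEDULER)

    queues = [q.strip() for q in site.get("mapred.queue.names", "").split(",") if q.strip()]

    total = 0.0
    for queue in queues:
        capacity = sched.get(QUEUE_PREFIX + queue + ".capacity")
        if capacity is None:
            problems.append(f"queue '{queue}' has no capacity setting")
        else:
            total += float(capacity)
    if abs(total - 100.0) > 1e-6:
        problems.append(f"capacities of {queues} sum to {total:g}, not 100")

    # Per-queue properties written for a name missing from mapred.queue.names
    # (for example a typo like 'defualt') do not apply to any declared queue.
    matches = (QUEUE_PROPERTY.match(name) for name in sched)
    undeclared = sorted({m.group(1) for m in matches if m} - set(queues))
    for queue in undeclared:
        problems.append(f"properties set for undeclared queue '{queue}'")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_queues.py <mapred-site.xml> <capacity-scheduler.xml>
    with open(sys.argv[1], "rb") as site_file, open(sys.argv[2], "rb") as sched_file:
        issues = check_queues(site_file.read(), sched_file.read())
    print("\n".join(issues) if issues else "OK")
    sys.exit(1 if issues else 0)
```

Pass it the two files from your Hadoop conf directory. It prints each problem it finds, or OK, and exits non-zero when something is wrong.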