zhaomengsen

Speculative Execution in Hadoop

    Blog category:
  • hive
 
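Speculative execution works as follows. Once all of a job's tasks have started running, the JobTracker tracks the average progress of all tasks. If a task runs slower than the overall average, for example because its task node has weaker hardware or a high CPU load (there are many possible causes), the JobTracker launches a new attempt of the same task (a duplicate task). The original attempt and the new one then run side by side (a task can have several attempts executing at the same time), and whichever finishes first wins while the other is killed. This is why the JobTracker page often shows a job that succeeded even though some of its task attempts were killed. In addition, by the nature of MapReduce jobs, running the same task several times produces the same result, so a task only needs one successful attempt for the job to succeed, and the killed attempts have no effect on the job's result.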
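The "first attempt to finish wins" rule can be sketched in a few lines of Python. This is only an illustration and not Hadoop code: the function name `resolve_attempts`, the attempt IDs, and the finish times are all hypothetical.

```python
def resolve_attempts(finish_times):
    """Pick the winning attempt of one task.

    finish_times maps a (hypothetical) attempt ID to the time in seconds
    at which that attempt would finish. The earliest finisher wins and
    every other attempt of the same task is killed.
    """
    winner = min(finish_times, key=finish_times.get)
    killed = sorted(a for a in finish_times if a != winner)
    return winner, killed


# The original attempt is stuck on a slow node; the duplicate finishes first.
winner, killed = resolve_attempts({"attempt_0": 95.0, "attempt_1": 40.0})
print(winner)  # attempt_1
print(killed)  # ['attempt_0']
```

The task counts as successful as soon as one attempt finishes, which matches the killed attempts that show up next to a successful job.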


Configuration parameters:

mapred.map.tasks.speculative.execution=true

mapred.reduce.tasks.speculative.execution=true

These two settings control speculative execution. If you have never paid attention to them, that is fine too: both default to true.
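As a rough sketch, the two properties can be written out in the usual Hadoop `<configuration>/<property>` XML layout with a small Python helper. The helper name `speculation_config` is hypothetical, and where the resulting XML goes (for example a `mapred-site.xml` file) is an assumption about your setup.

```python
import xml.etree.ElementTree as ET


def speculation_config(map_enabled=True, reduce_enabled=True):
    """Build a Hadoop-style <configuration> fragment for the two settings."""
    settings = {
        "mapred.map.tasks.speculative.execution": map_enabled,
        "mapred.reduce.tasks.speculative.execution": reduce_enabled,
    }
    root = ET.Element("configuration")
    for name, enabled in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = "true" if enabled else "false"
    return ET.tostring(root, encoding="unicode")


# Keep speculation for maps but turn it off for reduces.
print(speculation_config(reduce_enabled=False))
```

Calling the helper with no arguments reproduces the default values described above.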

Hadoop uses the task progress score to decide which tasks are slow enough to speculate on, and therefore which attempts eventually get killed:


Hadoop monitors task progress using a progress score between 0 and 1.

For a map, the progress score is the fraction of input data read.

For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score:
• The copy phase, when the task fetches map outputs.
• The sort phase, when map outputs are sorted by key.
• The reduce phase, when a user-defined function is applied to the list of map outputs with each key.
In each phase, the score is the fraction of data processed.
For example,
• a task halfway through the copy phase has a progress score of 1 / 2 * 1 / 3 = 1 / 6
• a task halfway through the reduce phase has a progress score of 1 / 3 + 1 / 3 + 1 / 2 * 1 / 3 = 5 / 6
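The scoring rule above can be checked with a short Python sketch. The function names are hypothetical and this is not Hadoop's own code; it only reproduces the arithmetic, using exact fractions so the results match the worked examples.

```python
from fractions import Fraction

# The three reduce phases, in order; each contributes 1/3 of the score.
REDUCE_PHASES = ("copy", "sort", "reduce")


def map_progress(fraction_read):
    """Progress score of a map: the fraction of input data read."""
    return Fraction(fraction_read)


def reduce_progress(phase, fraction_done):
    """Progress score of a reduce task.

    Completed phases count 1/3 each; the current phase contributes
    fraction_done * 1/3.
    """
    completed = REDUCE_PHASES.index(phase)
    return (completed + Fraction(fraction_done)) / len(REDUCE_PHASES)


print(reduce_progress("copy", Fraction(1, 2)))    # 1/6
print(reduce_progress("reduce", Fraction(1, 2)))  # 5/6
```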

Hadoop looks at the average progress score of each category of tasks (maps and reduces) to define a threshold for speculative execution. When a task’s progress score is less than the average for its category by a threshold, and the task has run for a certain amount of time, it is considered slow. The scheduler also ensures that at most one speculative copy of each task is running at a time. When running multiple jobs, Hadoop uses a FIFO discipline where the earliest submitted job is asked for a task to run, then the second, etc. There is also a priority system for putting jobs into higher-priority queues.
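The slow-task test described above can be sketched as follows. The text does not give the concrete threshold or the minimum running time, so the values `gap=0.2` and `min_runtime_s=60.0` are assumptions chosen only for illustration, as are the `TaskStatus` type and the function name. The function is meant to be called once per category (maps or reduces).

```python
from dataclasses import dataclass


@dataclass
class TaskStatus:
    task_id: str
    progress: float                     # progress score in [0, 1]
    runtime_s: float                    # seconds the task has been running
    has_speculative_copy: bool = False  # at most one speculative copy per task


def find_slow_tasks(tasks, gap=0.2, min_runtime_s=60.0):
    """Return IDs of tasks in one category that qualify for speculation.

    gap and min_runtime_s are assumed values, not taken from the text.
    """
    if not tasks:
        return []
    average = sum(t.progress for t in tasks) / len(tasks)
    return [
        t.task_id
        for t in tasks
        if t.progress < average - gap       # well below the category average
        and t.runtime_s >= min_runtime_s    # has run long enough to judge
        and not t.has_speculative_copy      # no duplicate is running yet
    ]


maps = [
    TaskStatus("m1", 0.9, 120),
    TaskStatus("m2", 0.9, 120),
    TaskStatus("m3", 0.8, 120),
    TaskStatus("m4", 0.3, 120),
]
print(find_slow_tasks(maps))  # ['m4']
```

Here the average is 0.725, so only `m4` falls more than 0.2 below it; a task that has just started or already has a speculative copy is skipped.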

(Source: http://adhoop.wordpress.com/2012/02/24/speculative-execution-in-hadoop/)


Further reading: Hadoop.The.Definitive.Guide.3rd.Edition