tenght

MapReduce Model

 

• Programmers specify two functions:
  map (k, v) → <k’, v’>*
  reduce (k’, [v’]) → <k’, v’>*
  - All values with the same key are sent to the same reducer
• The execution framework handles everything else…
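To make the two signatures concrete, here is a minimal single-process sketch of the classic word-count job, assuming Python. The names map_fn, reduce_fn and run_job are hypothetical. run_job only imitates the framework's grouping of values by key and does not distribute any work.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(key: int, value: str) -> Iterator[tuple[str, int]]:
    # key: line offset (unused here); value: one line of text.
    # Emits an intermediate (word, 1) pair for every word.
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: Iterable[int]) -> Iterator[tuple[str, int]]:
    # Receives one intermediate key together with all of its values.
    yield key, sum(values)


def run_job(records):
    # Stand-in for the framework: collect every intermediate value under its
    # key, so all values with the same key reach the same reduce call.
    groups = defaultdict(list)
    for k, v in records:
        for k2, v2 in map_fn(k, v):
            groups[k2].append(v2)
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


if __name__ == "__main__":
    lines = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The end")]
    print(run_job(lines))
    # [('brown', 1), ('dog', 1), ('end', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]
```

The programmer writes only map_fn and reduce_fn; everything inside run_job is what the execution framework would take over.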

What’s “everything else”?

MapReduce “Runtime”

• Handles scheduling
  - Assigns workers to map and reduce tasks
• Handles “data distribution”
  - Moves processes to data
• Handles synchronization
  - Gathers, sorts, and shuffles intermediate data
• Handles errors and faults
  - Detects worker failures and restarts their tasks
• Everything happens on top of a distributed file system (covered later)
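The runtime responsibilities above can be imitated in a toy, single-process form. This sketch assumes Python and uses hypothetical names (toy_runtime, WorkerFailure, flaky, run_with_retries). It models scheduling as a loop over input splits, synchronization as a sort-and-group barrier between the map and reduce phases, and fault handling as re-running a task that raised WorkerFailure. Data locality and the distributed file system are not modeled.

```python
from collections import defaultdict
from typing import Callable, Iterable, Iterator


class WorkerFailure(Exception):
    """Hypothetical stand-in for a worker dying while running a task."""


def run_with_retries(task: Callable[[], list], max_attempts: int = 3) -> list:
    # Errors and faults: a failed task is simply run again from scratch.
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except WorkerFailure:
            if attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


def flaky(fn: Callable[[], list], crashes: int) -> Callable[[], list]:
    # Fault injection for the sketch: the task fails `crashes` times, then succeeds.
    remaining = crashes

    def task() -> list:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            raise WorkerFailure("simulated worker crash")
        return fn()

    return task


def toy_runtime(splits, map_fn, reduce_fn, crashes_per_task: int = 0) -> list:
    # Scheduling: one map task per input split (run sequentially here).
    intermediate = []
    for split in splits:
        task = flaky(lambda s=split: [kv for k, v in s for kv in map_fn(k, v)],
                     crashes_per_task)
        intermediate.extend(run_with_retries(task))

    # Synchronization: reduce starts only after every map task has finished;
    # intermediate pairs are sorted and grouped by key.
    groups = defaultdict(list)
    for k, v in sorted(intermediate, key=lambda kv: kv[0]):
        groups[k].append(v)

    output = []
    for k, vs in groups.items():
        task = flaky(lambda k=k, vs=vs: list(reduce_fn(k, vs)), crashes_per_task)
        output.extend(run_with_retries(task))
    return output


def word_map(_offset: int, line: str) -> Iterator[tuple[str, int]]:
    for word in line.split():
        yield word, 1


def word_reduce(word: str, counts: Iterable[int]) -> Iterator[tuple[str, int]]:
    yield word, sum(counts)


if __name__ == "__main__":
    splits = [[(0, "a b a")], [(0, "b c")]]
    # Every task crashes twice and succeeds on its third attempt.
    print(toy_runtime(splits, word_map, word_reduce, crashes_per_task=2))
    # [('a', 2), ('b', 2), ('c', 1)]
```

Re-execution is a reasonable recovery strategy here because each task is a pure function of its input, so running it a second time produces the same output.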

• Programmers specify map and reduce; all values with the same key are reduced together
• The execution framework handles everything else…
• Not quite… usually, programmers also specify:
  partition (k’, number of partitions) → partition for k’
  - Often a simple hash of the key, e.g., hash(k’) mod n
  - Divides up the key space for parallel reduce operations

  combine (k’, [v’]) → <k’, v’>*
  - Mini-reducers that run in memory after the map phase
  - Used as an optimization to reduce network traffic
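A sketch of the optional partition and combine functions, again assuming Python and a word-count job. The helpers partition, combine, shuffle and word_map are hypothetical illustrations, not a framework API. The partitioner is hash(k’) mod n with zlib.crc32 chosen as the hash, and the example counts how many intermediate pairs would cross the network with and without the combiner.

```python
import zlib
from collections import Counter, defaultdict


def partition(key: str, num_partitions: int) -> int:
    # hash(k') mod n. zlib.crc32 is used instead of Python's built-in hash(),
    # because hash() of a str is salted per process and every mapper must
    # agree on which reducer owns a key.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def word_map(line: str):
    for word in line.split():
        yield word, 1


def combine(pairs):
    # Combiner: a mini-reducer over one map task's output, run before the
    # shuffle. Summing is safe because addition is associative and commutative.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def shuffle(map_outputs, num_partitions: int):
    # Route every intermediate pair to the reducer chosen by partition().
    buckets = [defaultdict(list) for _ in range(num_partitions)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[partition(key, num_partitions)][key].append(value)
    return buckets


if __name__ == "__main__":
    splits = ["a a a b", "a b b c"]
    raw = [list(word_map(s)) for s in splits]
    combined = [combine(pairs) for pairs in raw]
    print("pairs shuffled without combiner:", sum(len(p) for p in raw))  # 8
    print("pairs shuffled with combiner:", sum(len(p) for p in combined))  # 5
    buckets = shuffle(combined, num_partitions=2)
    # Reduce step: each bucket sums the partial counts for the keys it owns.
    result = {k: sum(vs) for b in buckets for k, vs in b.items()}
    print(sorted(result.items()))  # [('a', 4), ('b', 3), ('c', 1)]
```

A stable hash matters: with a per-process salted hash, two mappers could send the same key to different reducers. Note also that a reducer can double as a combiner only when its operation, like summing, gives the same result on partial groups.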





Related recommendations

    IOTSim A simulator for analysing IoT applications

    A disruptive technology that is influencing ...MapReduce model in cloud computing environment. A real case study validates the efficacy of the simulator.

    Research on a Chinese Text Classification Method Based on Cloud Computing

    to build a cloud platform, and how to use MapReduce model to achieve the improved SVM classification algorithm on the cloud computing platform. The final experimental results show that the new ...

    Google's big-data troika

    Google File System, MapReduce model, Bigtable data storage platform

    Google's three core cloud computing technologies

    Google's three core cloud computing technologies: Google File System, MapReduce model, Bigtable data storage platform

    MapReduce-Programming.rar_mapReduce

    The Programming Model and Practice of MapReduce by Jerry Zhao

    Creative Combination of Legacy System and MapReduce in Cloud Migration

    With the advent of big data era, the response speed of traditional legacy ... A challenging issue is how to creatively combine parallelizable legacy code and MapReduce model of cloud computing to enab

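    MapReduce Source Code Analysis (Complete Edition)

    Map/Reduce is a distributed computing model for large-scale data processing. It was originally designed and implemented by Google engineers, and Google has publicly released its complete MapReduce paper, which defines Map/Reduce as a programming model, ...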

    Hadoop The Definitive Guide PDF

    MapReduce programming model, and the various data formats that MapReduce can work with. Chapter 8 is on advanced MapReduce topics, including sorting and joining data. Chapters 9 and 10 are for Hadoop ...

    Hadoop- The Definitive Guide, 3rd Edition.pdf

    MapReduce programming model, and the various data formats that MapReduce can work with. Chapter 8 is on advanced MapReduce topics, including sorting and joining data. Chapters 9 and 10 are for Hadoop...

    Hadoop The Definitive Guide 3rd Edition

    This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the ...

    BF-MapReduce :A bloom filter Based Efficient Lightweight Search

    MapReduce is an attractive programming model for large-scale data-parallel applications. However, the original MapReduce framework also needs some optimizations to improve its performance. In this ...

    ParquetMapreduceDemo

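    ParquetMapreduceDemo demonstrates how to use Parquet as the input/output format in MapReduce, with Avro as the data model. Parquet is a columnar storage format with very efficient data encoding. Avro is a compact serialization system. #Object ...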

    CouchDB.The.Definitive.Guide

    This book introduces you to Apache CouchDB, a document-oriented database that offers a different way to model your data. CouchDB is a schema-free database, designed to work with applications that ...

    Hadoop: The Definitive Guide (3rd Edition), English edition

    Coherency Model; Parallel Copying with distcp; Keeping an HDFS Cluster Balanced; Hadoop Archives; Using Hadoop Archives; Limitations; 4. Hadoop I/O ...

    Hadoop: The Definitive Guide, 3rd Edition (Revised)

    This third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN). Store large datasets with the ...

    hadoop_the_definitive_guide_3nd_edition

    Classic MapReduce (MapReduce 1); YARN (MapReduce 2); Failures; Failures in Classic MapReduce; Failures in YARN; Job Scheduling; The Fair ...

    Mahout in Action

    Introducing MapReduce; Translating to MapReduce: generating user vectors; Translating to MapReduce: calculating co-occurrence; Translating to MapReduce: rethinking matrix ...

    hadoop the definitive guide 3nd edition

    This third edition covers recent changes to Hadoop, including new material on the new MapReduce API, as well as version 2 of the MapReduce runtime (YARN) and its more flexible execution model....
