
MapReduce Patterns, Algorithms, and Use Cases

 

With the explosion of Hadoop and big data usage, many people are looking for ways to convert their existing implementations to MapReduce. Unfortunately, with the notable exceptions of "Data-Intensive Text Processing with MapReduce" and "Mahout in Action", there are very few publications dedicated to the design of MapReduce implementations. In his new article, "MapReduce Patterns, Algorithms, and Use Cases", Ilya Katsov provides a systematic overview of problems that can be solved using a MapReduce framework.

It starts with a fairly straightforward use of MapReduce as a general-purpose parallel execution framework, applicable to many implementations that need large clusters for compute- and data-intensive calculations, including physical and engineering simulations, numerical analysis, performance testing, etc. The next group of algorithms, commonly used in log analysis, ETL and data querying, includes counting and summing, data collating (based on specific functions), filtering, parsing, validation and sorting.
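
As an illustration of the counting and summing pattern mentioned above, here is a minimal in-memory sketch in Python. It is not taken from Katsov's article and does not use Hadoop's API: `run_mapreduce`, `count_mapper`, `sum_reducer` and the sample log lines are hypothetical names and data chosen only to show the map, shuffle and reduce steps.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """Tiny in-memory stand-in for a MapReduce job (assumption: not Hadoop's API)."""
    groups = defaultdict(list)
    # Map phase plus shuffle: collect every emitted value under its key.
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_mapper(line):
    # Emit (term, 1) for each whitespace-separated term in a log line.
    for term in line.split():
        yield term, 1


def sum_reducer(key, values):
    # Sum all the partial counts emitted for this term.
    return sum(values)


# Hypothetical sample input standing in for a log file.
log_lines = ["GET /index", "GET /about", "POST /index"]
counts = run_mapreduce(log_lines, count_mapper, sum_reducer)
```

The same skeleton would cover filtering or parsing by swapping in a mapper that emits only the records of interest and a reducer that passes them through.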

The second large group of patterns discussed by Katsov covers relational MapReduce patterns, often used by data warehousing applications. These patterns are widely leveraged by Hive and Pig implementations and include predicate/function-based data selection, data projection, data union, difference and intersection, and groupBy aggregations. A separate discussion is dedicated to implementing data joins and includes such algorithms as repartition joins and replicated joins.
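
The two join strategies named above can be sketched in plain Python as follows. This is an assumption-laden illustration rather than the article's pseudocode: the function names, the "L"/"R" tags and the sample `users` and `orders` relations are all hypothetical. A repartition join tags each record with its source relation and joins in the reducer; a replicated join assumes one relation is small enough to hold in memory on every mapper.

```python
from collections import defaultdict


def repartition_join(left, right):
    """Reduce-side join of two lists of (key, value) pairs on key."""
    # Map phase: tag each record with its source relation.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]
    # Shuffle: group tagged values by join key.
    groups = defaultdict(list)
    for key, tagged_value in tagged:
        groups[key].append(tagged_value)
    # Reduce phase: cross the left and right values that share a key.
    joined = []
    for key in sorted(groups):
        lefts = [v for tag, v in groups[key] if tag == "L"]
        rights = [v for tag, v in groups[key] if tag == "R"]
        joined.extend((key, l, r) for l in lefts for r in rights)
    return joined


def replicated_join(small, large):
    """Map-side join: the small relation is held in memory as a lookup table."""
    lookup = defaultdict(list)
    for k, v in small:
        lookup[k].append(v)
    # Each record of the large relation is joined during the map phase.
    return [(k, s, v) for k, v in large for s in lookup.get(k, [])]


# Hypothetical sample relations keyed by user id.
users = [(1, "alice"), (2, "bob")]
orders = [(1, "book"), (1, "pen"), (3, "lamp")]
by_repartition = repartition_join(users, orders)
by_replication = replicated_join(users, orders)
```

Both functions compute the same inner join here; the difference lies in where the work happens, which is why the replicated variant only fits when one side is small.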

Moving further up the chain of complexity, the article discusses more advanced MapReduce processing algorithms, including graph processing, search algorithms (breadth-first search), PageRank and data aggregation algorithms that can be leveraged in graph analysis, web indexing and general search applications. It also covers common text analysis and market analysis use cases requiring cross-correlation calculation. This part covers both the "pairs" and "stripes" design patterns and their comparative merits.
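
To make the "pairs" and "stripes" patterns concrete, the following Python sketch counts item co-occurrences in a few shopping baskets both ways. It is a simplified, single-process illustration under stated assumptions: the function names and the sample baskets are hypothetical, and the map and reduce steps are folded into one loop for brevity.

```python
from collections import Counter, defaultdict
from itertools import permutations


def pairs_cooccurrence(baskets):
    # Pairs: the mapper would emit ((a, b), 1) for every ordered pair of
    # distinct items in a basket; the reducer sums the ones per pair.
    counts = Counter()
    for basket in baskets:
        for a, b in permutations(sorted(set(basket)), 2):
            counts[(a, b)] += 1
    return dict(counts)


def stripes_cooccurrence(baskets):
    # Stripes: the mapper would emit (a, {b: 1, ...}) once per item a;
    # the reducer merges the per-item maps element-wise.
    stripes = defaultdict(Counter)
    for basket in baskets:
        items = set(basket)
        for a in items:
            stripes[a].update(Counter(items - {a}))
    return {a: dict(stripe) for a, stripe in stripes.items()}


# Hypothetical sample data for a market-basket style analysis.
baskets = [["milk", "bread"], ["milk", "bread", "eggs"]]
pairs = pairs_cooccurrence(baskets)
stripes = stripes_cooccurrence(baskets)
```

Both functions yield the same counts in different shapes. The usual trade-off is that pairs emits many small intermediate records, while stripes emits fewer but larger ones and needs enough memory to hold a whole stripe per key.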

Finally, Katsov provides a good bibliography of more complex MapReduce implementations in the field of machine learning.

Most of the algorithms described in the article are accompanied by pseudocode, basic information on their applicability, advantages and disadvantages, and some real-world use cases.

Many people today are still struggling with the applicability of Hadoop and MapReduce to their business problems. Some still consider it a "technical approach in search of a business problem". The article is an important step in filling an existing void in the field of MapReduce algorithms, use cases and design patterns. It shows MapReduce's power far beyond the infamous "word count" and the ways it can be leveraged to solve a wide range of practical problems.

 

Posted by Boris Lublinsky

http://www.infoq.com/news/2012/02/MapReducePatterns

 

 


