In my work, I ran into a situation with two mappers: mapper A reads a file with two fields (questionId, questionTags) and emits key: questionId, value: questionTags, while mapper B reads a directory containing many files, each named by its questionId and holding questionContent, and emits key: questionId (taken from the file name), value: questionContent. A reducer then joins the two by questionId and does some string operations.
The job layout is:

A mapper \
           > reducer
B mapper /
This can't be solved with ChainMapper, which chains mappers one after another over a single input; here two different inputs need two different mappers feeding the same reducer. I found that the two mappers' output formats are the same, so MultipleInputs fits, attaching each mapper to its own input path. (The other way would be a single mapper that reads both the questions dir and the tags file and branches on the input type.)
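A minimal driver sketch for the MultipleInputs route. The class names (TagsMapper, ContentMapper, JoinReducer) and the argument order are hypothetical placeholders, not from the original post; only MultipleInputs.addInputPath and the standard Job setup calls are real Hadoop API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QuestionJoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "question-tags-join");
        job.setJarByClass(QuestionJoinDriver.class);

        // Mapper A: the tags file (questionId, questionTags per line)
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, TagsMapper.class);
        // Mapper B: the directory of per-question content files
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, ContentMapper.class);

        // Both mappers emit the same key/value types, so one reducer joins them.
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(QuestionTagsWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note there is no job.setMapperClass call: MultipleInputs registers a mapper per input path instead.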
Two problems came up along the way.

a. In the reducer, I tried to keep references to two values from the iterator:

    QuestionTagsWritable e1 = null, e2 = null;
    for (QuestionTagsWritable e : values) {
        System.out.println("xx = " + e.toString());
        if (e.isTags) {
            e1 = e;
        } else {
            e2 = e;
        }
    }

This fails because Hadoop reuses a single Writable instance across iterations of values: after the loop, e1 and e2 point at the same object, which holds only the last value.
solution: copy the value out instead of keeping the reference:

    e1 = new QuestionTagsWritable(true, tmp.content); // pass the value, not the address
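The pitfall can be reproduced without Hadoop. Here is a minimal sketch that mimics the reducer's value-reusing iterator: one shared StringBuilder is refilled on every next(), so holding references (like e1 = e) sees only the last value, while copying works:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Self-contained demo of the object-reuse pitfall (no Hadoop needed).
public class ReuseDemo {

    // Like Hadoop's values iterable: hands out ONE shared object, refilled each time.
    static Iterable<StringBuilder> reusingValues(List<String> data) {
        StringBuilder buffer = new StringBuilder(); // the single reused object
        return () -> new Iterator<StringBuilder>() {
            private final Iterator<String> inner = data.iterator();
            public boolean hasNext() { return inner.hasNext(); }
            public StringBuilder next() {
                buffer.setLength(0);
                buffer.append(inner.next()); // overwrite in place
                return buffer;               // same reference every time
            }
        };
    }

    // Broken pattern: keep references into the iterator (like e1 = e).
    static String[] collectByReference() {
        StringBuilder first = null, second = null;
        for (StringBuilder v : reusingValues(Arrays.asList("tags", "content"))) {
            if (first == null) first = v; else second = v;
        }
        return new String[] { first.toString(), second.toString() };
    }

    // Fixed pattern: copy the value before the next iteration overwrites it.
    static String[] collectByCopy() {
        String first = null, second = null;
        for (StringBuilder v : reusingValues(Arrays.asList("tags", "content"))) {
            if (first == null) first = v.toString(); else second = v.toString();
        }
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", collectByReference())); // content / content
        System.out.println(String.join(" / ", collectByCopy()));      // tags / content
    }
}
```

The broken version prints the last value twice, which is exactly the symptom seen in the reducer.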
b. With MultipleInputs, mapper B can no longer get the file name with a plain cast:

    FileSplit fileSplit = (FileSplit) context.getInputSplit();

This throws a ClassCastException, because MultipleInputs wraps each split in its private TaggedInputSplit.
solution: unwrap the real split via reflection (needs java.lang.reflect.Method and java.io.IOException imported):

    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();
    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // begin reflection hackery...
        try {
            Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }
        // end reflection hackery
    }
see: http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
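With the FileSplit recovered, mapper B can use the file name as the questionId. A sketch of such a mapper (the class itself and unwrapSplit are hypothetical; the flags follow the (isTags, content) pattern used above, with isTags = false for content):

```java
import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical B mapper: each input file is named by questionId and holds questionContent.
public class ContentMapper extends Mapper<Object, Text, Text, QuestionTagsWritable> {

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        FileSplit fileSplit = unwrapSplit(context);
        String questionId = fileSplit.getPath().getName(); // file name = questionId
        context.write(new Text(questionId),
                new QuestionTagsWritable(false, value.toString()));
    }

    // Compact form of the reflection workaround above.
    private FileSplit unwrapSplit(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        if (split instanceof FileSplit) {
            return (FileSplit) split;
        }
        try {
            Method m = split.getClass().getDeclaredMethod("getInputSplit");
            m.setAccessible(true);
            return (FileSplit) m.invoke(split);
        } catch (Exception e) {
            throw new IOException(e); // wrap and re-throw
        }
    }
}
```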