In my work, I ran into a situation with two mappers: mapper A reads a file with two fields (questionId, questionTags) and emits key: questionId, value: questionTags, while mapper B reads a directory containing many files, each named by its questionId and holding questionContent, and emits key: questionId (taken from the file name), value: questionContent. A reducer then joins the two by questionId and does some string operations.
The job layout is:

A mapper \
           > reducer
B mapper /
This can't be solved with ChainMapper, which chains mappers one after another over a single input; here two different inputs need two different mappers feeding the same reducer. I found that the two mappers' output formats are the same, so MultipleInputs fits, attaching each mapper to its own input path. (The other way would be a single mapper that reads both the questions dir and the tags file and branches on the input type.)
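A minimal driver sketch for the MultipleInputs route. The class names (TagsMapper, ContentMapper, JoinReducer) and the argument order are hypothetical placeholders, not from the original post; only MultipleInputs.addInputPath and the standard Job setup calls are real Hadoop API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class QuestionJoinDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "question-tags-join");
        job.setJarByClass(QuestionJoinDriver.class);

        // Mapper A: the tags file (questionId, questionTags per line)
        MultipleInputs.addInputPath(job, new Path(args[0]),
                TextInputFormat.class, TagsMapper.class);
        // Mapper B: the directory of per-question content files
        MultipleInputs.addInputPath(job, new Path(args[1]),
                TextInputFormat.class, ContentMapper.class);

        // Both mappers emit the same key/value types, so one reducer joins them.
        job.setReducerClass(JoinReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(QuestionTagsWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note there is no job.setMapperClass call: MultipleInputs registers a mapper per input path instead.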
Two problems came up along the way.

a. In the reducer, I tried to keep references to two values from the iterator:

    QuestionTagsWritable e1 = null, e2 = null;
    for (QuestionTagsWritable e : values) {
        System.out.println("xx = " + e.toString());
        if (e.isTags) {
            e1 = e;
        } else {
            e2 = e;
        }
    }

This fails because Hadoop reuses a single Writable instance across iterations of values: after the loop, e1 and e2 point at the same object, which holds only the last value.
solution: copy the value out instead of keeping the reference:

    e1 = new QuestionTagsWritable(true, tmp.content); // pass the value, not the address
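The pitfall can be reproduced without Hadoop. Here is a minimal sketch that mimics the reducer's value-reusing iterator: one shared StringBuilder is refilled on every next(), so holding references (like e1 = e) sees only the last value, while copying works:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Self-contained demo of the object-reuse pitfall (no Hadoop needed).
public class ReuseDemo {

    // Like Hadoop's values iterable: hands out ONE shared object, refilled each time.
    static Iterable<StringBuilder> reusingValues(List<String> data) {
        StringBuilder buffer = new StringBuilder(); // the single reused object
        return () -> new Iterator<StringBuilder>() {
            private final Iterator<String> inner = data.iterator();
            public boolean hasNext() { return inner.hasNext(); }
            public StringBuilder next() {
                buffer.setLength(0);
                buffer.append(inner.next()); // overwrite in place
                return buffer;               // same reference every time
            }
        };
    }

    // Broken pattern: keep references into the iterator (like e1 = e).
    static String[] collectByReference() {
        StringBuilder first = null, second = null;
        for (StringBuilder v : reusingValues(Arrays.asList("tags", "content"))) {
            if (first == null) first = v; else second = v;
        }
        return new String[] { first.toString(), second.toString() };
    }

    // Fixed pattern: copy the value before the next iteration overwrites it.
    static String[] collectByCopy() {
        String first = null, second = null;
        for (StringBuilder v : reusingValues(Arrays.asList("tags", "content"))) {
            if (first == null) first = v.toString(); else second = v.toString();
        }
        return new String[] { first, second };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" / ", collectByReference())); // content / content
        System.out.println(String.join(" / ", collectByCopy()));      // tags / content
    }
}
```

The broken version prints the last value twice, which is exactly the symptom seen in the reducer.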
b. With MultipleInputs, mapper B can no longer get the file name with a plain cast:

    FileSplit fileSplit = (FileSplit) context.getInputSplit();

This throws a ClassCastException, because MultipleInputs wraps each split in its private TaggedInputSplit.
solution: unwrap the real split via reflection (needs java.lang.reflect.Method and java.io.IOException imported):

    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();
    FileSplit fileSplit = null;
    if (splitClass.equals(FileSplit.class)) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // begin reflection hackery...
        try {
            Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }
        // end reflection hackery
    }
see: http://stackoverflow.com/questions/11130145/hadoop-multipleinputs-fails-with-classcastexception
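With the FileSplit recovered, mapper B can use the file name as the questionId. A sketch of such a mapper (the class itself and unwrapSplit are hypothetical; the flags follow the (isTags, content) pattern used above, with isTags = false for content):

```java
import java.io.IOException;
import java.lang.reflect.Method;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical B mapper: each input file is named by questionId and holds questionContent.
public class ContentMapper extends Mapper<Object, Text, Text, QuestionTagsWritable> {

    @Override
    protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        FileSplit fileSplit = unwrapSplit(context);
        String questionId = fileSplit.getPath().getName(); // file name = questionId
        context.write(new Text(questionId),
                new QuestionTagsWritable(false, value.toString()));
    }

    // Compact form of the reflection workaround above.
    private FileSplit unwrapSplit(Context context) throws IOException, InterruptedException {
        InputSplit split = context.getInputSplit();
        if (split instanceof FileSplit) {
            return (FileSplit) split;
        }
        try {
            Method m = split.getClass().getDeclaredMethod("getInputSplit");
            m.setAccessible(true);
            return (FileSplit) m.invoke(split);
        } catch (Exception e) {
            throw new IOException(e); // wrap and re-throw
        }
    }
}
```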