
Mahout: distributed item-based algorithm 3

 

Running recommendations with Hadoop

The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously. These MapReduce jobs and the relationships between them are illustrated in the figure below:

[Figure: the chain of MapReduce jobs invoked by RecommenderJob and the data flowing between them]

Run the example with Wikipedia data

bin/hadoop fs -put links-simple-sorted.txt input/input.txt

bin/hadoop fs -put users.txt input/users.txt
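
links-simple-sorted.txt is the Wikipedia link-structure dump used in Mahout in Action Chapter 6: each line holds a source article ID, a colon, and the IDs of the articles it links to, while users.txt lists one article (user) ID per line for the --usersFile option. The lines below are illustrative, not actual dataset content:

5: 12 430 9568
6: 12 52 430

Note that this is not the userID,itemID[,preference] CSV that RecommenderJob parses by default, which is exactly why the code changes described below are needed.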
In order to run RecommenderJob and allow Hadoop to run these jobs, you need to combine all of this code into one JAR file, along with all of the code it depends upon. This can be accomplished easily by running mvn clean package from the core/ directory of the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, you can use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
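
Here -Dmapred.input.dir and -Dmapred.output.dir name the HDFS input and output paths, --usersFile restricts the computation to the user IDs listed in users.txt, and --booleanData tells the job that the input expresses associations only, with no preference values. When the job chain finishes, the recommendations can be read from the text part files under output/ (for example with bin/hadoop fs -cat output/part-r-00000); each line holds a user ID followed by its list of recommended itemID:score pairs.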

 

NOTE

The above command will not run the example code from Chapter 6 of Mahout in Action; it uses the default implementations that ship with Mahout instead, which I consider a big drawback of the book. To run the Chapter 6 example code, you have to recreate a project based on the org.apache.mahout.cf.taste.hadoop classes in the mahout-core project and alter some of the code.

 

OR

Use the tool class WikipediaDataConverter from the example code to convert links-simple-sorted.txt into the default userID,itemID input format, then run the commands above unchanged. This way, however, hides everything you learned from Chapter 6 behind the conversion step, so the best way is still to recreate a new project and run the example code.
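
WikipediaDataConverter itself ships with the book's example code, so it is not reproduced here; the class below is only a minimal local sketch of the same conversion, assuming the link format shown earlier, and the class name is mine, not the tool's:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Reads "sourceID: targetID targetID ..." lines and prints "userID,itemID"
// pairs, the default input format of RecommenderJob.
public class SimpleWikipediaConverter {
  public static void main(String[] args) throws IOException {
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        int colon = line.indexOf(':');
        if (colon < 0) {
          continue; // skip malformed lines
        }
        String userID = line.substring(0, colon).trim();
        for (String itemID : line.substring(colon + 1).trim().split("\\s+")) {
          if (!itemID.isEmpty()) {
            System.out.println(userID + ',' + itemID);
          }
        }
      }
    }
  }
}

Redirect its output to a file and put that file into HDFS as input/input.txt in place of links-simple-sorted.txt.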

 

------------------

How to alter the code is shown below:

org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob

 

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, ItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);

 =====>

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, WikipediaItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
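
WikipediaItemIDIndexMapper is not part of mahout-core; it is one of the classes you write yourself in the new project. A minimal sketch, modeled on Mahout's own ItemIDIndexMapper but parsing the Wikipedia link format instead of CSV (the body below is my assumption, not a listing from the book):

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
import org.apache.mahout.math.VarIntWritable;
import org.apache.mahout.math.VarLongWritable;

public final class WikipediaItemIDIndexMapper
    extends Mapper<LongWritable,Text,VarIntWritable,VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // no IDs on this line
    }
    // the first number is the source (user) ID; every following number
    // is a linked-to article, i.e. an item ID
    while (m.find()) {
      long itemID = Long.parseLong(m.group());
      // hash the long ID to an int index the same way ItemIDIndexMapper does,
      // so the rest of the pipeline can map indices back to item IDs
      context.write(new VarIntWritable(TasteHadoopUtils.idToIndex(itemID)),
                    new VarLongWritable(itemID));
    }
  }
}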

 --

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   ToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   ToUserVectorsReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
 =====>

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   WikipediaToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   WikipediaToUserVectorReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
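
WikipediaToItemPrefsMapper and WikipediaToUserVectorReducer are the classes Chapter 6 actually develops; roughly, they look like the sketch below, but check the book's listings for the authoritative versions. One deliberate deviation: I index items with TasteHadoopUtils.idToIndex so the vector positions agree with the itemIDIndex job above, where the book's standalone listing simply casts the item ID to int:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// emits one (userID, itemID) pair per link on the line (boolean preferences)
public final class WikipediaToItemPrefsMapper
    extends Mapper<LongWritable,Text,VarLongWritable,VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // no IDs on this line
    }
    VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
    VarLongWritable itemID = new VarLongWritable();
    while (m.find()) {
      itemID.set(Long.parseLong(m.group()));
      context.write(userID, itemID);
    }
  }
}

// collects all of a user's item IDs into one sparse vector of 1.0 entries
public final class WikipediaToUserVectorReducer
    extends Reducer<VarLongWritable,VarLongWritable,VarLongWritable,VectorWritable> {

  @Override
  protected void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs,
      Context context) throws IOException, InterruptedException {
    Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
    for (VarLongWritable itemPref : itemPrefs) {
      userVector.set(TasteHadoopUtils.idToIndex(itemPref.get()), 1.0f);
    }
    context.write(userID, new VectorWritable(userVector));
  }
}

(In the recreated project, each class goes in its own file.)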


Run the samples on Hadoop

Environment: Mahout 0.9, Hadoop 2.3.0

Build the job JAR against Hadoop 2; depending on which Maven build profile applies, the Hadoop version is passed as hadoop2.version or as hadoop.version:

mvn clean package -Dhadoop2.version=2.3.0 -DskipTests

mvn clean package -Dhadoop.version=2.3.0 -DskipTests


References

https://www.ibm.com/developerworks/library/j-mahout/
