
Mahout: distributed item-based algorithm 3

 

Running recommendations with Hadoop

The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously. These MapReduce jobs and the relationships between them are illustrated in the figure below:

[Figure: the chain of MapReduce jobs invoked by RecommenderJob and the data flowing between them]

Run the example with Wikipedia data

bin/hadoop fs -put links-simple-sorted.txt input/input.txt

bin/hadoop fs -put users.txt input/users.txt
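
links-simple-sorted.txt is the Wikipedia link-structure dump used in Mahout in Action Chapter 6: each line holds a source article ID, a colon, and the IDs of the articles it links to, while users.txt lists one article (user) ID per line for the --usersFile option. The lines below are illustrative, not actual dataset content:

5: 12 430 9568
6: 12 52 430

Note that this is not the userID,itemID[,preference] CSV that RecommenderJob parses by default, which is exactly why the code changes described below are needed.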
In order to run RecommenderJob and allow Hadoop to run these jobs, you need to combine all of this code into one JAR file, along with all of the code it depends upon. This can be accomplished easily by running mvn clean package from the core/ directory of the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, you can use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
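
Here -Dmapred.input.dir and -Dmapred.output.dir name the HDFS input and output paths, --usersFile restricts the computation to the user IDs listed in users.txt, and --booleanData tells the job that the input expresses associations only, with no preference values. When the job chain finishes, the recommendations can be read from the text part files under output/ (for example with bin/hadoop fs -cat output/part-r-00000); each line holds a user ID followed by its list of recommended itemID:score pairs.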

 

NOTE

The above command will not run the example code from Chapter 6 of Mahout in Action; it uses the default implementations that ship with Mahout instead, which I consider a big drawback of the book. To run the Chapter 6 example code, you have to recreate a project based on the org.apache.mahout.cf.taste.hadoop classes in the mahout-core project and alter some of the code.

 

OR

Use the tool class WikipediaDataConverter from the example code to convert links-simple-sorted.txt into the default userID,itemID input format, then run the commands above unchanged. This way, however, hides everything you learned from Chapter 6 behind the conversion step, so the best way is still to recreate a new project and run the example code.
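
WikipediaDataConverter itself ships with the book's example code, so it is not reproduced here; the class below is only a minimal local sketch of the same conversion, assuming the link format shown earlier, and the class name is mine, not the tool's:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Reads "sourceID: targetID targetID ..." lines and prints "userID,itemID"
// pairs, the default input format of RecommenderJob.
public class SimpleWikipediaConverter {
  public static void main(String[] args) throws IOException {
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        int colon = line.indexOf(':');
        if (colon < 0) {
          continue; // skip malformed lines
        }
        String userID = line.substring(0, colon).trim();
        for (String itemID : line.substring(colon + 1).trim().split("\\s+")) {
          if (!itemID.isEmpty()) {
            System.out.println(userID + ',' + itemID);
          }
        }
      }
    }
  }
}

Redirect its output to a file and put that file into HDFS as input/input.txt in place of links-simple-sorted.txt.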

 

------------------

How to alter the code is shown below:

org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob

 

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, ItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);

 =====>

//convert items to an internal index
    Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX), 
                         TextInputFormat.class, WikipediaItemIDIndexMapper.class,
                         VarIntWritable.class, VarLongWritable.class, 
                         ItemIDIndexReducer.class, VarIntWritable.class, 
                         VarLongWritable.class, SequenceFileOutputFormat.class);
    itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
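
WikipediaItemIDIndexMapper is not part of mahout-core; it is one of the classes you write yourself in the new project. A minimal sketch, modeled on Mahout's own ItemIDIndexMapper but parsing the Wikipedia link format instead of CSV (the body below is my assumption, not a listing from the book):

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
import org.apache.mahout.math.VarIntWritable;
import org.apache.mahout.math.VarLongWritable;

public final class WikipediaItemIDIndexMapper
    extends Mapper<LongWritable,Text,VarIntWritable,VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // no IDs on this line
    }
    // the first number is the source (user) ID; every following number
    // is a linked-to article, i.e. an item ID
    while (m.find()) {
      long itemID = Long.parseLong(m.group());
      // hash the long ID to an int index the same way ItemIDIndexMapper does,
      // so the rest of the pipeline can map indices back to item IDs
      context.write(new VarIntWritable(TasteHadoopUtils.idToIndex(itemID)),
                    new VarLongWritable(itemID));
    }
  }
}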

 --

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   ToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   ToUserVectorsReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
 =====>

 

 //convert user preferences into a vector per user
    Job toUserVectors = prepareJob(getInputPath(),
                                   getOutputPath(USER_VECTORS),
                                   TextInputFormat.class,
                                   WikipediaToItemPrefsMapper.class,
                                   VarLongWritable.class,
                                   booleanData ? VarLongWritable.class : EntityPrefWritable.class,
                                   WikipediaToUserVectorReducer.class,
                                   VarLongWritable.class,
                                   VectorWritable.class,
                                   SequenceFileOutputFormat.class);
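
WikipediaToItemPrefsMapper and WikipediaToUserVectorReducer are the classes Chapter 6 actually develops; roughly, they look like the sketch below, but check the book's listings for the authoritative versions. One deliberate deviation: I index items with TasteHadoopUtils.idToIndex so the vector positions agree with the itemIDIndex job above, where the book's standalone listing simply casts the item ID to int:

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.mahout.cf.taste.hadoop.TasteHadoopUtils;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.VarLongWritable;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

// emits one (userID, itemID) pair per link on the line (boolean preferences)
public final class WikipediaToItemPrefsMapper
    extends Mapper<LongWritable,Text,VarLongWritable,VarLongWritable> {

  private static final Pattern NUMBERS = Pattern.compile("(\\d+)");

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    Matcher m = NUMBERS.matcher(value.toString());
    if (!m.find()) {
      return; // no IDs on this line
    }
    VarLongWritable userID = new VarLongWritable(Long.parseLong(m.group()));
    VarLongWritable itemID = new VarLongWritable();
    while (m.find()) {
      itemID.set(Long.parseLong(m.group()));
      context.write(userID, itemID);
    }
  }
}

// collects all of a user's item IDs into one sparse vector of 1.0 entries
public final class WikipediaToUserVectorReducer
    extends Reducer<VarLongWritable,VarLongWritable,VarLongWritable,VectorWritable> {

  @Override
  protected void reduce(VarLongWritable userID, Iterable<VarLongWritable> itemPrefs,
      Context context) throws IOException, InterruptedException {
    Vector userVector = new RandomAccessSparseVector(Integer.MAX_VALUE, 100);
    for (VarLongWritable itemPref : itemPrefs) {
      userVector.set(TasteHadoopUtils.idToIndex(itemPref.get()), 1.0f);
    }
    context.write(userID, new VectorWritable(userVector));
  }
}

(In the recreated project, each class goes in its own file.)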


Run the samples on Hadoop

Environment: Mahout 0.9, Hadoop 2.3.0

Build the job JAR against Hadoop 2; depending on which Maven build profile applies, the Hadoop version is passed as hadoop2.version or as hadoop.version:

mvn clean package -Dhadoop2.version=2.3.0 -DskipTests

mvn clean package -Dhadoop.version=2.3.0 -DskipTests


References

https://www.ibm.com/developerworks/library/j-mahout/
