Running recommendations with Hadoop
The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the series of MapReduce jobs discussed previously. These MapReduce jobs and their relationships are illustrated below:
Run the example with Wikipedia data
bin/hadoop fs -put links-simple-sorted.txt input/input.txt
bin/hadoop fs -put users.txt input/users.txt
To run RecommenderJob and allow Hadoop to run these jobs, you need to combine all of this code into one JAR file, along with all of the code it depends on. You can do this by running mvn clean package from the core/ directory in the Mahout distribution, which produces a file like mahout-core-0.5-job.jar. Alternatively, you can use a precompiled job JAR from Mahout's distribution.
hadoop jar mahout-core-0.5-job.jar \
org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
-Dmapred.input.dir=input/input.txt \
-Dmapred.output.dir=output --usersFile input/users.txt --booleanData
NOTE
The above command does not run the example code from Chapter 6 of Mahout in Action; it runs the default implementation that ships with Mahout. I consider this a significant drawback of the book. To run the Chapter 6 example code, create a new project based on the org.apache.mahout.cf.taste.hadoop.xxxx packages in the mahout-core project and modify some of the code.
OR
Use the WikipediaDataConverter tool class from the example code to convert links-simple-sorted.txt into the default input format (userID,itemID), then run the commands above. This approach, however, hides everything you learned in Chapter 6 behind the default implementation, so the better option is to create a new project that runs the example code.
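To make the conversion idea concrete, here is a minimal sketch in plain Java. It is not the book's WikipediaDataConverter; the class name LinkLineConverter and the method toPairs are hypothetical. It also assumes that each line of links-simple-sorted.txt has the form "sourceID: targetID targetID ...", which is not stated above, so check it against your copy of the data first.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the book's WikipediaDataConverter. */
public class LinkLineConverter {

    /**
     * Expands one line of the assumed form "source: target1 target2 ..."
     * into "source,target" pairs, one pair per target.
     */
    public static List<String> toPairs(String line) {
        List<String> pairs = new ArrayList<>();
        int colon = line.indexOf(':');
        if (colon < 0) {
            return pairs; // no separator: nothing to emit
        }
        String source = line.substring(0, colon).trim();
        String rest = line.substring(colon + 1).trim();
        if (source.isEmpty() || rest.isEmpty()) {
            return pairs; // missing source or no targets
        }
        for (String target : rest.split("\\s+")) {
            pairs.add(source + "," + target);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Each printed line is one userID,itemID pair.
        for (String pair : toPairs("1: 2 3 4")) {
            System.out.println(pair);
        }
    }
}
```

Here the source page plays the role of the user and each linked page the role of an item, which is why no preference value is written and the job is run with --booleanData.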
------------------
The required code changes are shown below:
org.apache.mahout.cf.taste.hadoop.preparation.PreparePreferenceMatrixJob
//convert items to an internal index
Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX),
TextInputFormat.class, ItemIDIndexMapper.class,
VarIntWritable.class, VarLongWritable.class,
ItemIDIndexReducer.class, VarIntWritable.class,
VarLongWritable.class, SequenceFileOutputFormat.class);
itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
=====>
//convert items to an internal index
Job itemIDIndex = prepareJob(getInputPath(), getOutputPath(ITEMID_INDEX),
TextInputFormat.class, WikipediaItemIDIndexMapper.class,
VarIntWritable.class, VarLongWritable.class,
ItemIDIndexReducer.class, VarIntWritable.class,
VarLongWritable.class, SequenceFileOutputFormat.class);
itemIDIndex.setCombinerClass(ItemIDIndexReducer.class);
--
//convert user preferences into a vector per user
Job toUserVectors = prepareJob(getInputPath(),
getOutputPath(USER_VECTORS),
TextInputFormat.class,
ToItemPrefsMapper.class,
VarLongWritable.class,
booleanData ? VarLongWritable.class : EntityPrefWritable.class,
ToUserVectorsReducer.class,
VarLongWritable.class,
VectorWritable.class,
SequenceFileOutputFormat.class);
=====>
//convert user preferences into a vector per user
Job toUserVectors = prepareJob(getInputPath(),
getOutputPath(USER_VECTORS),
TextInputFormat.class,
WikipediaToItemPrefsMapper.class,
VarLongWritable.class,
booleanData ? VarLongWritable.class : EntityPrefWritable.class,
WikipediaToUserVectorReducer.class,
VarLongWritable.class,
VectorWritable.class,
SequenceFileOutputFormat.class);
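The "convert items to an internal index" step above maps 64-bit item IDs to int indexes, which is what the VarIntWritable and VarLongWritable types in the prepareJob call suggest. The sketch below shows one common way to do such a mapping, by folding the long into 32 bits and clearing the sign bit. The class IdIndexSketch is hypothetical, and Mahout's own implementation may differ in detail.

```java
/** Hypothetical illustration of mapping a long ID to a non-negative int index. */
public class IdIndexSketch {

    /** Folds a 64-bit ID into a non-negative 32-bit index; collisions are possible. */
    public static int idToIndex(long id) {
        int folded = (int) (id ^ (id >>> 32)); // same folding as Long.hashCode
        return folded & 0x7FFFFFFF;            // clear the sign bit
    }

    public static void main(String[] args) {
        System.out.println(idToIndex(42L));
        // Two different IDs can share an index, so an index -> ID table is needed
        // to translate results back to real item IDs.
        System.out.println(idToIndex(1L) == idToIndex(1L << 32));
    }
}
```

Because different IDs can collide on the same index, a job of this kind has to record which item ID each index stands for, so that recommendations can be reported with the original IDs.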
Run samples on Hadoop
Env: Mahout 0.9, Hadoop 2.3.0
mvn clean package -Dhadoop2.version=2.3.0 -DskipTests
mvn clean package -Dhadoop.version=2.3.0 -DskipTests
References
https://www.ibm.com/developerworks/library/j-mahout/