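Note: this post draws on Mahout in Action (《Mahout实战》).
Following the book, this post uses the MovieLens 1M data set published by GroupLens to produce recommendations.
The examples that ship with Mahout already include code that reads the .dat files. It extends FileDataModel, so it can be used almost as-is. For machine performance reasons, however, I discard part of the data to reduce the amount of computation.
The change mainly adds a removeRatio parameter: while the file is being read, each record is dropped at random with that probability.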
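The random drop can be sketched on its own, away from any file handling. This is only an illustration: the class name RandomDropSketch, the helper dropRandomly, and the fixed seed are assumptions made for the example and are not part of Mahout or of the modified class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class RandomDropSketch {

  /**
   * Keeps each line when rand.nextDouble() > removeRatio, so a line is
   * dropped with probability of roughly removeRatio.
   */
  static List<String> dropRandomly(List<String> lines, double removeRatio, Random rand) {
    List<String> kept = new ArrayList<>();
    for (String line : lines) {
      if (rand.nextDouble() > removeRatio) {
        kept.add(line);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
      lines.add("record-" + i);
    }
    // Fixed seed is an assumption for reproducibility; the post uses an unseeded Random.
    List<String> kept = dropRandomly(lines, 0.5, new Random(42L));
    System.out.println("kept " + kept.size() + " of " + lines.size());
  }
}
```

With a ratio of 0.5, about half of the records survive; because the choice is random, the exact count differs from run to run unless the seed is fixed.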
Below is my slightly modified GroupLensDataModel.java:
/**
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.URL;
import java.util.Random;
import java.util.regex.Pattern;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.common.iterator.FileLineIterable;

import com.google.common.base.Charsets;
import com.google.common.io.Closeables;
import com.google.common.io.Files;
import com.google.common.io.InputSupplier;
import com.google.common.io.Resources;

public final class GroupLensDataModel extends FileDataModel {

  private static final String COLON_DELIMTER = "::";
  private static final Pattern COLON_DELIMITER_PATTERN = Pattern.compile(COLON_DELIMTER);

  /**
   * @param ratingsFile GroupLens ratings.dat file in its native format
   * @param removeRatio fraction of records to drop at random, to shrink the converted file
   * @throws IOException if an error occurs while reading or writing files
   */
  public GroupLensDataModel(File ratingsFile, double removeRatio) throws IOException {
    super(convertGLFile(ratingsFile, removeRatio));
  }

  /**
   * @param originalFile ratings file in "::"-delimited format
   * @param ratio fraction of records to drop at random
   * @return the converted, comma-delimited temporary file
   * @throws IOException if an error occurs while reading or writing files
   */
  private static File convertGLFile(File originalFile, double ratio) throws IOException {
    // Translate the file: drop the last field (the timestamp in ratings.dat),
    // then convert the "::" delimiters to commas
    File resultFile = new File(new File(System.getProperty("java.io.tmpdir")), "ratings.txt");
    if (resultFile.exists()) {
      resultFile.delete();
    }
    Writer writer = null;
    try {
      writer = new OutputStreamWriter(new FileOutputStream(resultFile), Charsets.UTF_8);
      Random rand = new Random();
      for (String line : new FileLineIterable(originalFile, false)) {
        // Keep the line only when the random draw exceeds the ratio,
        // so roughly a fraction "ratio" of the records is dropped
        if (rand.nextDouble() > ratio) {
          int lastDelimiterStart = line.lastIndexOf(COLON_DELIMTER);
          if (lastDelimiterStart < 0) {
            throw new IOException("Unexpected input format on line: " + line);
          }
          String subLine = line.substring(0, lastDelimiterStart);
          String convertedLine = COLON_DELIMITER_PATTERN.matcher(subLine).replaceAll(",");
          writer.write(convertedLine);
          writer.write('\n');
        }
      }
    } catch (IOException ioe) {
      resultFile.delete();
      throw ioe;
    } finally {
      Closeables.close(writer, false);
    }
    return resultFile;
  }

  public static File readResourceToTempFile(String resourceName) throws IOException {
    InputSupplier<? extends InputStream> inSupplier;
    try {
      URL resourceURL = Resources.getResource(GroupLensDataModel.class, resourceName);
      inSupplier = Resources.newInputStreamSupplier(resourceURL);
    } catch (IllegalArgumentException iae) {
      File resourceFile = new File("src/main/java" + resourceName);
      inSupplier = Files.newInputStreamSupplier(resourceFile);
    }
    File tempFile = File.createTempFile("taste", null);
    tempFile.deleteOnExit();
    Files.copy(inSupplier, tempFile);
    return tempFile;
  }

  @Override
  public String toString() {
    return "GroupLensDataModel";
  }

}
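To see what the conversion does to a single record, here is a minimal standalone sketch of the same string handling. The class name RatingLineSketch and the method convert are hypothetical names for this example, and the sample line assumes the usual UserID::MovieID::Rating::Timestamp layout of ratings.dat.

```java
final class RatingLineSketch {

  private static final String DELIMITER = "::";

  /** Drops the last "::"-separated field and joins the remaining fields with commas. */
  static String convert(String line) {
    int lastDelimiterStart = line.lastIndexOf(DELIMITER);
    if (lastDelimiterStart < 0) {
      throw new IllegalArgumentException("Unexpected input format on line: " + line);
    }
    // Everything before the last "::" is user, item, and rating
    String withoutLastField = line.substring(0, lastDelimiterStart);
    return withoutLastField.replace(DELIMITER, ",");
  }

  public static void main(String[] args) {
    // Sample record, assumed to follow UserID::MovieID::Rating::Timestamp
    System.out.println(convert("1::1193::5::978300760")); // prints "1,1193,5"
  }
}
```

The result is the comma-separated userID,itemID,preference form that FileDataModel reads.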
Here is the main program:
import java.io.File;
import java.io.IOException;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public class TestGroupLens {

  public static void main(String[] args) {
    try {
      // Load the data set, dropping roughly half of the ratings at random
      DataModel model = new GroupLensDataModel(new File("E:\\DataSet\\ml-1m\\ratings.dat"), 0.5);
      RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
      RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel dataModel) throws TasteException {
          UserSimilarity sim = new PearsonCorrelationSimilarity(dataModel);
          // Neighborhood of the 30 most similar users
          UserNeighborhood nbh = new NearestNUserNeighborhood(30, sim, dataModel);
          // Build the recommender engine
          Recommender rec = new GenericUserBasedRecommender(dataModel, nbh, sim);
          return rec;
        }
      };
      // Use 70% of each user's ratings for training, and evaluate on 30% of the users
      double score = evaluator.evaluate(builder, null, model, 0.7, 0.3);
      System.out.println(score);
    } catch (IOException e) {
      e.printStackTrace();
    } catch (TasteException e) {
      e.printStackTrace();
    }
  }

}
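The resulting score is around 0.85 (for this evaluator, lower is better).
That differs slightly from the 0.89 reported in the book. Some variation between runs is to be expected, since the records are dropped at random and the evaluator also splits the data randomly.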
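For context on what the score means: AverageAbsoluteDifferenceRecommenderEvaluator reports the average of the absolute differences between estimated and actual ratings. The sketch below reproduces only that arithmetic on made-up numbers; the class name, method name, and sample ratings are assumptions for illustration, not Mahout code and not data from this experiment.

```java
final class MeanAbsoluteDifferenceSketch {

  /** Average of |estimated - actual| over all rating pairs. */
  static double meanAbsoluteDifference(double[] actual, double[] estimated) {
    if (actual.length == 0 || actual.length != estimated.length) {
      throw new IllegalArgumentException("arrays must be non-empty and of equal length");
    }
    double total = 0.0;
    for (int i = 0; i < actual.length; i++) {
      total += Math.abs(estimated[i] - actual[i]);
    }
    return total / actual.length;
  }

  public static void main(String[] args) {
    // Hypothetical held-out ratings and the recommender's estimates for them
    double[] actual = {4.0, 3.0, 5.0, 2.0};
    double[] estimated = {3.5, 3.0, 4.0, 3.0};
    // Differences are 0.5, 0, 1, 1, so the mean is 2.5 / 4
    System.out.println(meanAbsoluteDifference(actual, estimated)); // prints 0.625
  }
}
```

Read this way, a score of 0.85 means the estimates were off by about 0.85 stars on average on the 1 to 5 rating scale.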