As the name says, the fuzzy k-means clustering algorithm does a fuzzy form of k-means clustering. Instead of the exclusive clustering in k-means, fuzzy k-means tries to generate overlapping clusters from the data set. In the academic community, it’s also known as the fuzzy c-means algorithm. You can think of it as an extension of k-means.
K-means tries to find hard clusters (where each point belongs to exactly one cluster), whereas fuzzy k-means discovers soft clusters. In a soft cluster, any point can belong to more than one cluster, with a certain affinity value towards each. This affinity is inversely related to the distance from the point to the centroid of the cluster: the closer the centroid, the higher the affinity. Like k-means, fuzzy k-means works on objects that can be represented in an n-dimensional vector space and for which a distance measure is defined.
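For example, the following command runs fuzzy k-means on the Reuters TF-IDF vectors with 21 clusters (-k), a fuzziness factor of 2 (-m), and at most 10 iterations (-x):
mahout fkmeans -i mahout/reuters-vectors/tfidf-vectors/ -c mahout/reuters-fkmeans-centroids -o mahout/reuters-fkmeans-clusters -cd 1.0 -k 21 -m 2 -ow -x 10 -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure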
Fuzzy k-means has a parameter, m, called the fuzziness factor. Like k-means, fuzzy k-means loops over the data set, but instead of assigning each vector to its nearest centroid, it calculates the degree of association of the vector to each of the clusters.
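Suppose that, for a vector V, d1, d2, ..., dk are the distances to each of the k cluster centroids. The degree of association (u1) of vector V to the first cluster (C1) is calculated as

$$u_1 = \frac{1}{\sum_{j=1}^{k} \left( \dfrac{d_1}{d_j} \right)^{\frac{2}{m-1}}}$$

The degree of association to any other cluster is obtained by replacing d1 in the numerators with d2, d3, and so on. The exponent 2/(m-1) is only defined for m greater than 1.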
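To make the degree-of-association calculation concrete, here is a minimal Python sketch. It assumes the standard fuzzy c-means membership formula, in which each distance ratio is raised to the power 2/(m-1). The function name `memberships` and the sample distances are hypothetical and chosen only for illustration; this is not Mahout's own code.

```python
def memberships(distances, m=2.0):
    """Degree of association of one vector to each of k clusters.

    distances: distances d1..dk from the vector to each centroid.
    m: fuzziness factor; assumed to be greater than 1.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")

    # A vector lying exactly on a centroid belongs fully to that cluster
    # (shared equally if several centroids coincide with it).
    on_centroid = [d == 0 for d in distances]
    if any(on_centroid):
        n = sum(on_centroid)
        return [1.0 / n if hit else 0.0 for hit in on_centroid]

    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j) ** (2 / (m - 1))
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical distances from one vector to three centroids.
u = memberships([1.0, 2.0, 4.0], m=2.0)
print([round(x, 4) for x in u])  # [0.7619, 0.1905, 0.0476]
```

The nearest centroid gets the largest degree of association, and the values for one vector sum to 1.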
If m increases, the fuzziness of the algorithm increases, and you'll begin to see more and more overlap. The fuzzy k-means algorithm also converges better and faster than the standard k-means algorithm.
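The effect of m on overlap can be seen with a small experiment. The sketch below again assumes the standard fuzzy c-means membership formula; the helper `memberships` and the two sample distances are hypothetical, used only to show the trend.

```python
def memberships(distances, m):
    """Standard fuzzy c-means memberships for one vector (assumes m > 1
    and that no distance is zero)."""
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# One vector that is twice as far from the second centroid as from the first.
distances = [1.0, 2.0]

results = {}
for m in (1.1, 2.0, 5.0):
    results[m] = memberships(distances, m)
    print(m, [round(x, 3) for x in results[m]])

# 1.1 [1.0, 0.0]      -> close to a hard k-means assignment
# 2.0 [0.8, 0.2]
# 5.0 [0.586, 0.414]  -> memberships drift towards an even split
```

As m approaches 1, almost all of the association goes to the nearest centroid, so the result resembles hard k-means. As m grows, the memberships even out and the clusters overlap more.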