Mahout: Fuzzy k-means clustering - 术业有专攻 - ITeye Blog


ylzhj02

Mahout: Fuzzy k-means clustering

Blog category: Mahout

As the name says, the fuzzy k-means clustering algorithm does a fuzzy form of k-means clustering. Instead of the exclusive clustering in k-means, fuzzy k-means tries to generate overlapping clusters from the data set. In the academic community, it’s also known as the fuzzy c-means algorithm. You can think of it as an extension of k-means.

K-means tries to find hard clusters (where each point belongs to exactly one cluster), whereas fuzzy k-means discovers soft clusters. In a soft cluster, any point can belong to more than one cluster, with a certain affinity value towards each. This affinity depends on the distance from the point to the centroid of the cluster: the closer the point is to a centroid, the higher its affinity to that cluster. Like k-means, fuzzy k-means works on objects that can be represented in an n-dimensional vector space and for which a distance measure is defined.

mahout fkmeans \
  -i mahout/reuters-vectors/tfidf-vectors/ \
  -c mahout/reuters-fkmeans-centroids \
  -o mahout/reuters-fkmeans-clusters \
  -cd 1.0 -k 21 -m 2 -ow -x 10 \
  -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure

Fuzzy k-means has a parameter, m, called the fuzziness factor, which must be greater than 1. Like k-means, fuzzy k-means loops over the data set, but instead of assigning each vector to its nearest centroid, it calculates the degree of association of the vector to each of the clusters.

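Suppose for a vector, V, that d1, d2, ... dk are the distances to each of the k cluster centroids. The degree of association (u1) of vector (V) to the first cluster (C1) is calculated as

$$u_1 = \frac{1}{\left(\frac{d_1}{d_1}\right)^{\frac{2}{m-1}} + \left(\frac{d_1}{d_2}\right)^{\frac{2}{m-1}} + \cdots + \left(\frac{d_1}{d_k}\right)^{\frac{2}{m-1}}}$$

The degrees of association to the other clusters are calculated the same way, replacing d1 in the numerators with the distance to the cluster in question.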
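
As a rough illustration, the degree of association can be computed with the standard fuzzy c-means membership rule, u_i = 1 / sum over j of (d_i / d_j)^(2 / (m - 1)). The sketch below is plain Python, not Mahout code; the function name `memberships` and the sample distances are made up for this example, and the handling of a zero distance is one possible convention.

```python
def memberships(distances, m=2.0):
    """Degree of association of one point to each of the k clusters.

    distances: d_1..d_k, the distances from the point to the k centroids
    m:         fuzziness factor, must be greater than 1
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")

    # Assumed convention: a point lying exactly on one or more centroids
    # is shared equally among those clusters and belongs to no other.
    on_centroid = [d == 0 for d in distances]
    if any(on_centroid):
        share = 1.0 / sum(on_centroid)
        return [share if hit else 0.0 for hit in on_centroid]

    exponent = 2.0 / (m - 1.0)
    # u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1))
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical distances from one vector to three centroids.
u = memberships([1.0, 2.0, 4.0], m=2.0)
print([round(x, 3) for x in u])
```

The memberships of a point always sum to 1, and the nearest centroid receives the largest share.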

If m increases, the fuzziness of the algorithm increases, and you’ll begin to see more and more overlap. The fuzzy k-means algorithm also converges better and faster than the standard k-means algorithm.
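
To see the effect of m, the small sketch below evaluates the standard fuzzy c-means membership rule, u_i = 1 / sum over j of (d_i / d_j)^(2 / (m - 1)), for the same point at several values of m. It is plain Python rather than Mahout code; the function name, the distances, and the chosen values of m are assumptions made for illustration.

```python
def memberships(distances, m):
    # Assumes m > 1 and that no distance is zero.
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


# Hypothetical distances from one vector to three centroids.
distances = [1.0, 2.0, 4.0]

# Same point, increasing fuzziness factor.
results = {m: memberships(distances, m) for m in (1.1, 2.0, 5.0)}
for m, u in results.items():
    print(m, [round(x, 3) for x in u])
```

With m close to 1 almost all of the membership goes to the nearest cluster, which resembles the hard assignment of k-means. As m grows, the memberships move closer to each other, so the clusters overlap more.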


2014-06-12 11:18