Weka Getting-Started Example: K-Means Clustering

ganliang13


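/* Last time I covered how to use classifiers; this time I'll introduce clustering. In data mining, clustering is a form of unsupervised learning, as opposed to classification, which is supervised learning. Between the two lies semi-supervised learning, which I'll cover in detail in a later post. Unsupervised learning means that the sample classes are not known in advance, and the clustering algorithm itself decides which group each sample belongs to.
The general clustering workflow is:
1.       Load the samples to be clustered
2.       Initialize the clustering algorithm (and set its parameters)
3.       Run the clustering algorithm on the samples
4.       Print the clustering results */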

package com.gump.weka;

import java.io.File;
import weka.clusterers.SimpleKMeans;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class Test2 {
	public static void main(String[] args) {
		Instances ins = null;
		Instances tempIns = null;
		SimpleKMeans KM = null;
		try {
			// 1. Load the samples
			// File file= new
			// File("E://application//Weka-3-7//data//contact-lenses.arff");
			File file = new File("E://application//Weka-3-7//data//rfm.arff");
			ArffLoader loader = new ArffLoader();
			loader.setFile(file);
			ins = loader.getDataSet();
			// 2. Initialize the clusterer
			KM = new SimpleKMeans();
			KM.setNumClusters(8);// set the number of clusters

			// 3. Run the clustering algorithm on the samples
			KM.buildClusterer(ins);
			
			// 4. Print the clustering results
			tempIns = KM.getClusterCentroids();
			System.out.println("Centroids: " + tempIns);
			System.out.println("-------------------\n");
			for (int i = 0; i < tempIns.size(); i++) {
				Instance temp = tempIns.get(i);
				System.out.println(temp.numAttributes());
				for (int j = 0; j < temp.numAttributes(); j++) {
					System.out.print(temp.value(j) + ",");
				}
				System.out.println("");
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
	}
}
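
To make step 3 less of a black box, here is a minimal plain-Java sketch of the assign/update loop that k-means is built on. It is not Weka's SimpleKMeans implementation: the class name KMeansSketch and its methods are hypothetical, and it assumes the first k points are used as the initial centroids (a simplification chosen so the result is deterministic) and that k is no larger than the number of points.

```java
import java.util.Arrays;

/** Plain-Java illustration of the k-means loop; NOT Weka's implementation. */
final class KMeansSketch {

    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs the assign/update loop until assignments stop changing or
     * maxIterations is reached. Assumption: the first k points seed the centroids.
     */
    static double[][] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int c = nearest(points[i], centroids);
                if (c != assignment[i]) {
                    assignment[i] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // leave an empty cluster's centroid where it is
                }
                for (int d = 0; d < dims; d++) {
                    centroids[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {10, 10}, {0, 1}, {10, 11}};
        // prints [[0.0, 0.5], [10.0, 10.5]]
        System.out.println(Arrays.deepToString(cluster(points, 2, 10)));
    }
}
```

The centroids returned here play the same role as the ones printed from KM.getClusterCentroids() in the program above. For real data, stick with SimpleKMeans.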

%The data in E://application//Weka-3-7//data//rfm.arff looks like this:%

@relation contact-lenses

@attribute days numeric
@attribute price numeric
@attribute times numeric

@data
0,0.0,1034373
0,0.02,1
0,0.2,1
0,0.4,1
...........
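
To show how a file like this is laid out, here is a small plain-Java sketch that reads the header and the data rows. It is only an illustration under narrow assumptions: every attribute is numeric, names are unquoted, and rows are dense with no missing values. The class ArffSketch is hypothetical and is not part of Weka. In the program above, Weka's ArffLoader does this work and supports the full format.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Minimal reader for numeric-only ARFF text; a sketch, not a full ARFF parser. */
final class ArffSketch {
    final List<String> attributeNames = new ArrayList<>();
    final List<double[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch result = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // skip blank lines and ARFF comments
            }
            String lower = line.toLowerCase(Locale.ROOT);
            if (!inData) {
                if (lower.startsWith("@attribute")) {
                    // Header line of the form: @attribute <name> <type>
                    result.attributeNames.add(line.split("\\s+")[1]);
                } else if (lower.startsWith("@data")) {
                    inData = true; // everything after this line is data
                }
                continue; // @relation and other header lines are ignored
            }
            // Data line: one comma-separated value per attribute.
            String[] cells = line.split(",");
            double[] row = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                row[i] = Double.parseDouble(cells[i].trim());
            }
            result.rows.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "@relation contact-lenses\n\n"
                + "@attribute days numeric\n"
                + "@attribute price numeric\n"
                + "@attribute times numeric\n\n"
                + "@data\n"
                + "0,0.0,1034373\n"
                + "0,0.02,1\n";
        ArffSketch parsed = parse(sample);
        System.out.println(parsed.attributeNames + ", rows: " + parsed.rows.size());
    }
}
```

Each row ends up as a double[] with one value per attribute, in the order days, price, times.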
