googya

The k-means algorithm

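    k-means clustering starts with k randomly chosen centroids, where k is also the number of clusters to produce, and assigns each item to the cluster whose centroid is nearest. Once every item is assigned, each centroid moves to the mean position of all points in its cluster, and the items are assigned again. This process repeats until the centroids no longer move appreciably or until the cluster assignments stop changing.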
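    The assignment and the centroid move described above can also be written in symbols. The notation is introduced here only for illustration and does not come from the original post: x_i is a data point, mu_j is the centroid of cluster C_j, and c_i is the index of the cluster that x_i is assigned to.

```latex
% Assignment: each point goes to the cluster with the nearest centroid
c_i = \arg\min_{1 \le j \le k} \lVert x_i - \mu_j \rVert

% Update: each centroid moves to the mean of the points assigned to it
C_j = \{\, x_i : c_i = j \,\}, \qquad
\mu_j \leftarrow \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```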
    Specifically, the algorithm consists of the following four steps:
   
  • (1) Start with k cluster centers (chosen randomly or according to some specific procedure).
  • (2) Assign each row in the data to its nearest cluster center.
  • (3) Re-calculate each cluster center as the "average" of the rows assigned to it in (2).
  • (4) Repeat (2) and (3) until the cluster centers no longer change or some other stopping criterion has been met.
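
The four steps above can be sketched compactly on plain arrays of coordinates. This is only a minimal illustration, not the post's own implementation: the helper names (euclidean, nearest_index, kmeans_step) are hypothetical, the starting centroids are fixed by hand so that the run is repeatable, and keeping the old centroid for an empty cluster is an assumption of this sketch.

```ruby
# Euclidean distance between two coordinate arrays of equal length.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Step 2: index of the centroid closest to point p.
def nearest_index(p, centroids)
  centroids.each_index.min_by { |j| euclidean(p, centroids[j]) }
end

# Steps 2 and 3 performed once; returns [new_centroids, biggest_shift].
def kmeans_step(points, centroids)
  groups = points.group_by { |p| nearest_index(p, centroids) }
  new_centroids = centroids.each_with_index.map do |old, j|
    members = groups[j]
    next old if members.nil? # empty cluster: keep the old centroid (assumption)
    # Coordinate-wise mean of the points assigned to cluster j.
    members.transpose.map { |column| column.sum / members.size.to_f }
  end
  shift = centroids.zip(new_centroids).map { |a, b| euclidean(a, b) }.max
  [new_centroids, shift]
end

points    = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centroids = [[0.0, 0.0], [10.0, 0.0]] # step 1, fixed here instead of random
cutoff    = 0.5

loop do # step 4: repeat until no centroid moves by cutoff or more
  centroids, shift = kmeans_step(points, centroids)
  break if shift < cutoff
end
p centroids
```

With these four points the two centroids settle at the midpoints of the left and right pairs, (0, 1) and (10, 1), after the second pass.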



The full code is as follows:
# A point in n-dimensional space: its coordinates and its dimension.
class Point
    attr_reader :coords, :n
    def initialize(coords)
        @coords = coords
        @n = coords.size
    end
    def to_s
        @coords.to_s
    end
end

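# A cluster: its points, their dimension, and its centroid.
class Cluster
    attr_reader :points, :n, :centroid
    def initialize(points)
        if points.empty?
            raise "ILLEGAL: EMPTY CLUSTER"
        end
        @points = points
        @n = points[0].n
        for p in points
            if p.n != @n
                raise "ILLEGAL: MULTISPACE CLUSTER"
            end
        end
        @centroid = calculateCentroid()
    end

    # Replace the cluster's points, recompute the centroid, and
    # return the distance the centroid moved.
    def update(points)
        old_centroid = @centroid
        @points = points
        # An empty cluster has no mean; keep the previous centroid
        # rather than dividing by zero.
        @centroid = calculateCentroid() unless points.empty?
        getDistance(old_centroid, @centroid)
    end

    # The centroid is the coordinate-wise mean of the cluster's points.
    def calculateCentroid
        centroid_coords = []
        for i in 0..@n-1
            centroid_coords << 0.0
            for p in @points
                centroid_coords[i] = centroid_coords[i] + p.coords[i]
            end
            centroid_coords[i] = centroid_coords[i] / @points.size
        end
        Point.new(centroid_coords)
    end

    def to_s
        @points.map { |p| p.to_s }.join(", ")
    end
end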



# Cluster points into k groups; stop once no centroid moves by cutoff or more.
def kmeans(points, k, cutoff)
    # Pick k distinct input points at random as the initial centroids.
    clusters = points.sample(k).map { |p| Cluster.new([p]) }

    loop do
        # Assign every point to the cluster with the nearest centroid.
        lists = clusters.map { [] }
        points.each do |p|
            index = (0...clusters.size).min_by { |i| getDistance(p, clusters[i].centroid) }
            lists[index] << p
        end
        # Move each centroid and record the largest movement.
        biggest_shift = 0.0
        clusters.each_with_index do |cluster, i|
            shift = cluster.update(lists[i])
            biggest_shift = shift if shift > biggest_shift
        end
        break if biggest_shift < cutoff
    end

    clusters
end

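# Euclidean distance between two points of the same dimension.
def getDistance(a, b)
    if a.n != b.n
        raise "ILLEGAL: NON-COMPARABLE POINTS"
    end
    ret = 0.0
    for i in 0..(a.n-1)
        ret = ret + (a.coords[i] - b.coords[i])**2
    end
    Math.sqrt(ret)
end

# A point with n coordinates drawn uniformly from [lower, upper).
def makeRandomPoint(n, lower, upper)
    coords = []
    n.times { coords << lower + (upper - lower) * rand }
    Point.new(coords)
end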



num_points, n, k, cutoff, lower, upper = 10, 2, 3, 0.5, -200, 200
# Create num_points random Points in n-dimensional space
points = []
num_points.times do
    points << makeRandomPoint(n, lower, upper)
end

# Cluster the points using the K-means algorithm
clusters = kmeans(points, k, cutoff)
# Print the results
puts "\nPOINTS:"
points.each { |p| puts "P: #{p}" }
puts "\nCLUSTERS:"
clusters.each { |c| puts "C: " + c.points.map { |p| p.to_s }.join(", ") }
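
Because the starting centroids are chosen at random, two runs of the code above may return different clusters. One possible way to make a run repeatable, shown here only as a sketch, is to draw the starting points with a seeded generator. The helper name pick_initial and the seed value 42 are hypothetical; Array#sample and Random.new are standard Ruby.

```ruby
# Hypothetical helper: choose k distinct starting points with a seeded generator.
def pick_initial(points, k, seed)
  points.sample(k, random: Random.new(seed))
end

points = (1..10).map { |i| [i.to_f, (i * i).to_f] }
first  = pick_initial(points, 3, 42)
second = pick_initial(points, 3, 42)

puts first == second      # the same seed selects the same starting points
puts first.uniq.size == 3 # sample does not return the same element twice
```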

