The k-means algorithm

googya

Blog category: Algorithm series

Tags: algorithms, Ruby, C, C++, C#

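k-means clustering starts with k randomly chosen centroids, where k is also the number of clusters to produce, and assigns each item to the cluster whose centroid is nearest. Once every item has been assigned, each centroid moves to the mean position of all the points in its cluster, and the items are then assigned again. This process repeats until the centroids no longer move appreciably or until no point changes cluster.
Concretely, the procedure has four steps: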

(1) Start with k cluster centers (chosen randomly or according to some specific procedure).

(2) Assign each row in the data to its nearest cluster center.

(3) Re-calculate each cluster center as the "average" of the rows assigned to it in (2).

(4) Repeat, until the cluster centers no longer change or some other stopping criterion has been met.
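Steps (2) and (3) can be illustrated with a minimal sketch on one-dimensional data. The helper name `kmeans_step` and the sample values are assumptions made for this illustration; they are not part of the article's program, which works on n-dimensional points.

```ruby
# One k-means pass over 1-D data (hypothetical helper for illustration).
def kmeans_step(values, centers)
  # Step (2): group each value under the index of its nearest center.
  groups = values.group_by do |v|
    (0...centers.size).min_by { |i| (v - centers[i]).abs }
  end
  # Step (3): move each center to the mean of its group.
  # A center that received no values stays where it is.
  centers.each_with_index.map do |c, i|
    members = groups[i]
    members ? members.sum.to_f / members.size : c
  end
end

values  = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers = [0.0, 5.0]

centers = kmeans_step(values, centers)
p centers   # prints [1.5, 9.0]
centers = kmeans_step(values, centers)
p centers   # prints [2.0, 11.0]
```

After the second pass the centers sit on the means of the two obvious groups, so a further pass would leave them unchanged, which is the stopping condition in step (4).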

The complete Ruby implementation follows:

# A point: its coordinates and the number of dimensions.
class Point
    attr_reader :coords, :n

    def initialize(coords)
        @coords = coords
        @n = coords.size
    end

    def to_s
        @coords.to_s
    end
end

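# A cluster: its points, their dimensionality, and its centroid.
class Cluster
    attr_reader :points, :n, :centroid

    def initialize(points)
        raise "ILLEGAL: EMPTY CLUSTER" if points.empty?
        @points = points
        @n = points[0].n
        # Every point must live in the same n-dimensional space.
        for p in points
            raise "ILLEGAL: MULTISPACE CLUSTER" if p.n != @n
        end
        @centroid = calculateCentroid
    end

    # Replace the cluster's points, recompute the centroid, and return
    # the distance the centroid moved.
    def update(points)
        old_centroid = @centroid
        @points = points
        # An empty cluster keeps its previous centroid: averaging zero
        # points would divide by zero and produce NaN coordinates.
        @centroid = calculateCentroid unless points.empty?
        getDistance(old_centroid, @centroid)
    end

    # The centroid is the coordinate-wise mean of the cluster's points.
    def calculateCentroid
        centroid_coords = []
        for i in 0..@n-1
            total = 0.0
            for p in @points
                total += p.coords[i]
            end
            centroid_coords << total / @points.size
        end
        Point.new(centroid_coords)
    end

    def to_s
        @points.join(", ")
    end
end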



def kmeans(points, k, cutoff)
    # Seed each cluster with one of k distinct, randomly chosen points.
    clusters = points.sample(k).map { |p| Cluster.new([p]) }

    loop do
        # One empty list per cluster to collect this round's assignments.
        lists = clusters.map { [] }
        points.each do |p|
            # Find the cluster whose centroid is closest to p.
            smallest_distance = getDistance(p, clusters[0].centroid)
            index = 0
            for i in 1..(clusters.size-1)
                distance = getDistance(p, clusters[i].centroid)
                if distance < smallest_distance
                    smallest_distance = distance
                    index = i
                end
            end
            lists[index] << p
        end

        # Move every centroid and record the largest movement.
        biggest_shift = 0.0
        clusters.each_with_index do |cluster, i|
            shift = cluster.update(lists[i])
            biggest_shift = shift if shift > biggest_shift
        end

        # Stop once no centroid moved by as much as the cutoff.
        break if biggest_shift < cutoff
    end

    clusters
end

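# Euclidean distance between two points of the same dimensionality.
def getDistance(a, b)
    if a.n != b.n
        raise "ILLEGAL: NON-COMPARABLE POINTS"
    end
    ret = 0.0
    for i in 0..(a.n-1)
        ret = ret + (a.coords[i] - b.coords[i])**2
    end
    Math.sqrt(ret)
end

# A random n-dimensional point with each coordinate in [lower, upper).
def makeRandomPoint(n, lower, upper)
    coords = []
    n.times { coords << lower + (upper - lower) * rand }
    Point.new(coords)
end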



num_points, n, k, cutoff, lower, upper = 10, 2, 3, 0.5, -200, 200
# Create num_points random Points in n-dimensional space
points = []
num_points.times { points << makeRandomPoint(n, lower, upper) }

# Cluster the points using the K-means algorithm
clusters = kmeans(points, k, cutoff)
# Print the results
puts "\nPOINTS:"
points.each { |p| puts "P: #{p}" }
puts "\nCLUSTERS:"
clusters.each { |c| puts "C: #{c.points.join(', ')}" }
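
The program above stops only when the largest centroid shift falls below `cutoff`. Step (4) also allows "some other stopping criterion". One common choice is a cap on the number of passes. The sketch below shows that idea on one-dimensional data so that it stays short and self-contained; `kmeans_1d`, `nearest_index`, and `max_iterations` are hypothetical names introduced here, not part of the original code.

```ruby
# Index of the center closest to value (hypothetical helper).
def nearest_index(value, centers)
  (0...centers.size).min_by { |i| (value - centers[i]).abs }
end

# Iterate until the largest center shift drops below cutoff or
# max_iterations passes have run, whichever comes first.
# Returns the final centers and the number of passes performed.
def kmeans_1d(values, centers, cutoff, max_iterations)
  iterations = 0
  while iterations < max_iterations
    iterations += 1
    groups = values.group_by { |v| nearest_index(v, centers) }
    # A center with no assigned values keeps its position.
    new_centers = centers.each_with_index.map do |c, i|
      members = groups[i]
      members ? members.sum.to_f / members.size : c
    end
    biggest_shift = centers.zip(new_centers).map { |a, b| (a - b).abs }.max
    centers = new_centers
    break if biggest_shift < cutoff
  end
  [centers, iterations]
end

values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, iterations = kmeans_1d(values, [0.0, 5.0], 0.5, 100)
puts "centers=#{centers.inspect} after #{iterations} passes"
# prints: centers=[2.0, 11.0] after 3 passes
```

With a cap of 1 instead of 100, the same call returns after a single pass with the centers at [1.5, 9.0], which bounds the running time at the cost of a less settled result.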


2010-06-28 14:25