MST application to Clustering

leonzhx

Blog category: Algorithms II -- Stanford Study Notes

MST Clustering Spacing K means k-clusterings

1. Problem Definition of Clustering:

Informal goal: Given n "points" (Web pages, images, genome fragments, etc.), classify them into "coherent groups", i.e. clusters.

Assumptions:

(1) As input, given a (dis)similarity measure -- a distance d(p , q) between each point pair.

(2) The distance is symmetric, i.e., d(p , q) = d(q , p). (Examples: Euclidean distance, genome similarity, etc.)

Points in the same cluster should be "nearby".

2. Max-Spacing k-Clusterings

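k-clustering : a partition of the points into k clusters, where k is the desired number of clusters.

Separated pair : Call points p & q separated if they're assigned to different clusters.

Spacing : The spacing of a k-clustering is min { d(p , q) : p , q separated }, i.e. the smallest distance between any separated pair. (The bigger the better.)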

Max-Spacing k-Clusterings problem : Given a distance measure d and k, compute the k-clustering with maximum spacing.
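To make the definition of spacing concrete, here is a minimal Python sketch that computes it for a given clustering. The function name `spacing`, the `labels` list (one cluster id per point), and the 1-D sample points with absolute difference as the distance are all assumptions made for illustration; they are not part of the lecture notes. It assumes at least two clusters, so that at least one separated pair exists.

```python
from itertools import combinations

def spacing(points, labels, dist):
    """Return the smallest distance over all separated pairs.

    labels[i] is the (hypothetical) cluster id of points[i].
    Assumes at least two distinct labels, otherwise no pair is separated.
    """
    return min(
        dist(p, q)
        for (p, lp), (q, lq) in combinations(zip(points, labels), 2)
        if lp != lq  # only pairs assigned to different clusters count
    )

# Illustrative data: four points on a line, split into two clusters.
points = [0.0, 1.0, 5.0, 6.0]
labels = [0, 0, 1, 1]
d = lambda p, q: abs(p - q)
print(spacing(points, labels, d))  # closest separated pair is (1.0, 5.0)
```

This brute-force version inspects every pair, so it takes O(n^2) distance evaluations; it is meant only to pin down the definition, not to solve the optimization problem.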

3. A Greedy Algorithm

-- Initially, each point is in its own cluster.

-- Repeat until only k clusters remain:

    -- Let p , q = closest pair of separated points (their distance is the current spacing)

    -- Merge the clusters containing p & q into a single cluster.

Note: This is just Kruskal's MST algorithm, stopped early once only k connected components remain.
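The greedy procedure above can be sketched in Python as Kruskal's algorithm with an early stop. This is one possible implementation, not the one from the lecture: the function name `max_spacing_clustering`, the simple union-find with path halving, and the sample data are assumptions chosen for illustration. It assumes 1 <= k <= n and a symmetric `dist`.

```python
from itertools import combinations

def max_spacing_clustering(points, k, dist):
    """Greedy k-clustering: merge closest separated pairs until k clusters remain.

    Returns (labels, spacing). labels[i] is a representative index for the
    cluster of points[i]; spacing is None when k == 1 (no separated pair).
    """
    n = len(points)
    parent = list(range(n))  # union-find forest, one tree per cluster

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs sorted by distance, as in Kruskal's algorithm.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )

    clusters = n
    spacing = None
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster, not a separated pair
        if clusters == k:
            spacing = d  # closest separated pair of the final clustering
            break
        parent[ri] = rj  # merge the two clusters
        clusters -= 1

    labels = [find(i) for i in range(n)]
    return labels, spacing

# Illustrative data: five points on a line, clustered into k = 2 groups.
labels, s = max_spacing_clustering(
    [0.0, 1.0, 5.0, 6.0, 20.0], 2, lambda p, q: abs(p - q)
)
print(labels, s)
```

Sorting all pairs costs O(n^2 log n), which dominates the running time. The first separated pair encountered after reaching k clusters gives the spacing S used in the correctness argument.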

4. Correctness of Greedy Clustering

-- Let C1, ... , Ck = the greedy clustering, with spacing S. Let C1', ... , Ck' = an arbitrary other k-clustering.

Need to show : spacing of C1', ... , Ck' <= S

-- Case 1: The Ci' are the same as the Ci (maybe after renaming) ==> C1', ... , Ck' has the same spacing S.

-- Case 2: Otherwise, can find a point pair p , q such that:

(A) p , q in the same greedy cluster Ci

(B) p , q in different clusters Ci'

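-- Easy case: If p , q were directly merged by the greedy algorithm at some point, then S >= d(p , q). (The distance between merged point pairs only goes up, and S is at least the distance of the last merge.) Since p , q are separated in C1', ... , Ck', the spacing of C1', ... , Ck' is at most d(p , q) ==> S >= spacing of C1', ... , Ck'.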

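-- Tricky case: p , q were "indirectly merged" through multiple direct merges. Let p, a1, ... , al, q be the path of direct greedy merges connecting p & q, and let Ci' be the cluster of the other clustering that contains p. Since p is in Ci' and q is not in Ci' ==> there exists a consecutive pair aj , aj+1 on the path with aj in Ci' and aj+1 not in Ci'. This pair was directly merged, so S >= d(aj , aj+1); it is separated in C1', ... , Ck', so d(aj , aj+1) >= spacing of C1', ... , Ck'. Hence S >= spacing of C1', ... , Ck'.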
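Both cases of the argument reduce to the same chain of inequalities. A compact restatement in LaTeX, using the notation of the notes, with the convention (introduced here only for convenience) that a_0 = p and a_{l+1} = q, so that the easy case is the path with no intermediate points:

```latex
\[
\operatorname{spacing}(C_1', \dots, C_k')
\;\le\; d(a_j, a_{j+1})
\;\le\; S ,
\qquad \text{for some } j \in \{0, \dots, l\}
\text{ with } a_j \in C_i',\; a_{j+1} \notin C_i' .
\]
```

The first inequality holds because a_j and a_{j+1} are separated in the other clustering; the second holds because they were directly merged by the greedy algorithm, and every merged pair has distance at most S.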

2013-10-04 12:08