1. Problem Definition of Clustering:
Informal goal: Given n "points" [Web pages, images, genome fragments, etc.] classify into "coherent groups" -- cluster
Assumptions:
(1) As input, given a (dis)similarity measure -- a distance d(p , q) between each point pair.
(2) Symmetric [i.e., d(p , q) = d(q , p)] (Examples: Euclidean distance, genome similarity, etc)
Same cluster ==> "nearby"
2. Max-Spacing k-Clusterings
k-clustering : the # of desired clusters is k
separated pair : Call points p & q separated if they're assigned to dierent clusters.
Spacing : The spacing of a k-clustering is min (separated p,q){ d(p , q) }. (The bigger the better)
Max-Spacing k-Clusterings problem : Given a distance measure d and k, compute the k-clustering with maximum spacing.
3. A Greedy Algorithm
-- Initially, each point in a separate cluster
-- Repeat until only k clusters:
-- Let p , q = closest pair of separated points (determines the current spacing)
-- Merge the clusters containing p & q into a single cluster.
Note: Just like Kruskal's MST algorithm, but stopped early.
4. Correctness of Greedy Clustering
-- Let C1, ... , Ck = greedy clustering with spacing S. Let C1', ... , Ck' = arbitrary other clustering.
Need to show : spacing of C1', ... , Ck' <= S
-- Case 1: Ci' are the same as the Ci (maybe after renaming) ==> has the same spacing S.
-- Case 2: Otherwise, can find a point pair p , q such that:
(A) p , q in the same greedy cluster Ci
(B) p , q in different clusters Ci'
-- Easy case: If p , q directly merged at some point in Ci, then S >= d(p , q) (Distance between merged point pairs only goes up) == > S >= spacing of C1', ... , Ck' ( since p, q are separated )
-- Tricky case: p , q "indirectly merged" through multiple direct merges. Let p, a1, ... al, q be the path of direct greedy merges connecting p & q. Since p in Ci' and q not in Ci' ==> exists consecutive pair aj , aj+1 with aj in Ci' and aj+1 not in Ci' ==> S >= d(aj , aj+1) >= Spacing of C1', ... , Ck'
相关推荐
使用两种最小生成树的方法进行聚类,并对效果进行比较,处理了8种典型二维图像和压缩后的三维图像
基于MST-单连接聚类算法C++实现,数据挖掘中单连接聚类算法基于最小生成树c++实现,有谁写出更好的,就上传给大家分享。
该算法通过MST维护空间数据的基本空间结构特征,通过打断MST中最不一致的边形成MST聚类,不仅具有密度的聚类方法能够聚集非球状簇和分布不均的数据集的特点,而且聚类结果不依赖于用户参数的选择,因此,离群挖掘结果更...
基于密度的最小生成树聚类算法,将最小生成树...因此,本文在基于密度的MST聚类的基础上,通过减少数据集扫描次数以提高离群检测的效率。理论分析表明,检测算法可以有效地处理分布不均的数据集,适用于大规模数据集的挖掘。
MST703 MST705GPIO控制寄存器使用说明
mst 用最小生成树聚类
MST703 MST705 驱动七寸液晶 VGA--TTL
一种自适应优化相异性度量的基于MST的半监督聚类方法,陈新泉,,针对混合属性空间中的具有同一分布特性的带类别标记的小样本集和无类别标记的大样本数据集,提出了一种自适应优化相异性度量的基
MST705 MST703芯片的Keil完整版工程,不需要Along_cfg.h文件,可以直接编译成功,已经使用Keil5和Keil3编译通过。请使用32位系统进行编译,否则无法正常转换bin文件。
schematic for the MST702S
Small size lcd tv processor datasheet
MST706 原理图 多用于倒车影像开发,希望对大家有帮助
晨星Mstar的MST703参考手册。 MST703是一款性价比非常高的视频解码器及小尺寸TFT液晶屏驱动芯片,外围电路简单。
MST705/MST703/MST702应用指南及注意事项
MST703所以引脚的配置
基于最小生成树的半监督聚类算法,霍萌萌,刘阳阳,已知的多数半监督聚类算法依赖成对约束的方法提出。这些算法通常使用先验知识提高聚类精度。本文使用另一种叫做标签传播的半监督
MST F3 希捷专修
MST703 MST705 驱动七寸液晶 CVBS--TTL。
schematic for mst720
DRAM memory interface支持DDR3 16-bit/8-bit*2,up to 1GMHz;NAND interface支持i-NAND.Movi-NAND,SLC/MLC NAND FLASH,Up to 60-bit ECC;支持1080P Full HD解码和720P H.264编码;支持LVDS/TTL/TTL+TCON Panel,分辨率...