Effective Twice-Clustering Algorithm for Data Streams
Abstract: To improve the quality of data stream clustering when the data are noisy and irregularly distributed, an effective twice-clustering algorithm for data streams, TCLUSA, is proposed. TCLUSA follows a simple divide-and-conquer approach based on partitioning and separability theorems. It applies DBSCAN (density-based spatial clustering of applications with noise) to each partition of the stream, takes the mean point of each resulting cluster as that cluster's representative, and then obtains the final result by clustering all the representative points with k-means. The mean points produced by each round of clustering are kept in a layered structure until the final result is obtained. Theoretical analysis and experimental results show that TCLUSA effectively improves clustering quality when the data distribution is irregular or when a high-dimensional data stream is processed.
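The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `twice_cluster`, `cluster_means`, `chunk_size`, `eps`, `min_pts`, and `k` are hypothetical, the stream is assumed to be cut into fixed-size chunks, and the layered structure is simplified to a flat list of mean points. DBSCAN and k-means are written in plain Python so the example is self-contained.

```python
import math
import random


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point (-1 marks noise)."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # core point: keep expanding
                queue.extend(nb)
    return labels


def mean_point(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def cluster_means(chunk, eps, min_pts):
    """First stage: DBSCAN on one chunk, then one mean point per cluster."""
    clusters = {}
    for p, label in zip(chunk, dbscan(chunk, eps, min_pts)):
        if label != -1:  # noise points are dropped here
            clusters.setdefault(label, []).append(p)
    return [mean_point(c) for c in clusters.values()]


def kmeans(points, k, iters=20, seed=0):
    """Basic Lloyd's k-means with a fixed number of iterations."""
    k = min(k, len(points))
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # An empty group keeps its previous center.
        centers = [mean_point(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def twice_cluster(stream, chunk_size, eps, min_pts, k):
    """Second stage: k-means over the mean points collected from all chunks."""
    reps, chunk = [], []
    for p in stream:
        chunk.append(p)
        if len(chunk) == chunk_size:
            reps.extend(cluster_means(chunk, eps, min_pts))
            chunk = []
    if chunk:  # flush the last, possibly shorter, chunk
        reps.extend(cluster_means(chunk, eps, min_pts))
    return kmeans(reps, k)
```

In this sketch only the mean points survive from each chunk, so memory use grows with the number of local clusters rather than with the length of the stream, and points that DBSCAN labels as noise never reach the k-means stage.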