Incremental Clustering Algorithm Based on Rough Reduction for Data Stream
Abstract: The data stream clustering algorithm CluStream requires the number of micro-clusters to be specified in advance, so it cannot accurately track changes in the data stream, which in turn degrades the final clustering results. To overcome this shortcoming, an incremental data stream clustering algorithm based on rough reduction, RICStream (rough incremental clustering stream), is proposed. While preserving clustering accuracy, RICStream dynamically adjusts the data stream attributes that take part in clustering, which reduces clustering time and computation cost. An incrementally adjustable grid structure is proposed to store the data stream, so that the clustering results reflect changes in the stream. Experiments on real and synthetic datasets show that RICStream achieves high efficiency and clustering accuracy.
Key words:
- data mining
- clustering
- reduction
- data stream
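
The abstract mentions two mechanisms: a grid that absorbs stream records incrementally, and a reduction step that changes which attributes take part in clustering. The sketch below is only an illustration of how such a pair of operations might fit together, not the RICStream data structure itself. The class `SparseGrid`, its methods, the uniform `cell_width`, and the density threshold `min_count` are all hypothetical names and assumptions introduced here; the abstract does not specify them, and the rough-set criterion that decides which attribute is redundant is left out.

```python
from collections import Counter
from math import floor


class SparseGrid:
    """Hypothetical sparse grid: maps a cell index tuple to a point count."""

    def __init__(self, attributes, cell_width):
        self.attributes = list(attributes)  # attributes currently used for clustering
        self.cell_width = cell_width        # assumption: one uniform cell width
        self.cells = Counter()              # only non-empty cells are stored

    def _key(self, point):
        # Cell index along each active attribute.
        return tuple(floor(point[a] / self.cell_width) for a in self.attributes)

    def insert(self, point):
        """Absorb one stream record (a dict of attribute -> value)."""
        self.cells[self._key(point)] += 1

    def drop_attribute(self, name):
        """Drop an attribute judged redundant, merging cells along that axis."""
        i = self.attributes.index(name)
        merged = Counter()
        for key, count in self.cells.items():
            merged[key[:i] + key[i + 1:]] += count
        self.attributes.pop(i)
        self.cells = merged

    def dense_cells(self, min_count):
        """Cells holding at least min_count points (candidate cluster cores)."""
        return {k: c for k, c in self.cells.items() if c >= min_count}


grid = SparseGrid(["x", "y", "z"], cell_width=1.0)
for record in [{"x": 0.2, "y": 0.7, "z": 5.1},
               {"x": 0.4, "y": 0.1, "z": 9.3},
               {"x": 3.5, "y": 2.2, "z": 5.0}]:
    grid.insert(record)

grid.drop_attribute("z")            # pretend a reduction step found "z" redundant
dense = grid.dense_cells(min_count=2)
```

In this toy run the first two records fall into different cells while `z` is active and into the same cell once `z` is dropped, which shows how removing an attribute shrinks the grid without rescanning the stream.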