Quick Discretization Algorithm for Rough Set Based on Dynamic Clustering
Abstract: Discretizing decision tables that contain a large number of objects requires an efficient discretization algorithm. Based on an analysis of how the importance values of candidate cuts are distributed over a single attribute, this paper proposes a two-step approach, dynamic clustering followed by cut selection, and a fast discretization algorithm based on rough set theory. First, the candidate cuts are quickly and dynamically clustered according to the distribution of their importance on each attribute, which effectively reduces the number of candidate cuts. Second, a heuristic method quickly selects the final cut set from the clustering result, and the decision table is discretized with these cuts. Experimental results show that dynamic clustering reduces the number of candidate cuts by more than 80% on most data sets, which greatly improves the efficiency of the subsequent cut selection. On seven UCI data sets (Iris, Wine, Glass, Ecoli, Breast_w, Pima, and Letter), the proposed algorithm achieves recognition rates of about 92.0%, 92.1%, 69.3%, 65.7%, 95.3%, 67.1%, and 76.5%, respectively.
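The two-step idea in the abstract can be illustrated with a minimal Python sketch for a single numeric attribute. The abstract does not give the paper's formulas, so everything here is an assumption: the importance measure (the number of object pairs with different decisions that a cut separates, a common choice in rough set discretization), the clustering rule (merge adjacent cuts whose importance differs by at most a hypothetical `tolerance`), and all function names. It is not the authors' algorithm.

```python
from collections import Counter


def candidate_cuts(values):
    """Midpoints between consecutive distinct values of one attribute."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def cut_importance(values, labels, cut):
    """Assumed measure: pairs with different decisions separated by the cut."""
    left = Counter(l for v, l in zip(values, labels) if v < cut)
    right = Counter(l for v, l in zip(values, labels) if v > cut)
    all_pairs = sum(left.values()) * sum(right.values())
    same_label_pairs = sum(left[c] * right[c] for c in left)
    return all_pairs - same_label_pairs


def cluster_cuts(values, labels, tolerance=0):
    """Assumed clustering rule: an adjacent cut joins the current group when
    its importance differs from the previous cut's by at most `tolerance`;
    each group is represented by its most important cut."""
    scored = [(c, cut_importance(values, labels, c))
              for c in candidate_cuts(values)]
    representatives, group = [], []
    for cut, score in scored:
        if group and abs(score - group[-1][1]) > tolerance:
            representatives.append(max(group, key=lambda cs: cs[1])[0])
            group = []
        group.append((cut, score))
    if group:
        representatives.append(max(group, key=lambda cs: cs[1])[0])
    return representatives


def select_cuts(values, labels, cuts):
    """Greedy heuristic: repeatedly take the cut that separates the most
    still-unseparated pairs of objects with different decisions."""
    n = len(values)
    pairs = {(i, j) for i in range(n) for j in range(i + 1, n)
             if labels[i] != labels[j] and values[i] != values[j]}

    def separated_by(cut):
        return {(i, j) for i, j in pairs
                if (values[i] < cut) != (values[j] < cut)}

    chosen = []
    while pairs and cuts:
        best = max(cuts, key=lambda c: len(separated_by(c)))
        gained = separated_by(best)
        if not gained:  # remaining pairs cannot be separated by these cuts
            break
        chosen.append(best)
        pairs -= gained
    return sorted(chosen)
```

Under these assumptions, the clustering step shrinks the candidate list that `select_cuts` must scan on each greedy iteration, which is where the efficiency gain described in the abstract would come from. For example, `select_cuts(values, labels, cluster_cuts(values, labels, tolerance=3))` runs both steps in sequence on one attribute.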
Key words:
- rough set
- decision table
- discretization
- clustering