Abstract:
Discretizing a decision table that contains a large number of objects requires a highly efficient discretization algorithm. The distribution of the importance values of candidate cuts on a single attribute of a decision table is analyzed and, based on this distribution, a two-step procedure and an efficient discretization algorithm based on rough set theory are proposed. First, the candidate cuts are dynamically clustered according to their importance, which reduces the number of candidate cuts. Second, the final cuts are selected quickly from the clustered cuts by a heuristic method, and the decision table is then discretized with these final cuts. Experimental results show that, for most data sets, dynamic clustering reduces the number of candidate cuts by more than 80%, which greatly improves the efficiency of the subsequent cut selection. On seven UCI data sets (Iris, Wine, Glass, Ecoli, Breast_w, Pima and Letter), the proposed algorithm achieves recognition rates of about 92.0%, 92.1%, 69.3%, 65.7%, 95.3%, 67.1% and 76.5%, respectively.
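
The two-step procedure can be illustrated with a minimal single-attribute sketch. It is not the authors' algorithm: the importance measure (the number of object pairs with different decisions that a cut separates), the clustering rule (neighbouring cuts are merged while their importance differs by at most a tolerance `tol`), and the greedy selection are all assumptions chosen for illustration, and every function name below is hypothetical.

```python
from bisect import bisect_right
from itertools import combinations


def candidate_cuts(column):
    """Midpoints between consecutive distinct values of one attribute."""
    distinct = sorted(set(column))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def discerned_pairs(column, labels, cut):
    """Object pairs with different decisions that lie on opposite sides of the cut."""
    return {
        (i, j)
        for i, j in combinations(range(len(column)), 2)
        if labels[i] != labels[j] and (column[i] < cut) != (column[j] < cut)
    }


def cluster_cuts(cuts, importance, tol):
    """Step 1 (assumed rule): merge neighbouring cuts whose importance differs
    by at most tol and keep the most important cut of each group."""
    kept, group = [], []
    for cut in cuts:
        if group and abs(importance[cut] - importance[group[-1]]) > tol:
            kept.append(max(group, key=importance.get))
            group = []
        group.append(cut)
    if group:
        kept.append(max(group, key=importance.get))
    return kept


def select_cuts(column, labels, cuts):
    """Step 2 (assumed heuristic): greedily take the cut that separates the
    most pairs not yet separated, until the given cuts can separate no more."""
    remaining = set().union(*(discerned_pairs(column, labels, c) for c in cuts))
    selected = []
    while remaining:
        best = max(cuts, key=lambda c: len(remaining & discerned_pairs(column, labels, c)))
        selected.append(best)
        remaining -= discerned_pairs(column, labels, best)
    return sorted(selected)


def discretize(column, cuts):
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in column]


# Toy data, invented for this example only.
column = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]

cuts = candidate_cuts(column)
importance = {c: len(discerned_pairs(column, labels, c)) for c in cuts}
clustered = cluster_cuts(cuts, importance, tol=3)
final_cuts = select_cuts(column, labels, clustered)
codes = discretize(column, final_cuts)
```

On this toy column the five candidate cuts collapse into a single cluster represented by the cut 3.5, which alone separates every pair of objects with different decisions. A real decision table has several attributes, so the selection step would rank cuts across all of them.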