Improved Decision Tree Algorithm Based on Samples Selection
Abstract: To improve the accuracy of decision tree classification, several classical decision tree classification algorithms were compared, and an improved decision tree classification algorithm based on sample selection was proposed. The improved algorithm rests on two facts: the accuracy of a decision tree depends strongly on its training samples, and decision tree induction can only reach a locally optimal solution. It therefore searches iteratively for better training samples, which yields a better decision tree classifier without changing the underlying decision tree classification algorithm. The algorithm is not tied to any particular decision tree algorithm and iterates using only feedback from its inputs and outputs, so it is broadly applicable. Experimental results show that the average error rates of the improved algorithm, ID3, and C4.5 are in the ratio of about 0.82 : 1.22 : 0.92.
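The abstract describes the method only at a high level: keep the tree learner fixed and iterate over training samples, judging each candidate set only by the resulting classification error. The sketch below is one possible reading of that idea under stated assumptions, and the abstract does not confirm it as the authors' procedure. It uses a tiny ID3 (entropy-based information gain) as the fixed base learner. A held-out validation set supplies the "output feedback." The search is a random add/drop hill-climb over subsets, a strategy swapped in because the abstract does not name one. The names `select_samples`, `iters`, and `seed`, and the toy data, are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def id3(rows, labels, features):
    """Minimal ID3 for categorical features. rows are dicts; returns a leaf label or a node dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority

    def gain(f):
        # Information gain: H(Y) minus the weighted entropy of each branch f = v.
        groups = {}
        for r, y in zip(rows, labels):
            groups.setdefault(r[f], []).append(y)
        rem = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return entropy(labels) - rem

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    node = {"feature": best, "default": majority, "children": {}}
    for v in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == v]
        node["children"][v] = id3([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def predict(tree, row):
    # Walk down the tree; unseen attribute values fall back to the node's majority label.
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


def error_rate(tree, rows, labels):
    return sum(predict(tree, r) != y for r, y in zip(rows, labels)) / len(labels)


def select_samples(rows, labels, val_rows, val_labels, features, iters=50, seed=0):
    """Hypothetical sample-selection loop: hill-climb over training subsets.

    The tree learner (id3) is never modified; the only feedback is the
    validation error of the tree trained on each candidate subset.
    """
    rng = random.Random(seed)
    best_idx = list(range(len(rows)))
    best_err = error_rate(id3(rows, labels, features), val_rows, val_labels)
    for _ in range(iters):
        cand = best_idx[:]
        chosen = set(cand)
        excluded = [i for i in range(len(rows)) if i not in chosen]
        # Perturb the subset: add back one excluded sample or drop one sample.
        if excluded and (len(cand) == 1 or rng.random() < 0.5):
            cand.append(rng.choice(excluded))
        elif len(cand) > 1:
            cand.remove(rng.choice(cand))
        tree = id3([rows[i] for i in cand], [labels[i] for i in cand], features)
        err = error_rate(tree, val_rows, val_labels)
        if err <= best_err:  # accept moves that do not increase validation error
            best_idx, best_err = cand, err
    return best_idx, best_err


# Hypothetical toy data: the label follows feature "a"; sample 1 is deliberately mislabelled.
train = [({"a": 1, "b": 0}, "yes"), ({"a": 1, "b": 1}, "no"),
         ({"a": 0, "b": 0}, "no"), ({"a": 0, "b": 1}, "no"),
         ({"a": 1, "b": 0}, "yes")]
val = [({"a": 1, "b": 1}, "yes"), ({"a": 0, "b": 0}, "no"),
       ({"a": 1, "b": 0}, "yes"), ({"a": 0, "b": 1}, "no")]
X, y = [r for r, _ in train], [c for _, c in train]
VX, Vy = [r for r, _ in val], [c for _, c in val]

base_err = error_rate(id3(X, y, ["a", "b"]), VX, Vy)
idx, err = select_samples(X, y, VX, Vy, ["a", "b"])
print(f"all samples: {base_err:.2f}  selected {sorted(idx)}: {err:.2f}")
```

The search calls `id3` only as a black box, so another tree learner (for example a C4.5-style one) could be substituted without touching the loop. This mirrors the abstract's claim that the approach is not tied to a particular decision tree. Accepting moves that leave the error unchanged lets the search cross plateaus instead of stalling at the first subset it tries.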
Key words:
- decision tree
- samples selection
- ID3 algorithm
- entropy
- classification