基于分布式的频繁闭合模式挖掘算法
doi: 10.3969/j.issn.0258-2724.2012.06.019
Algorithm of Distributed Frequent Closed Patterns Mining
-
摘要: 为提高数据挖掘效率,提出了一种基于分布式的频繁闭合模式挖掘算法——PFCI-Miner.该算法采用任务分布的主从方式,其中主处理器通过发送提出的前缀路径表(PrePthx)将挖掘任务合理划分,而从处理器借助提出的存储树(Trac-tree)挖掘局部频繁闭合模式,最后由主处理器挖掘出全局频繁闭合模式.此外,采用星形拓扑结构,使数据通信只存在于主处理器与从处理器之间,而各从处理器之间无数据通信且不需要同步.在由3台PC机构成的分布式环境下,对合成与蘑菇数据集的实验表明,PFCI-Miner较DP-FP算法、AFCIM算法和DFCIM算法的执行效率分别平均提高了43.66%、42.17%、53.48%和51.86%、47.62%、62.78%.Abstract: In order to improve the mining efficiency, an algorithm, PFCI-Miner, based on distributed frequent closed patterns mining was proposed. This algorithm adopts a master-slave structure to implement task distribution. The master processor assigns a task efficiently by sending a proposed prefix path table (PrePthx), and the slave processors mine local frequent closed patterns with the help of a proposed store tree (Trac-tree). Finally the master processor mines the global frequent closed patterns. The algorithm uses star-like topology in order to make all data communications only between the master processor and the slave processors, there being no communication and no synchronization among all slave processors. Computer simulation on synthesis and mushroom data sets under the distribution of 3 PC computers shows that compared with the DP-FP algorithm, the AFCIM (adaptive frequent closed itemsets mining model) algorithm and the DFCIM (distributed frequent closed itemsets mining) algorithm, the PFCI-Miner algorithm has, on average, 43.66%, 42.17%, 53.48% and 51.86%, 47.62%, 62.78% improvements in the efficiency respectively.
-
Key words:
- association rule /
- data mining /
- frequent closed pattern
-
PASQUIER N, BASTIDE Y, TAOUIL R, et al. Discovering frequent closed itemsets for association rules//Proceedings of the 7th International Conference on Database Theory. Jerusalem: Spring-Verlag, 1999: 398-416. ZHANG Liang, REN Yonggong, FU Yu. New algorithm of mining frequent closed itemsets[J]. Journal of Southeast University, 2008, 24(1): 335-338. LIU Chun, ZHENG Zheng, CAI Kaiyuan, et al. Online mining frequent closed itemsets over data stream[J]. Journal of Beijing University of Aeronautics and Astronautics, 2008, 34(8): 969-972. 胡为成,王本年,程传流. 基于DSCFCI_tree的带项目约束的数据流频繁闭合模式挖掘算法[J]. 中国科学技术大学学报,2009,39(11): 1194-1201. HU Weicheng, WANG Bennian, CHENG Zhuanliu. Algorithm for mining frequent closed patterns with item constraint over data streams based on DSCFCI_tree[J]. Journal of University of Science and Technology of China, 2009, 39(11): 1194-1201. TANG Peiyi, NING Li, WU Ningning. Domain and data partitioning for parallel mining of frequent closed itemsets//Proceedings of the 43th Annual Southeast Regional Conference. Kennesaw: ACM Press, 2005: 250-252. 王黎明,张卓. 基于iceberg概念格并置集成的闭频繁项集挖掘算法[J]. 计算机研究与发展,2007,44(7): 1184-1190. WANG Liming, ZHANG Zhuo. An algorithm for mining closed frequent itemsets based on apposition assembly of iceberg concept lattices[J]. Journal of Computer Research and Development, 2007, 44(7): 1184-1190. 琚春华,倪栋君. 基于元学习的分布式挖掘频繁闭合模式算法研究[J]. 计算机应用研究,2009,26(1): 41-46. JU Chunhua, NI Dongjun. Study of algorithm for distributed mining frequent closed patterns based on meta-learn technology[J]. Application Research of Computers, 2009, 26(1): 41-46. 缪裕青,伊东. 分布式存储结构的频繁闭合模式挖掘并行算法[J]. 微电子学与计算机,2007,24(10): 161-163. MIAO Yuqing, YIN Dong. Parallel algorithm for mining frequent closed patterns on distributed memory multi-processors[J]. Microelectronics & Computer, 2007, 24(10): 161-163. GRAHNE G, ZHU J. Efficiently using prefix-trees in mining frequent itemsets//Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementation. Melbourne: IEEE, 2003: 123-132. HAN J, JIAN P, YIWEN Y. Mining frequent patterns without candidate generation//Proceedings of the ACM SIGMOD International Conference Management of Data. Dallas: Kluwer Academic Publishers, 2000: 1-12. LIU Chun, ZHENG Zheng, CAI Kaiyuan, et al. Distributed frequent closed itemsets mining//Proceedings of the third International IEEE Conference on Signal-Image Technologies and Internet-Based System. Washington: IEEE Computer Society Press, 2008: 43-50. LIN Jianming, JU Chunhua, LIU Dongsheng. An efficient mining model for global frequent closed itemsets//Proceedings of the 2nd International Symposium on Electronic Commerce and Security. Washington: IEEE Computer Society Press, 2009: 278-282. 吴磊,陈鹏. 基于并行计算的关联规则挖掘优化算法[J]. 计算机应用,2005,25(9): 1989-1991. WU Lei, CHEN Peng. Updated algorithm for mining association rules based on parallel computation[J]. Computer Applications, 2005, 25(9): 1989-1991.
点击查看大图
计量
- 文章访问数: 1036
- HTML全文浏览量: 54
- PDF下载量: 472
- 被引次数: 0