基于图的频繁闭项集挖掘算法
Graph-Based Algorithm for Mining Frequent Closed Itemsets
-
摘要: 为了提高数据挖掘效率,提出了一种基于图的频繁闭项集挖掘算法GFCG(graph-based frequent closed itemset generation).该算法采用位矢量技术构造有向图,表示项与项之间的频繁关系,并在有向图的基础上递归 产生频繁闭项集,从而只需扫描数据库2次,不产生候选集;引入扩展频繁项集的概念,大大减小了检查频繁项 集是否闭的搜索空间.用1个真实数据库和2个合成数据库对GFCG进行了测试,并与A-close和CLOSET算法 的结果进行了比较,结果表明,该算法具有良好的速度和可伸缩性性能.Abstract: To raise the efficiency of data mining, a graph-based algorithm for mining frequent closed itemsets, called GFCG (graph-based frequent closed itemset generation) was proposed. In this algorithm, the bit vector technique is used to construct a directed graph to represent the frequent relationship between items, and frequent closed itemsets are generated recursively from this graph. As a result, the GFCG scans the database for only two times, and does not generate candidate sets. Furthermore, the concept of an expanded frequent itemset is introduced to greatly decrease the searching range for adjusting whether a frequent itemset is closed or not. In addition, this algorithm was tested by using one actual and two synthetical databases. The testing result shows that compared with the A-close and CLOSET algorithms, the proposed algorithm has good speed and scale-up properties.
-
Key words:
- database /
- graph /
- data mining /
- frequent closed itemset /
- bit vector
点击查看大图
计量
- 文章访问数: 1368
- HTML全文浏览量: 72
- PDF下载量: 76
- 被引次数: 0