• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
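  • Indexed in Scopus
  • National Chinese Core Journal
  • Source journal for Chinese Scientific and Technical Papers and Citations
  • Source journal of the Chinese Science Citation Database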

Outlier Detection Model for Data Streams Based on Attribute Associations and Match Difference Degree

JU Chunhua, LI Yaolin

Citation: JU Chunhua, LI Yaolin. Outlier Detection Model for Data Streams Based on Attribute Associations and Match Difference Degree[J]. Journal of Southwest Jiaotong University, 2013, 26(1): 107-115. doi: 10.3969/j.issn.0258-2724.2013.01.017

doi: 10.3969/j.issn.0258-2724.2013.01.017
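Funding:

National Natural Science Foundation of China (71071141)

Key Project of the Zhejiang Provincial Natural Science Foundation (Z1091224)

Doctoral Program Foundation of the Ministry of Education of China (20103326110001)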

Outlier Detection Model for Data Streams Based on Attribute Associations and Match Difference Degree

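  • Abstract: To detect outliers in data streams with categorical attributes, an outlier detection model for transactional data streams, AAMDD (attribute associations and match difference degree), is proposed. The AAMDD model builds an association rule base offline and updates it incrementally. Meanwhile, a time-sensitive sliding window (TimeSW) maintains the stream data. After each time span, the itemsets contained in every record in the current window are matched against the association rule base, the match difference degree is computed, and outliers are detected online according to that degree. An algorithm corresponding to the model, AAMDD-algorithm, is also given. Experimental results show that, compared with the FODFP-Stream algorithm, AAMDD-algorithm improves efficiency and detection precision by 37.43% and 5.51% on average, respectively, and that its recall stays above 77%, so it can be used for outlier detection in transactional data streams.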
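The workflow in the abstract (a rule base, a time-sensitive window, and a per-record match difference degree) can be illustrated with a minimal Python sketch. The abstract does not define the match difference degree, so the formula used here is an assumption made only for illustration: the confidence-weighted share of applicable rules (antecedent present in the record) whose consequent is missing. The names `Rule`, `match_difference`, `TimeWindow`, and the `threshold` parameter are hypothetical and are not taken from the paper; offline rule mining and incremental rule updates are omitted.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, FrozenSet, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    """An association rule antecedent -> consequent with its confidence."""
    antecedent: FrozenSet[str]
    consequent: FrozenSet[str]
    confidence: float


def match_difference(items: FrozenSet[str], rules: Iterable[Rule]) -> float:
    """Assumed definition (not from the paper): confidence-weighted share
    of applicable rules that the record violates. Returns 0.0 when no rule
    applies to the record."""
    applicable = 0.0
    violated = 0.0
    for rule in rules:
        if rule.antecedent <= items:             # rule applies to this record
            applicable += rule.confidence
            if not rule.consequent <= items:     # consequent is missing
                violated += rule.confidence
    return violated / applicable if applicable else 0.0


class TimeWindow:
    """Time-based sliding window: keeps records newer than `span` time units."""

    def __init__(self, span: float) -> None:
        self.span = span
        self.records: Deque[Tuple[float, FrozenSet[str]]] = deque()

    def add(self, timestamp: float, items: Iterable[str]) -> None:
        """Append a record, then evict records that fell out of the window."""
        self.records.append((timestamp, frozenset(items)))
        while self.records and self.records[0][0] <= timestamp - self.span:
            self.records.popleft()

    def outliers(self, rules: List[Rule],
                 threshold: float) -> List[Tuple[float, FrozenSet[str]]]:
        """Return records whose assumed match difference exceeds `threshold`."""
        return [(t, items) for t, items in self.records
                if match_difference(items, rules) > threshold]
```

In this sketch, `outliers` would be called once per time span on the current window contents. A record that matches no rule antecedent scores 0.0 and is never flagged, which is a simplification of this illustration rather than a claim about AAMDD.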

     

(Chinese editor: TANG Qing; English editor: FU Guobin)
Publication history
  • Received: 2011-11-20
  • Published: 2013-02-25
