• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
  • Scopus
  • Indexed by Core Journals of China, Chinese S&T Journal Citation Reports
  • Chinese Science Citation Database
Volume 26 Issue 1
Jan. 2013
Citation: JU Chunhua, LI Yaolin. Outlier Detection Model for Data Streams Based on Attribute Associations and Match Difference Degree[J]. Journal of Southwest Jiaotong University, 2013, 26(1): 107-115. doi: 10.3969/j.issn.0258-2724.2013.01.017

Outlier Detection Model for Data Streams Based on Attribute Associations and Match Difference Degree

doi: 10.3969/j.issn.0258-2724.2013.01.017
  • Received Date: 20 Nov 2011
  • Publish Date: 25 Feb 2013
  • To address outlier detection in categorical data streams, a model based on attribute associations and match difference degree, called AAMDD, is proposed. The model builds an association rule library offline and updates it incrementally, while maintaining the data stream with time-sensitive sliding windows (TimeSW). At each time step, AAMDD matches the data in the current window against the rules in the association rule library and calculates the match difference degree (MDD); outliers are then identified online according to their MDDs. An algorithm implementing the model, the AAMDD-algorithm, is given. Experimental results show that, compared with the FODFP-Stream algorithm, the AAMDD-algorithm improves detection precision by 5.51% and efficiency by 37.43% on average, and its recall is above 77%. The model can be used to detect outliers in transaction data streams.
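
The abstract does not give the MDD formula, so the following Python sketch only illustrates the overall flow it describes: keep recent records in a time-based sliding window, match each record against a rule library, and flag records whose difference score is high. The `Rule` and `TimeWindow` classes, the confidence-weighted score in `match_difference_degree`, and the `threshold` value are all assumptions made for illustration and are not taken from the paper; the incremental update of the rule library is not shown.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Association rule over (attribute, value) pairs: antecedent -> consequent."""
    antecedent: frozenset
    consequent: frozenset
    confidence: float


def match_difference_degree(record, rules):
    """Hypothetical MDD (an assumption, not the paper's definition):
    confidence-weighted share of applicable rules that the record violates."""
    items = set(record.items())
    applicable = [r for r in rules if r.antecedent <= items]
    if not applicable:
        return 0.0  # no rule applies, so there is no evidence of deviation
    total = sum(r.confidence for r in applicable)
    violated = sum(r.confidence for r in applicable if not r.consequent <= items)
    return violated / total


class TimeWindow:
    """Time-based sliding window holding records from the last `span` time units."""

    def __init__(self, span):
        self.span = span
        self.buffer = deque()

    def add(self, timestamp, record):
        self.buffer.append((timestamp, record))
        # Evict records that have fallen out of the time span.
        while self.buffer and self.buffer[0][0] <= timestamp - self.span:
            self.buffer.popleft()

    def records(self):
        return [rec for _, rec in self.buffer]


def detect_outliers(window, rules, threshold=0.5):
    """Return the records in the window whose score exceeds the threshold."""
    return [rec for rec in window.records()
            if match_difference_degree(rec, rules) > threshold]


# Toy rule library: "weather = rain" implies "umbrella = yes".
rules = [Rule(frozenset({("weather", "rain")}),
              frozenset({("umbrella", "yes")}), 0.9)]

window = TimeWindow(span=10)
window.add(1, {"weather": "rain", "umbrella": "yes"})
window.add(5, {"weather": "rain", "umbrella": "no"})
window.add(12, {"weather": "sun", "umbrella": "no"})  # evicts the record from t=1

outliers = detect_outliers(window, rules)
print(outliers)
```

In this toy run only the record that satisfies the rule's antecedent but contradicts its consequent is flagged; a record that matches no rule scores zero under this assumed definition, which is a design choice that a real implementation might handle differently.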

     

(Chinese editor: TANG Qing; English editor: FU Guobin)
