• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
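  • Indexed in Scopus
  • Chinese Core Journal
  • Source journal for Chinese Scientific and Technical Papers and Citations
  • Source journal of the Chinese Science Citation Database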

Topic Link Detection Method Based on Semantic Similarity

ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia

Citation: ZHAI Donghai, CUI Jingjing, NIE Hongyu, DU Jia. Topic Link Detection Method Based on Semantic Similarity[J]. Journal of Southwest Jiaotong University, 2015, 28(3): 517-522. doi: 10.3969/j.issn.0258-2724.2015.03.021


doi: 10.3969/j.issn.0258-2724.2015.03.021
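Funding:

Research project under the 12th Five-Year Plan of the State Language Commission (YB125-49)

Key Project of Science and Technology Research of the Ministry of Education (212167)

Fundamental Research Funds for the Central Universities (SWJTU12CX096)

National Undergraduate Innovation and Entrepreneurship Training Program (201210694017)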

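Details
    Author biography:

    ZHAI Donghai (1974-), male, associate professor, PhD. Research interests: massive data mining and digital image processing. E-mail: dhzhai@swjtu.edu.cn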

Topic Link Detection Method Based on Semantic Similarity

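  • Abstract: To identify the similarity between any two news stories effectively, a topic link detection algorithm based on semantic similarity is proposed. First, the algorithm computes the relative entropy between feature words and uses it as the semantic similarity between the feature words of the two stories. Second, it obtains the relatedness between a feature word and a story by computing the average semantic similarity. Finally, it combines this relatedness with the TF-IDF (term frequency-inverse document frequency) weights of the feature words in the corpus to compute the relatedness between the two stories, which completes link detection. The proposed method was compared with the existing vector space model method and with a method that relies only on average pointwise mutual information, and was evaluated on the TDT4 Chinese corpus. The results show that the semantic-similarity-based link detection method makes better use of the contextual information in the text and improves the performance of existing detection systems, reducing the minimum DET (detection error tradeoff) cost by 3%.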
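The three steps named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the formulas, so the context-window representation of a word, the additive smoothing, the mapping from symmetric relative entropy to a similarity in (0, 1], the smoothed IDF, and every function name here (`context_counts`, `relative_entropy`, `tfidf_weights`, `link_score`) are assumptions made for the example.

```python
import math
from collections import Counter

def context_counts(word, corpus, window=2):
    """Count tokens co-occurring with `word` within +/- `window` positions (assumed word representation)."""
    counts = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
                counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts

def relative_entropy(p_counts, q_counts, vocab, eps=1e-6):
    """D(P || Q) over `vocab`, with additive smoothing so that Q is never zero."""
    p_total = sum(p_counts.values()) + eps * len(vocab)
    q_total = sum(q_counts.values()) + eps * len(vocab)
    d = 0.0
    for v in vocab:
        p = (p_counts[v] + eps) / p_total
        q = (q_counts[v] + eps) / q_total
        d += p * math.log(p / q)
    return d

def tfidf_weights(tokens, corpus):
    """TF-IDF weight per term of one story, using a smoothed IDF (an assumption)."""
    tf = Counter(tokens)
    n = len(corpus)
    weights = {}
    for term, count in tf.items():
        df = sum(1 for doc in corpus if term in doc)
        weights[term] = (count / len(tokens)) * (math.log((1 + n) / (1 + df)) + 1.0)
    return weights

def link_score(story_a, story_b, corpus):
    """Symmetric, TF-IDF-weighted average of word-to-story relatedness."""
    vocab = sorted({t for doc in corpus for t in doc})
    cache = {}

    def sim(w1, w2):
        # Step 1: word-word similarity from symmetric relative entropy.
        key = (w1, w2) if w1 <= w2 else (w2, w1)
        if key not in cache:
            p, q = context_counts(w1, corpus), context_counts(w2, corpus)
            sym = relative_entropy(p, q, vocab) + relative_entropy(q, p, vocab)
            cache[key] = 1.0 / (1.0 + sym)
        return cache[key]

    def directed(src, dst):
        weights = tfidf_weights(src, corpus)
        dst_terms = sorted(set(dst))
        total = sum(weights.values())
        score = 0.0
        for term, w in weights.items():
            # Step 2: word-story relatedness as the average similarity.
            relatedness = sum(sim(term, u) for u in dst_terms) / len(dst_terms)
            # Step 3: weight by the term's TF-IDF in its own story.
            score += w * relatedness
        return score / total

    return 0.5 * (directed(story_a, story_b) + directed(story_b, story_a))

if __name__ == "__main__":
    a = ["earthquake", "rescue", "team", "earthquake", "victims"]
    b = ["rescue", "team", "earthquake", "aid", "victims"]
    c = ["stock", "market", "shares", "market", "prices"]
    corpus = [a, b, c]
    print(link_score(a, b, corpus), link_score(a, c, corpus))
```

In a real system the score would be compared against a threshold tuned on held-out data to decide whether two stories are linked. The toy corpus above only shows that stories sharing vocabulary and contexts score higher than unrelated ones.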

     

Publication history
  • Received: 2014-06-30
  • Published: 2015-06-25
