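  • ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
  • Indexed in Scopus
  • Chinese Core Journal
  • Source Journal for Chinese Scientific and Technical Papers and Citations
  • Source Journal of the Chinese Science Citation Database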

Semi-Supervised Method for Chinese Word Sense Disambiguation

ZHANG Chunxiang, XU Zhifeng, GAO Xueyao

Citation: ZHANG Chunxiang, XU Zhifeng, GAO Xueyao. Semi-Supervised Method for Chinese Word Sense Disambiguation[J]. Journal of Southwest Jiaotong University, 2019, 54(2): 408-414. doi: 10.3969/j.issn.0258-2724.20170178


doi: 10.3969/j.issn.0258-2724.20170178
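Funding: National Natural Science Foundation of China (61502124, 60903082); China Postdoctoral Science Foundation (2014M560249); Natural Science Foundation of Heilongjiang Province (F201420, F2015041)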
    About the author:

    ZHANG Chunxiang (born 1974), male, professor, Ph.D.; research interests: natural language processing and computer graphics; E-mail: z6c6x666@163.com

  • CLC number: TP391.2


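  • Abstract: To address polysemy (one word carrying several senses) in natural language processing, this paper proposes a semi-supervised disambiguation method that combines multiple kinds of linguistic knowledge with word sense disambiguation (WSD) models. First, the word forms, parts of speech, and translations of the word units adjacent to the left and right of an ambiguous word are used as disambiguation features to build a Bayesian sense classifier, and the word forms and parts of speech of the same adjacent units are used to build a maximum entropy (ME) sense classifier. Second, the Co-Training algorithm is applied together with a large unlabeled corpus to optimize the WSD models. Third, in the optimization experiments, the Bayesian and ME classifiers are optimized with the training corpus of SemEval-2007 Task #5 and an unlabeled corpus from Harbin Institute of Technology. Finally, the optimized models are tested. The test results show that the proposed method improves disambiguation accuracy by 0.9% over a WSD method based on a support vector machine (SVM), a modest gain in disambiguation performance.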
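The abstract names the ingredients of the method (a Bayesian classifier, a second classifier, and Co-Training over unlabeled text) without spelling out the training loop. The following is a minimal Co-Training sketch under assumptions that do not come from the paper: both learners are multinomial naive Bayes models with add-one smoothing (the paper's second learner is a maximum entropy model), the two views are a hypothetical split into word-form features (`words`) and POS-tag features (`tags`), and an unlabeled instance joins the shared training set whenever the more confident of the two classifiers reaches a fixed `threshold`. All names and the toy data are illustrative only.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over string features with add-one smoothing."""

    def fit(self, examples):
        # examples: list of (feature list, sense label) pairs
        self.sense_counts = Counter(sense for _, sense in examples)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, sense in examples:
            self.feat_counts[sense].update(feats)
            self.vocab.update(feats)
        self.total = sum(self.sense_counts.values())
        return self

    def predict(self, feats):
        """Return (most probable sense, its posterior probability)."""
        v = len(self.vocab) + 1  # +1 keeps some mass for unseen features
        scores = {}
        for sense, n in self.sense_counts.items():
            denom = sum(self.feat_counts[sense].values()) + v
            score = math.log(n / self.total)  # log prior P(S)
            for f in feats:  # plus log P(f | S) for each context feature
                score += math.log((self.feat_counts[sense][f] + 1) / denom)
            scores[sense] = score
        best = max(scores, key=scores.get)
        z = sum(math.exp(s - scores[best]) for s in scores.values())
        return best, 1.0 / z


def co_train(labeled, unlabeled, view_a, view_b, rounds=5, threshold=0.9):
    """Grow a shared labeled set from unlabeled instances using two views.

    labeled: list of (instance, sense); unlabeled: list of instances;
    view_a / view_b: functions mapping an instance to a feature list.
    Returns the two final classifiers and the enlarged labeled set.
    """
    labeled = list(labeled)  # copy, so the caller's seed set is not modified
    pool = list(unlabeled)
    for _ in range(rounds):
        clf_a = NaiveBayes().fit([(view_a(x), y) for x, y in labeled])
        clf_b = NaiveBayes().fit([(view_b(x), y) for x, y in labeled])
        remaining = []
        for x in pool:
            # trust whichever view is more confident about this instance
            sense, conf = max(clf_a.predict(view_a(x)), clf_b.predict(view_b(x)),
                              key=lambda pair: pair[1])
            if conf >= threshold:
                labeled.append((x, sense))  # pseudo-label now trains both views
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing confident enough: stop early
            break
        pool = remaining
    clf_a = NaiveBayes().fit([(view_a(x), y) for x, y in labeled])
    clf_b = NaiveBayes().fit([(view_b(x), y) for x, y in labeled])
    return clf_a, clf_b, labeled


# Two hypothetical views of an instance (left form, left POS, right form, right POS)
def words(x):
    return ["L=" + x[0], "R=" + x[2]]


def tags(x):
    return ["L=" + x[1], "R=" + x[3]]


if __name__ == "__main__":
    # Invented toy data for an ambiguous English word such as "bank"
    seed = [(("the", "DT", "loan", "NN"), "finance"),
            (("river", "NN", "erosion", "NN"), "river")]
    raw = [("the", "DT", "rate", "NN"), ("river", "NN", "grass", "NN")]
    clf_a, clf_b, grown = co_train(seed, raw, words, tags, threshold=0.6)
    print(len(grown), clf_a.predict(words(("the", "DT", "rate", "NN"))))
```

On this toy data a `threshold` of 0.6 lets both unlabeled instances join the training set in the first round, while 0.9 adds none. The confidence criterion, number of rounds, and view split actually used in the paper are not given on this page.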

     

  • Figure 1.  Extracting disambiguation features
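Following the abstract, the disambiguation features are the word form, part of speech, and (for the Bayesian classifier) translation of the word units to the left and right of the ambiguous word. The sketch below assumes a sentence already segmented and POS-tagged into `(form, pos)` pairs. It also assumes a window of two units on each side: Table 1's eight feature functions alternate between word-form values (中华, 共同) and POS values (v, nz, u, b), and two units per side with form and POS is one layout consistent with that. A hypothetical `translate` dictionary stands in for whatever bilingual resource supplies the translation feature, and the `<PAD>` marker for positions outside the sentence is likewise an assumption.

```python
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word form, POS tag) from a segmenter + tagger


def extract_features(tokens: List[Token], target: int, window: int = 2,
                     translate: Optional[Dict[str, str]] = None) -> List[str]:
    """Return context features of the word units around tokens[target].

    For each of the `window` units on the left and right, emit its word form
    and POS tag, plus its translation when a `translate` lookup is supplied.
    Positions outside the sentence are filled with a <PAD> marker.
    """
    features = []
    offsets = list(range(-window, 0)) + list(range(1, window + 1))
    for off in offsets:
        i = target + off
        form, pos = tokens[i] if 0 <= i < len(tokens) else ("<PAD>", "<PAD>")
        slot = ("L" if off < 0 else "R") + str(abs(off))  # e.g. L2, L1, R1, R2
        features.append(f"{slot}_form={form}")
        features.append(f"{slot}_pos={pos}")
        if translate is not None:
            features.append(f"{slot}_trans={translate.get(form, '<UNK>')}")
    return features


if __name__ == "__main__":
    # Hypothetical segmented, tagged sentence; the ambiguous word 镜头 is at index 2.
    sentence = [("他", "r"), ("调整", "v"), ("镜头", "n"), ("焦距", "n")]
    print(extract_features(sentence, target=2))
    print(extract_features(sentence, target=2, translate={"调整": "adjust"}))
```

The toy sentence uses single-letter POS tags in the same style as those in Table 1; the tagger and translation resource the authors used are not identified on this page.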

    Table 1.  Values of feature functions

    | S_i | F_feature | f_j(S_i, F_feature) |
    | 子女 |  | 1 (j = 1) |
    | 子女 | v | 1 (j = 2) |
    | 子女 | 中华 | 1 (j = 3) |
    | 子女 | nz | 1 (j = 4) |
    | 子女 |  | 1 (j = 5) |
    | 子女 | u | 1 (j = 6) |
    | 子女 | 共同 | 1 (j = 7) |
    | 子女 | b | 1 (j = 8) |
    Otherwise: f_j(S_i, F_feature) = 0, j = 1, 2, ..., 8
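Table 1 lists binary feature functions: f_j(S_i, F_feature) is 1 when the candidate sense S_i (here 子女) co-occurs with the j-th context feature value, and 0 otherwise. In the standard conditional maximum entropy model such indicators are combined as p(S_i | F) = exp(Σ_j λ_j f_j(S_i, F)) / Σ_k exp(Σ_j λ_j f_j(S_k, F)). The sketch below evaluates that formula. It encodes each context feature as a hypothetical string `"j<index>=<value>"` taken from Table 1 (leaving out j = 1 and j = 5, whose values are missing), and the weights λ_j and the second sense label `"other"` are invented for illustration, not trained values from the paper.

```python
import math
from typing import Dict, Iterable, List, Tuple


def me_probs(context: Iterable[str], senses: List[str],
             weights: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Conditional maximum entropy distribution over senses.

    Each key (sense, feature) in `weights` stands for one indicator function
    f_j: it equals 1 when the candidate sense is `sense` and `feature` occurs
    in the context, and 0 otherwise (cf. Table 1). The score of a sense is
    the sum of lambda_j over the indicators that fire for it.
    """
    active = set(context)  # indicators are binary, so duplicates count once
    scores = {s: sum(weights.get((s, f), 0.0) for f in active) for s in senses}
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp_scores.values())  # normaliser Z(F)
    return {s: e / z for s, e in exp_scores.items()}


if __name__ == "__main__":
    # Context values taken from Table 1 (j = 1 and j = 5 omitted: missing in the source).
    context = ["j2=v", "j3=中华", "j4=nz", "j6=u", "j7=共同", "j8=b"]
    # Invented weights, NOT trained values from the paper; "other" is a placeholder sense.
    weights = {("子女", "j3=中华"): 1.2, ("子女", "j7=共同"): 0.8, ("other", "j4=nz"): 0.5}
    print(me_probs(context, ["子女", "other"], weights))
```

With these made-up weights the two 子女 indicators that fire sum to 2.0 against 0.5 for `"other"`, so p(子女 | F) = 1 / (1 + e^(-1.5)), about 0.82. Estimating the λ_j from training data is outside the scope of this sketch.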

    Table 2.  Disambiguation accuracy on the test corpus (%)

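    | Word | Experiment 1 | Experiment 2 | Experiment 3 |
    |  | 48.0 | 72.0 | 84.0 |
    |  | 85.0 | 50.0 | 50.0 |
    | 旗帜 | 72.2 | 83.3 | 83.3 |
    | 动摇 | 75.0 | 75.0 | 76.5 |
    | 镜头 | 66.7 | 60.0 | 60.0 |
    | 使 | 81.3 | 75.0 | 87.5 |
    |  | 100.0 | 69.2 | 69.2 |
    | 长城 | 76.2 | 61.9 | 61.9 |
    | 成立 | 55.6 | 63.0 | 66.7 |
    | 队伍 | 54.5 | 40.9 | 40.9 |
    |  | 61.1 | 55.6 | 61.1 |
    | 天地 | 84.0 | 80.0 | 80.0 |
    | 表面 | 50.0 | 61.1 | 61.1 |
    |  | 55.6 | 38.9 | 50.0 |
    | 单位 | 88.2 | 76.5 | 76.5 |
    | 儿女 | 45.0 | 100.0 | 100.0 |
    | 机组 | 100.0 | 100.0 | 100.0 |
    | 气象 | 93.8 | 81.3 | 81.3 |
    | 震惊 | 71.4 | 92.9 | 92.9 |
    | 中医 | 93.8 | 93.8 | 93.8 |
    | Average accuracy | 72.9 | 71.5 | 73.8 |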
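Each value in the last row of Table 2 matches the unweighted mean of the 20 per-word accuracies above it, i.e. a macro-average in which every word counts equally. The short check below recomputes the three averages from the table's own numbers. Because the page gives no per-word test-instance counts, a micro-average pooled over all test instances cannot be reconstructed from this table.

```python
# Per-word accuracies (%) copied from Table 2, in row order (20 words each).
exp1 = [48.0, 85.0, 72.2, 75.0, 66.7, 81.3, 100.0, 76.2, 55.6, 54.5,
        61.1, 84.0, 50.0, 55.6, 88.2, 45.0, 100.0, 93.8, 71.4, 93.8]
exp2 = [72.0, 50.0, 83.3, 75.0, 60.0, 75.0, 69.2, 61.9, 63.0, 40.9,
        55.6, 80.0, 61.1, 38.9, 76.5, 100.0, 100.0, 81.3, 92.9, 93.8]
exp3 = [84.0, 50.0, 83.3, 76.5, 60.0, 87.5, 69.2, 61.9, 66.7, 40.9,
        61.1, 80.0, 61.1, 50.0, 76.5, 100.0, 100.0, 81.3, 92.9, 93.8]


def macro_average(values):
    """Unweighted mean: every word counts equally, regardless of test-set size."""
    return sum(values) / len(values)


for name, values in (("Experiment 1", exp1), ("Experiment 2", exp2),
                     ("Experiment 3", exp3)):
    print(f"{name}: {macro_average(values):.1f}")
```

Running it prints 72.9, 71.5 and 73.8, the values in the table's last row.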
Publication history
  • Received: 2017-03-14
  • Revised: 2018-01-08
  • Published online: 2018-03-06
  • Issue date: 2019-04-01
