Acquisition of Part-Whole Relations Based on Unsupervised Learning
doi: 10.3969/j.issn.0258-2724.2014.04.005
Abstract: An unsupervised learning method is proposed for extracting part-whole relations from Chinese free text. First, a subsequence extraction algorithm acquires concept pairs and their context patterns from a collection of domain texts, and a distributional semantic model is built from the concept pairs and their context patterns. Next, a co-clustering algorithm groups concept pairs that share the same semantic relation into clusters, and an L1-regularized logistic regression model is trained to select the features of each cluster and to obtain the context patterns that represent each cluster's semantic relation. Finally, the clusters that express the part-whole relation are identified from these patterns, which yields the part-whole concept pairs. Experimental results show that the method achieves an F-measure of 68.97%, outperforming a traditional clustering method (55.77%) and a pattern matching method (61.95%).
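The distributional semantic model step in the abstract can be illustrated with a minimal sketch. It assumes each concept pair is represented by counts of the context patterns it occurs with. The toy observations, the function names, and the use of cosine similarity are all illustrative assumptions, not the paper's data or algorithm; the paper groups pairs with co-clustering, which is not reproduced here.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy observations: (concept pair, context pattern).
# "X" and "Y" mark the slots of the two concepts. Not taken from the paper.
OBSERVATIONS = [
    (("engine", "car"), "X is part of Y"),
    (("engine", "car"), "Y consists of X"),
    (("wheel", "bicycle"), "X is part of Y"),
    (("wheel", "bicycle"), "Y consists of X"),
    (("sparrow", "bird"), "X is a kind of Y"),
    (("oak", "tree"), "X is a kind of Y"),
]


def build_pair_pattern_vectors(observations):
    """Map each concept pair to a sparse vector of context-pattern counts."""
    vectors = defaultdict(Counter)
    for pair, pattern in observations:
        vectors[pair][pattern] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vectors = build_pair_pattern_vectors(OBSERVATIONS)
same_relation = cosine(vectors[("engine", "car")], vectors[("wheel", "bicycle")])
different_relation = cosine(vectors[("engine", "car")], vectors[("sparrow", "bird")])
print(same_relation, different_relation)
```

In this toy setting, pairs that share context patterns receive a high similarity and pairs with disjoint patterns receive zero, which is the property a clustering step would rely on to separate part-whole pairs from other relations.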
Key words:
- ontology
- unsupervised learning
- part-whole relation
- distributional semantic model
- co-clustering