• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
  • Scopus 收录
  • 全国中文核心期刊
  • 中国科技论文统计源期刊
  • 中国科学引文数据库来源期刊

基于知识图谱的高速列车知识融合方法

王淑营 李雪 黎荣 张海柱

王淑营, 李雪, 黎荣, 张海柱. 基于知识图谱的高速列车知识融合方法[J]. 西南交通大学学报. doi: 10.3969/j.issn.0258-2724.20220193
引用本文: 王淑营, 李雪, 黎荣, 张海柱. 基于知识图谱的高速列车知识融合方法[J]. 西南交通大学学报. doi: 10.3969/j.issn.0258-2724.20220193
WANG Shuying, LI Xue, LI Rong, ZHANG Haizhu. Knowledge Fusion Method of High-Speed Train Based on Knowledge Graph[J]. Journal of Southwest Jiaotong University. doi: 10.3969/j.issn.0258-2724.20220193
Citation: WANG Shuying, LI Xue, LI Rong, ZHANG Haizhu. Knowledge Fusion Method of High-Speed Train Based on Knowledge Graph[J]. Journal of Southwest Jiaotong University. doi: 10.3969/j.issn.0258-2724.20220193

基于知识图谱的高速列车知识融合方法

doi: 10.3969/j.issn.0258-2724.20220193
基金项目: 国家重点研发计划(2020YFB1708000);四川省重大科技专项(2022ZDZX0003)
详细信息
    作者简介:

    王淑营(1974—),女,教授,博士,研究方向为智能制造、工业数据应用,E-mail:w_shuying@126.com

  • 中图分类号: U270;TP391.1

Knowledge Fusion Method of High-Speed Train Based on Knowledge Graph

  • 摘要:

    为解决高速列车各领域知识之间关联不明、难以检索和应用等问题,首先分析高速列车多源异构知识的组织形式,并结合高速列车产品结构树和阶段领域,构建高速列车领域知识图谱模式层和知识图谱;其次,通过BERT-BILSTM-CRF(双向编码变换器-双向长短期记忆网络-条件随机场)模型进行实体识别,得到阶段领域本体的映射;然后,将高速列车实体属性分为结构化和非结构化两类,并分别使用Levenshtein距离和CBOW-BILSTM(连续词袋模型-双向长短期记忆网络)模型计算相应属性的相似度,得到对齐实体对;最后,结合高速列车产品编码结构树进行映射融合,构建高速列车领域融合知识图谱. 应用本文方法对高速列车转向架进行实例验证的结果表明:在命名实体识别方面,基于BERT-BILSTM-CRF模型得到的实体识别准确率为91%;在实体对齐方面,采用Levenshtein 距离、CBOW-BILSTM模型计算实体相似度的F1值(准确率和召回率的调和平均数)分别为82%、83%.

     

  • 图 1  基于知识图谱的高速列车知识融合

    Figure 1.  Knowledge fusion of high-speed train based on knowledge graph

    图 2  高速列车产品结构映射

    Figure 2.  Mapping of high-speed train product structure

    图 3  本体模式构建流程

    Figure 3.  Construction process of ontology pattern

    图 4  领域本体映射流程

    Figure 4.  Mapping process of domain ontology

    图 5  基于属性相似度的实体对齐算法流程

    Figure 5.  Entity alignment algorithm based on attribute similarity

    图 6  语义相似度模型

    Figure 6.  Model of semantic similarity

    图 7  实体映射融合流程

    Figure 7.  Fusion process of entity mapping

    图 8  故障领域本体

    Figure 8.  Ontology of fault domain

    图 9  故障领域知识图谱

    Figure 9.  Knowledge graph of fault domain

    图 10  属性权值测试

    Figure 10.  Attribute weight test

    图 11  相似度阈值取值计算

    Figure 11.  Calculation of similarity threshold value

    图 12  融合本体

    Figure 12.  Fusion ontology

    图 13  设计域和运维域融合知识图谱

    Figure 13.  Fusion knowledge graph of design domain and maintenance domain

    表  1  结构树划分

    Table  1.   Partition of structure trees

    结构树 知识来源 特点分析
    产品族主结构树 产品族模型数据、标准类数据、模板类数据 具有快速重用的特点,不涉及具体的参数值,是设计实例的模板结构,具有元节点编码作为唯一标识
    产品设计结构树 需求数据、几何数据、设计规则、物理属性数据、工艺数据 与设计产出相对应,是按需求设计实例化的结果,具有模块编码作为唯一标识
    产品实例结构树 工艺质量数据、故障数据、制造成本数据 设计实例实物化的结果,与设计实例具有多对一的关系,制造码为唯一标识
    下载: 导出CSV

    表  2  高速列车实体属性

    Table  2.   Entity attribute of high-speed train

    数值型(结构化属性) 文本型(非结构化属性)
    运营速度、转向架最大宽度、转向架最大高度、车轮直径(新轮)、车轮直径(半磨耗)、齿轮中心距、轴重 转向架型式、车轮型式、车轮踏面型式、车轴型式、牵引电机型式、牵引拉杆材料、齿轮箱材料
    下载: 导出CSV

    表  3  单位和约束匹配模板

    Table  3.   Matching template of unit and constraint

    约束单位
    不大于mm
    不大于%
    不得超过L
    ±g
    下载: 导出CSV

    表  4  数据集构成

    Table  4.   Composition of dataset

    数据集实体数关系数可对齐实体数不可对齐实体数
    故障数据132584115289254333
    维修数据105063528289251581
    下载: 导出CSV

    表  5  BERT-BILSTM-CRF模型参数设置

    Table  5.   Parameter setting of BERT-BILSTM-CRF model

    参数名 参数值
    批大小/批 4
    学习率 0.001
    丢失率 0.5
    训练轮次/轮 10
    字向量维度/维 768
    序列长度/个 128
    下载: 导出CSV

    表  6  实体识别对比实验

    Table  6.   Contrast experiment of entity recognition %

    实验方法 准确率 召回率 F1值
    Word2vec-BILSTM 86 83 84
    Word2vec-BILSTM-CRF 90 87 88
    BERT-BILSTM 89 86 87
    BERT-BILSTM-CRF 91 88 89
    下载: 导出CSV

    表  7  相似度计算对比实验

    Table  7.   Contrast experiment of similarity calculation

    相似度计算方法 F1值/%
    Levenshtein距离 82
    Jaro-Winkler距离 79
    语义相似度(CBOW) 77
    语义相似度(BILSTM) 81
    语义相似度(CBOW-BILSTM) 83
    下载: 导出CSV
  • [1] 丁国富,姜杰,张海柱,等. 我国高速列车数字化研发的进展及挑战[J]. 西南交通大学学报,2016,51(2): 251-263. doi: 10.3969/j.issn.0258-2724.2016.02.005

    DING Guofu, JIANG Jie, ZHANG Haizhu, et al. Development and challenge of digital design of high-speed trains in China[J]. Journal of Southwest Jiaotong University, 2016, 51(2): 251-263. doi: 10.3969/j.issn.0258-2724.2016.02.005
    [2] 刘峤,李杨,段宏,等. 知识图谱构建技术综述[J]. 计算机研究与发展,2016,53(3): 582-600.

    LIU Qiao, LI Yang, DUAN Hong, et al. Knowledge graph construction techniques[J]. Journal of Computer Research and Development, 2016, 53(3): 582-600.
    [3] RUTA M, SCIOSCIA F, GRAMEGNA F, et al. A knowledge fusion approach for context awareness in vehicular networks[J]. IEEE Internet of Things Journal, 2018, 5(4): 2407-2419. doi: 10.1109/JIOT.2018.2815009
    [4] ZHAO X J, JIA Y, LI A P, et al. Multi-source knowledge fusion: a survey[C]//2019 IEEE Fourth International Conference on Data Science in Cyberspace (DSC). Hangzhou: IEEE, 2019: 119-127.
    [5] ABDELLATIF M, FARHAN M S, SHEHATA N S. Overcoming business process reengineering obstacles using ontology-based knowledge map methodology[J]. Future Computing and Informatics Journal, 2018, 3(1): 7-28. doi: 10.1016/j.fcij.2017.10.006
    [6] KAUSHIK N, CHATTERJEE N. Automatic relationship extraction from agricultural text for ontology construction[J]. Information Processing in Agriculture, 2018, 5(1): 60-73. doi: 10.1016/j.inpa.2017.11.003
    [7] DAI Z J, WANG X T, NI P, et al. Named entity recognition using BERT BiLSTM CRF for Chinese electronic health records[C]//2019 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Suzhou: IEEE, 2019: 1-5.
    [8] JIANG L, SHI J Y, WANG C Y. Multi-ontology fusion and rule development to facilitate automated code compliance checking using BIM and rule-based reasoning[J]. Advanced Engineering Informatics, 2022, 51: 101449.1-101449.15.
    [9] 王雪鹏,刘康,何世柱,等. 基于网络语义标签的多源知识库实体对齐算法[J]. 计算机学报,2017,40(3): 701-711. doi: 10.11897/SP.J.1016.2017.00701

    WANG Xuepeng, LIU Kang, HE Shizhu, et al. Multi-source knowledge bases entity alignment by leveraging semantic tags[J]. Chinese Journal of Computers, 2017, 40(3): 701-711. doi: 10.11897/SP.J.1016.2017.00701
    [10] TRISEDYA B D, QI J Z, ZHANG R. Entity alignment between knowledge graphs using attribute embeddings[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 297-304. doi: 10.1609/aaai.v33i01.3301297
    [11] ZHU Q, WEI H, SISMAN B, et al. Collective multi-type entity alignment between knowledge graphs[C]//Proceedings of The Web Conference 2020. Taipei: ACM, 2020: 2241–2252.
    [12] ZAD S, HEIDARI M, HAJIBABAEE P, et al. A survey of deep learning methods on semantic similarity and sentence modeling[C]//2021 IEEE 12th Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON). Vancouver: IEEE, 2021: 466-472.
    [13] TSENG C W, CHOU J J, TSAI Y C. Text mining analysis of teaching evaluation questionnaires for the selection of outstanding teaching faculty members[J]. IEEE Access, 2018, 6: 72870-72879. doi: 10.1109/ACCESS.2018.2878478
    [14] ZHANG W T, JIANG S H, ZHAO S, et al. A BERT-BiLSTM-CRF model for Chinese electronic medical records named entity recognition[C]//2019 12th International Conference on Intelligent Computation Technology and Automation (ICICTA). Xiangtan: IEEE, 2019: 166-169.
    [15] ZHANG M Y, WANG J, ZHANG X J. Using a pre-trained language model for medical named entity extraction in Chinese clinic text[C]//2020 IEEE 10th International Conference on Electronics Information and Emergency Communication (ICEIEC). Beijing: IEEE, 2020: 312-317.
    [16] NGUYEN H T, DUONG P H, CAMBRIA E. Learning short-text semantic similarity with word embeddings and external knowledge sources[J]. Knowledge-Based Systems, 2019, 182: 104842.1-104842.9.
    [17] PUTERA UTAMA SIAHAAN A, ARYZA S, HARIYANTO E, et al. Combination of levenshtein distance and rabin-karp to improve the accuracy of document equivalence level[J]. International Journal of Engineering & Technology, 2018, 7: 17-21.
    [18] MANAF K, PITARA S, SUBAEKI B, et al. Comparison of carp Rabin algorithm and jaro-winkler distance to determine the equality of sunda languages[C]//2019 IEEE 13th International Conference on Telecommunication Systems, Services, and Applications (TSSA). Bali: IEEE, 2019: 77-81.
  • 加载中
图(13) / 表(7)
计量
  • 文章访问数:  21
  • HTML全文浏览量:  9
  • PDF下载量:  3
  • 被引次数: 0
出版历程
  • 收稿日期:  2022-03-16
  • 修回日期:  2022-07-04
  • 网络出版日期:  2024-05-29

目录

    /

    返回文章
    返回