Text-Mining-Based Risk-Source Identification Model for Road Transportation Safety

LUO Wenhui; CAI Fengtian; WU Chuna; XIA Hongwen; MENG Xingkai

doi:10.3969/j.issn.0258-2724.20200140

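Funding: Key Science and Technology Project of the Transportation Industry, Ministry of Transport (2018-C0004)

About the author:
LUO Wenhui (born 1983), female, PhD. Research interest: risk prevention and control technology for road transportation safety. E-mail: wh.luo@rion.cn

Corresponding author:
CAI Fengtian (born 1963), male, research fellow. Research interest: risk prevention and control technology for road transportation safety. E-mail: ft.cai@rioh.cn

CLC number: U491.14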
Publication history
- Received: 2020-01-05
- Revised: 2020-06-08
- Published online: 2020-09-15
- Issue date: 2021-02-01

Text-Mining Based Risk Source Identification Model for Transportation Safety

Abstract

Abstract: To address data scarcity and the heavy manual workload in identifying safety risk sources in road transportation, a model is proposed that identifies these risk sources automatically by means of text mining. First, causal sentences are extracted from road transportation texts and segmented into words, which enhances the risk-source features. Second, features containing both word information and position information are constructed in a form suited to the input of a convolutional neural network (CNN). Third, the constructed features are fed into the CNN to identify the risk sources. Finally, experiments are conducted on road traffic accident reports. The results show that the proposed model identifies most risk-source factors in road transportation safety, with an accuracy of about 77.321%.

Keywords: road transportation; risk-source identification; text mining; convolutional neural network; causality sentence extraction
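The feature-construction step described in the abstract combines word information with position information. The sketch below shows one way this could look; it is not taken from the paper. It assumes that position means each token's offset from the causality cue word, clipped to a window (`max_distance`), and that the sentence has already been segmented. The function name `build_features`, the toy vocabulary, and the hand-segmented tokens are all hypothetical.

```python
from typing import Dict, List, Tuple


def build_features(tokens: List[str], cue_index: int, vocab: Dict[str, int],
                   max_distance: int = 5) -> List[Tuple[int, int]]:
    """Pair each token's vocabulary id with its clipped offset from the cue word."""
    features = []
    for i, token in enumerate(tokens):
        word_id = vocab.get(token, 0)  # id 0 is reserved for unknown words
        # Assumption: position is measured relative to the cue word and clipped.
        offset = max(-max_distance, min(max_distance, i - cue_index))
        # Shift the offset so it can be used as a non-negative embedding index.
        features.append((word_id, offset + max_distance))
    return features


# Hand-segmented toy sentence; "致使" (the cue word) sits at index 2.
tokens = ["超速行驶", "疲劳驾驶", "致使", "车辆", "偏离"]
vocab = {token: idx for idx, token in enumerate(tokens, start=1)}
print(build_features(tokens, cue_index=2, vocab=vocab))
```

In a full model, each pair would index a word-embedding table and a position-embedding table, and the concatenated vectors would form the CNN input matrix.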

Figure 1. Schematic diagram of the risk-source identification process

Figure 2. Preprocessing of causality sentences

Figure 3. Risk-source identification model based on CNN

Figure 4. Multi-class ROC curves of the risk-source identification model

Table 1. Causality cue words

Accident reports	Modern Chinese corpus
致使、导致、造成、是、加剧了、与、致、继、后、再次、接着、以致、发生	导致、产生、造成、造就、酿成、引发、引起、引来、引致、诱发、致使、滋生、归于、带来、触发、致死、致残、波及、关系到、牵动、渗入、渗透、影响、推动、推进、出于、出自、引导、引入、诱导、指引、诱惑、使得、……
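As a rough illustration of how a cue-word list like the one in Table 1 could drive causal sentence extraction, the sketch below keeps only the sentences that contain a cue word. It is a minimal substring filter, not the paper's extraction procedure. The sample `report` text is invented for the example, and the function names are hypothetical.

```python
import re
from typing import List

# Multi-character cue words from the accident-report column of Table 1.
# Single-character cues (是, 与, 致, 继, 后) are left out of this sketch because
# plain substring matching on them would flag almost every sentence.
CUE_WORDS = ["致使", "导致", "造成", "加剧了", "再次", "接着", "以致", "发生"]


def split_sentences(text: str) -> List[str]:
    """Split on Chinese sentence-ending punctuation and drop empty pieces."""
    return [s.strip() for s in re.split(r"[。！？；]", text) if s.strip()]


def extract_causal_sentences(text: str) -> List[str]:
    """Keep only the sentences that contain at least one cue word."""
    return [s for s in split_sentences(text) if any(cue in s for cue in CUE_WORDS)]


# Invented sample text: two causal sentences and one non-causal sentence.
report = "驾驶人超速行驶，致使车辆向道路右侧偏离。当日天气晴朗。路面湿滑导致车辆失去控制。"
print(extract_causal_sentences(report))
```

A substring filter of this kind over-selects (for example, 发生 also appears in non-causal statements), so it would normally be followed by a structural check against sentence patterns.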

Table 2. Causality sentence structures

Cause to effect	Effect to cause	Split pattern
[cause] <cue> [effect]	[effect] <cue> [cause]	<cue> [cause], <cue> [effect]
[cause][cause] … <cue> [effect]	[effect][effect] <cue> [cause]	<cue> [effect], <cue> [cause]
[cause] <cue> [effect][effect] …	[effect] <cue> [cause][cause] …	…
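The cause-to-effect pattern `[cause][cause] … <cue> [effect]` from Table 2 can be illustrated by splitting a sentence at its cue word. The sketch below is only an approximation. It assumes that the phrases on each side are separated by "、" or "，", which is not stated in the source. The function `split_cause_effect` is hypothetical, and the sample is the first sentence of Table 3 in its original Chinese.

```python
import re
from typing import List, Optional, Tuple


def split_cause_effect(sentence: str, cue: str,
                       cause_first: bool = True) -> Optional[Tuple[List[str], List[str]]]:
    """Split a sentence at the cue word and return (causes, effects)."""
    before, found, after = sentence.partition(cue)
    if not found:
        return None  # the cue word does not occur in this sentence

    def phrases(part: str) -> List[str]:
        # Assumption: phrases on each side are delimited by "、" or "，".
        return [p for p in re.split(r"[、，]", part) if p]

    left, right = phrases(before), phrases(after)
    # cause_first=False handles the effect-to-cause column of Table 2.
    return (left, right) if cause_first else (right, left)


SENTENCE = "事故车辆驾驶人王某行经事故地点时超速行驶、疲劳驾驶，致使车辆向道路右侧偏离，正面冲撞秦岭1号隧道洞口端墙"
causes, effects = split_cause_effect(SENTENCE, "致使")
print(causes)
print(effects)
```

The first cause phrase still carries the leading context (driver and location); trimming it down to "超速行驶" would need word segmentation or annotation rather than punctuation alone.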

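Table 3. Causality sentence annotations

Sentence	Causal phrase annotations for risk-source identification
Wang, the driver of the accident vehicle, was speeding and driving while fatigued when passing the accident site, causing the vehicle to veer toward the right side of the road and collide head-on with the portal end wall of Qinling No. 1 Tunnel	speeding: 1; fatigued driving: 2; vehicle veering toward the right side of the road: 11; collision with the tunnel portal end wall: 12
Driver Zeng set out while fatigued, overloaded, and speeding; because of excessive fatigue the vehicle went out of control and collided with an oncoming motorcycle	fatigued driving: 1; overloading: 2; speeding: 3; vehicle out of control: 11; collision with an oncoming motorcycle: 12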
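Annotations in the `phrase: label; phrase: label` form used in Table 3 are straightforward to turn into training labels. The sketch below parses one annotation string into a phrase-to-label mapping. It accepts both ASCII and full-width separators, and the function name `parse_annotation` is hypothetical.

```python
import re
from typing import Dict


def parse_annotation(annotation: str) -> Dict[str, int]:
    """Parse 'phrase: label; phrase: label' into a {phrase: label} mapping."""
    labels: Dict[str, int] = {}
    for item in re.split(r"[;；]", annotation):
        if not item.strip():
            continue  # skip empty pieces, e.g. after a trailing separator
        # Normalise the full-width colon, then split on the last colon.
        phrase, _, label = item.replace("：", ":").rpartition(":")
        labels[phrase.strip()] = int(label)
    return labels


# Annotation of the first sentence in Table 3, in its original Chinese.
annotation = "超速行驶：1；疲劳驾驶：2；车辆向道路右侧偏离：11；冲撞隧道洞口端墙：12"
print(parse_annotation(annotation))
```

In the two rows shown in Table 3, the cause phrases carry labels 1 to 3 and the effect phrases carry labels 11 and 12.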

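Table 4. Parameter settings

Parameter	Value	Parameter	Value
Word vector dimension	144	Number of convolutional layers	2
Convolution kernel size	3 × 3	Number of downsampling layers	2
Number of filters	32	Word2vec word vector dimension	129
Initial learning rate	0.001	Softmax output dimension	13
Batch size	500	Number of data iterations	96
Dropout rate	0.5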
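For reference, the settings in Table 4 can be collected in a single configuration object. The sketch below is illustrative only. The class name `CNNConfig` and its field names are hypothetical, and the dropout value is treated as a fraction, which is an assumption. The table lists two vector dimensions, 144 and 129, which differ by 15. One possible reading, not stated in the source, is that the remaining 15 dimensions hold the position features.

```python
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class CNNConfig:
    """Hyperparameters as listed in Table 4 (field names are illustrative)."""
    input_vector_dim: int = 144         # "word vector dimension" in Table 4
    word2vec_dim: int = 129             # Word2vec word vector dimension
    kernel_size: Tuple[int, int] = (3, 3)
    num_filters: int = 32
    conv_layers: int = 2
    downsampling_layers: int = 2
    initial_learning_rate: float = 0.001
    batch_size: int = 500
    dropout_rate: float = 0.5           # assumption: a fraction, not a percentage
    softmax_output_dim: int = 13
    data_iterations: int = 96

    @property
    def non_word2vec_dim(self) -> int:
        """Dimensions of the input vector not covered by the Word2vec embedding."""
        return self.input_vector_dim - self.word2vec_dim


config = CNNConfig()
print(asdict(config))
print(config.non_word2vec_dim)
```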

Table 5. Test results for precision (P), recall (R), and F-score (the harmonic mean of P and R), in %

Class	P	R	F
0	71.921	89.115	79.6001
1	68.714	81.181	74.4291
2	67.571	77.391	72.1484
3	70.333	75.077	72.6276
4	80.025	84.668	82.2811
5	52.354	65.422	58.1630
6	70.564	82.325	75.9921
7	78.558	84.456	81.4003
8	70.556	71.544	71.0466
9	38.248	52.555	44.2744
11	76.154	98.000	81.6496
12	75.440	85.470	80.1424
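Table 5 reports F as the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below recomputes F for each row from the tabulated P and R and lists the classes whose reported F deviates from the recomputed value. The tolerance `tol` and the function names are choices made for this example.

```python
from typing import Dict, List, Tuple


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F) per class, copied from Table 5 (values in %).
TABLE_5: Dict[int, Tuple[float, float, float]] = {
    0: (71.921, 89.115, 79.6001), 1: (68.714, 81.181, 74.4291),
    2: (67.571, 77.391, 72.1484), 3: (70.333, 75.077, 72.6276),
    4: (80.025, 84.668, 82.2811), 5: (52.354, 65.422, 58.1630),
    6: (70.564, 82.325, 75.9921), 7: (78.558, 84.456, 81.4003),
    8: (70.556, 71.544, 71.0466), 9: (38.248, 52.555, 44.2744),
    11: (76.154, 98.000, 81.6496), 12: (75.440, 85.470, 80.1424),
}


def inconsistent_rows(table: Dict[int, Tuple[float, float, float]],
                      tol: float = 0.01) -> List[int]:
    """Return the classes whose reported F differs from 2PR/(P+R) by more than tol."""
    return [cls for cls, (p, r, f) in table.items() if abs(f_score(p, r) - f) > tol]


print(round(f_score(71.921, 89.115), 4))
print(inconsistent_rows(TABLE_5))
```

With the values as tabulated, every row reproduces its reported F except class 11. Its P and R give roughly 85.71 instead of 81.6496, so one entry in that row may contain a transcription error.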
