Improvement of Highway Traffic Risk Prediction Method Based on Traffic Accident Text Mining
-
Abstract: To address the long inspection mileage and the difficulty of traffic control on highways, the existing bidirectional long short-term memory network (BiLSTM) text classification model and convolutional neural network (CNN) risk prediction model were adapted and improved for this application. Historical road traffic accident texts were analyzed and mined, and a road segmentation method was introduced to predict the distribution of highway driving risk accurately and to support science-based management of highway driving safety. First, traffic accident texts were classified by a BiLSTM improved with a self-attention mechanism (BiLSTM-AT), which assigned a risk level to each accident. Second, the highway was divided into segments in ArcGIS, and the driving risk levels within each segment were counted; kernel density analysis was then performed to visualize the text classification results and show the magnitude of risk in different areas. Finally, a CNN combined with long short-term memory (CNN-LSTM) performed time series prediction on the classified risk levels, yielding the future spatial distribution of highway driving risk, which was drawn as a highway driving risk level cloud map. The results show that, for accident text classification, the BiLSTM-AT model reaches an accuracy of 95.03%, which is 0.91% and 0.67% higher than that of BiLSTM and the gated recurrent unit (GRU), respectively; for risk prediction, the mean absolute error and root mean square error of CNN-LSTM are 0.04 and 0.07, respectively, which are 9.05% and 6.84% lower than those of the second-best model, LSTM.
The proposed method, which closely links accident text classification, segment division, driving risk prediction, and result visualization, can effectively extract and analyze driving risk information from traffic accident texts and provides a reference for optimizing highway inspection routes and traffic control on key segments.
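The abstract does not state the attention formula used in BiLSTM-AT. The sketch below assumes the additive attention-pooling form often paired with BiLSTM text classifiers. Each token's BiLSTM output h_t is scored as v·tanh(W h_t), and the scores are normalized with softmax. The weighted sum of the h_t then feeds a linear layer over the four risk levels. The names `attention_pool`, `predict_level`, `w`, `v`, and `u` are hypothetical, the weights would be learned in practice, and plain Python lists stand in for tensors.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: w has shape (rows, len(x))."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], w: Matrix, v: Vector) -> Tuple[Vector, Vector]:
    """Additive attention pooling over BiLSTM outputs (assumed form).

    hidden: one vector per token, e.g. forward and backward states concatenated.
    w, v:   attention parameters (hypothetical; learned in practice).
    Returns (attention weights over tokens, pooled sentence vector).
    """
    # score_t = v . tanh(W h_t)
    scores = [sum(vi * math.tanh(ui) for vi, ui in zip(v, matvec(w, h))) for h in hidden]
    alpha = softmax(scores)
    # sentence vector = sum_t alpha_t * h_t
    pooled = [sum(a * h[k] for a, h in zip(alpha, hidden)) for k in range(len(hidden[0]))]
    return alpha, pooled


def predict_level(pooled: Vector, u: Matrix, b: Vector) -> int:
    """Linear layer over the pooled vector; returns the arg-max risk level (1-4).

    Softmax is monotone, so the arg-max of the logits equals the arg-max
    of the class probabilities and softmax can be skipped at inference.
    """
    logits = [z + bi for z, bi in zip(matvec(u, pooled), b)]
    return max(range(len(logits)), key=logits.__getitem__) + 1
```

The attention weights show which tokens in an accident description drive the predicted level, which is the usual motivation for adding attention on top of a BiLSTM.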
-
Key words:
- traffic engineering
- BiLSTM-AT
- CNN-LSTM
- traffic accident text
- highway driving risk
-
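Table 1. Classification basis of road traffic accidents
| Category | Classification basis |
| --- | --- |
| Minor | An accident that causes minor injuries to 1 to 2 persons at one time |
| General | An accident that causes serious injuries to 1 to 2 persons, or minor injuries to 3 or more persons, at one time |
| Major | An accident that causes 1 to 2 deaths, or serious injuries to 3 to 10 persons, at one time |
| Extraordinarily serious | An accident that causes, at one time, 3 or more deaths; or serious injuries to 11 or more persons; or 1 death together with serious injuries to 8 or more persons; or 2 deaths together with serious injuries to 5 or more persons |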
Table 2. Classification basis of accidents after reclassification
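| Category | Accident description | Label | Level |
| --- | --- | --- | --- |
| Minor | No casualties, or 1 person injured | A | 1 |
| General | 2 to 4 persons injured | B | 2 |
| Major | 1 to 2 deaths, or 5 to 10 persons injured | C | 3 |
| Extraordinarily serious | 3 or more deaths; 1 death and 5 or more persons injured; 2 deaths and 3 or more persons injured; or 11 or more persons injured | D | 4 |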
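Read as a rule, Table 2 can be checked from the most severe level downward. The sketch below assumes the numbers of deaths and injured persons have already been extracted as integers. In the paper, the level of an accident is predicted by BiLSTM-AT from the raw text, so this function only illustrates the annotation criteria. The function name `risk_level` is hypothetical.

```python
from typing import Tuple


def risk_level(deaths: int, injured: int) -> Tuple[str, str, int]:
    """Map casualty counts to (category, label, level) following Table 2."""
    if deaths < 0 or injured < 0:
        raise ValueError("casualty counts must be non-negative")
    # Level 4 (D): any one of the four conditions in Table 2
    if (deaths >= 3
            or (deaths == 1 and injured >= 5)
            or (deaths == 2 and injured >= 3)
            or injured >= 11):
        return ("Extraordinarily serious", "D", 4)
    # Level 3 (C): 1-2 deaths, or 5-10 injured
    if 1 <= deaths <= 2 or 5 <= injured <= 10:
        return ("Major", "C", 3)
    # Level 2 (B): 2-4 injured (no deaths can remain at this point)
    if 2 <= injured <= 4:
        return ("General", "B", 2)
    # Level 1 (A): no casualties, or 1 person injured
    return ("Minor", "A", 1)
```

Applied to the four examples in Table 3 (1 injured; 2 injured; 1 death; 3 deaths), the function returns levels 1, 2, 3, and 4, matching the labels given there.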
Table 3. Some examples of classification of road traffic accidents
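| Accident text | Category | Label | Level |
| --- | --- | --- | --- |
| At about 19:00 on 19 November 2021, Ma×× was driving the ordinary small passenger car Yun DMX5×× to K1779+00 of the Shankun Expressway when it collided with Kang××, who was crossing the road from left to right. The injury traffic accident left 1 person (Kang××) injured and the motor vehicle damaged. | Minor | A | 1 |
| At about 09:00 on 8 March 2021, pedestrians Li×× and Chen×× collided at K1823+900 of the Shankun Expressway with the front of the ordinary passenger car Yun A616×× driven by ××, an injury road traffic accident that left 2 persons (Li×× and Chen××) injured and the vehicle damaged. | General | B | 2 |
| On 14 January 2021, Fu×× was driving the Xiali sedan Yun AN33×× to K1831+600 of the Shankun Expressway when the vehicle scraped Zhou× and then struck the central median barrier, a fatal road traffic accident that killed 1 person (Zhou×) and damaged the vehicle. | Major | C | 3 |
| At about 21:50 on 26 September 2021, Li×× was driving the truck Yun G567×× to K1821+850 of the Shankun Expressway when it rear-ended the small ordinary passenger car Yun A230×× driven by Qi Guangping, a fatal road traffic accident that killed 3 passengers (Li××, Duan××, and Li××) and damaged the vehicles. | Extraordinarily serious | D | 4 |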
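The stake numbers in Table 3 (for example K1823+900, read as kilometre 1823 plus 900 m) are enough to place each accident along the route. The sketch below converts stake numbers to metres and counts risk levels per fixed-length segment. It is a one-dimensional stand-in for the ArcGIS segmentation described in the abstract, not that workflow. The 10 km default segment length and the names `stake_to_metres` and `levels_by_segment` are assumptions for illustration.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Stake numbers look like "K1823+900": kilometre part, then metres (spaces tolerated).
STAKE_PATTERN = re.compile(r"K\s*(\d+)\s*\+\s*(\d+)")


def stake_to_metres(stake: str) -> int:
    """Convert a stake number such as 'K1823+900' to metres from the route origin."""
    match = STAKE_PATTERN.search(stake)
    if match is None:
        raise ValueError(f"no stake number found in {stake!r}")
    km, metres = int(match.group(1)), int(match.group(2))
    return km * 1000 + metres


def levels_by_segment(
    records: Iterable[Tuple[str, int]], segment_m: int = 10_000
) -> Dict[int, Dict[int, int]]:
    """Count risk levels per fixed-length segment.

    records:   (stake number, risk level) pairs.
    segment_m: segment length in metres (hypothetical default of 10 km).
    Returns {segment start in metres: {risk level: accident count}}, sorted by start.
    """
    counts: Dict[int, Counter] = defaultdict(Counter)
    for stake, level in records:
        # Floor the position to the start of its segment
        start_m = stake_to_metres(stake) // segment_m * segment_m
        counts[start_m][level] += 1
    return {start: dict(c) for start, c in sorted(counts.items())}
```

With the four examples in Table 3, the accidents at K1821+850 and K1823+900 fall into the same segment starting at 1,820,000 m, which then holds one level-2 and one level-4 accident; the other two accidents each occupy their own segment.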
Table 4. Text cleaning results
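| Step | Data cleaning result |
| --- | --- |
| Chinese accident text | 2020年9月17日17时40分,王××驾驶云AG66××号货车行驶至汕昆高速公路K1805+200处发生碰撞,造成道路设施损坏,无人员伤亡的交通事故 (At 17:40 on 17 September 2020, Wang×× was driving the truck Yun AG66×× to K1805+200 of the Shankun Expressway when it had a collision, a traffic accident that damaged road facilities with no casualties.) |
| After word segmentation | 2020 年 9 月 17 日 17 时 40 分, 王 × × 驾驶 云 AG66 × × 号 货车 行驶 至 汕昆 高速公路 K1805 + 200 处发生碰撞,造成道路设施损坏, 无人员伤亡的交通事故 |
| After stop-word removal | 王 × × 云 号 货车 行驶 汕昆 高速公路 处 发生 碰撞 造成 道路设施 损坏 人员伤亡 交通事故 (Wang×× / Yun / No. / truck / travel / Shankun / expressway / at / occur / collision / cause / road facilities / damage / casualties / traffic accident) |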
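The cleaning step in Table 4 can be approximated by filtering an already segmented token list. The sketch assumes segmentation has been done by a Chinese word segmenter, which is not named here. It uses a small hypothetical stop-word list. It also applies an assumed rule that drops dates, plate fragments, stake codes, and punctuation. The paper's actual list and rules are not given in this section.

```python
import re
from typing import Iterable, List, Set

# Hypothetical stop-word list for illustration only; the paper's list is not given here.
STOP_WORDS: Set[str] = {"年", "月", "日", "时", "分", "至", "的", "驾驶"}

# Assumed noise rule: drop tokens made up entirely of digits, ASCII letters,
# the anonymization mark ×, '+', whitespace, or common ASCII/full-width punctuation
# (dates, plate fragments such as "AG66××", stake codes, commas).
NOISE_PATTERN = re.compile(r"^[0-9A-Za-z×+\s,,.。;;::]+$")


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Keep only tokens that are non-empty, not stop words, and not pure noise."""
    kept = []
    for token in tokens:
        token = token.strip()
        if not token or token in stop_words or NOISE_PATTERN.match(token):
            continue
        kept.append(token)
    return kept
```

Whether negation words such as 无 ("no") belong in the stop-word list is a design choice worth checking. Dropping 无 turns 无人员伤亡 ("no casualties") into 人员伤亡 ("casualties"), which matters when the text is later used to infer severity.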
Table 5. Evaluation metrics for prediction results of different models
| Model | eMAE | eRMSE |
| --- | --- | --- |
| CNN-LSTM | 0.0382 | 0.0736 |
| LSTM | 0.0420 | 0.0790 |
| GRU | 0.0453 | 0.0846 |
| CNN | 0.0682 | 0.1082 |
Note: CNN-LSTM gives the best (lowest) value on both metrics.
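Table 5 reports eMAE and eRMSE without restating their definitions in this section. The sketch below assumes the standard definitions: e_MAE = (1/n) Σ|y_i − ŷ_i| and e_RMSE = sqrt((1/n) Σ(y_i − ŷ_i)²), where y_i is the actual risk value and ŷ_i the predicted one. The function names are hypothetical.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    """Validate inputs and return their common length."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: the average of |y_i - yhat_i|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: the square root of the average squared error."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE, so a model can rank differently on the two metrics. In Table 5 the ranking is the same on both.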
Table 6. ArcGIS driving risk level cloud map comparison
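| Item | Time and model | ArcGIS driving risk level cloud map | Accuracy improvement over CNN, eMAE/% | Accuracy improvement over CNN, eRMSE/% |
| --- | --- | --- | --- | --- |
| Actual risk distribution | T0 | [image] | - | - |
| Actual risk distribution | T1 | [image] | - | - |
| Predictions of different models at T1 | CNN-LSTM (best) | [image] | 43.99 | 31.98 |
| Predictions of different models at T1 | LSTM | [image] | 38.42 | 26.99 |
| Predictions of different models at T1 | GRU | [image] | 33.72 | 21.81 |
| Predictions of different models at T1 | CNN (worst) | [image] | - | - |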
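The percentages in Table 6 and in the abstract are relative error reductions: (e_baseline − e_model) / e_baseline × 100. The short check below recomputes them from the Table 5 values. The helper names `reduction_pct` and `improvement_over` are hypothetical.

```python
from typing import Dict, Tuple

# (eMAE, eRMSE) per model, copied from Table 5.
TABLE5: Dict[str, Tuple[float, float]] = {
    "CNN-LSTM": (0.0382, 0.0736),
    "LSTM": (0.0420, 0.0790),
    "GRU": (0.0453, 0.0846),
    "CNN": (0.0682, 0.1082),
}


def reduction_pct(baseline: float, model: float) -> float:
    """Percentage by which the model's error is lower than the baseline's."""
    return (baseline - model) / baseline * 100.0


def improvement_over(baseline_name: str) -> Dict[str, Tuple[float, ...]]:
    """Relative reductions (eMAE %, eRMSE %) of every other model versus the baseline."""
    base = TABLE5[baseline_name]
    return {
        name: tuple(round(reduction_pct(b, m), 2) for b, m in zip(base, errs))
        for name, errs in TABLE5.items()
        if name != baseline_name
    }
```

Against CNN, this reproduces 43.99/31.98 for CNN-LSTM and 38.42/26.99 for LSTM. It also reproduces 21.81 for the GRU eRMSE. `improvement_over("LSTM")` gives the 9.05% and 6.84% quoted in the abstract for CNN-LSTM. For the GRU eMAE, however, the four-decimal values in Table 5 give about 33.58% rather than the 33.72% listed in Table 6.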