• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
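  • Indexed in Scopus
  • Chinese Core Journal
  • Source journal for Chinese Scientific and Technical Papers and Citations
  • Source journal of the Chinese Science Citation Database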

Dense Crowd Counting Network Based on Multi-scale Perception

LI Hengchao, LIU Xianglian, LIU Peng, FENG Bin

Citation: LI Hengchao, LIU Xianglian, LIU Peng, FENG Bin. Dense Crowd Counting Network Based on Multi-scale Perception[J]. Journal of Southwest Jiaotong University, 2024, 59(5): 1176-1183, 1214. doi: 10.3969/j.issn.0258-2724.20220823

doi: 10.3969/j.issn.0258-2724.20220823
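Funding: National Natural Science Foundation of China (62271418); Natural Science Foundation of Sichuan Province (23NSFSC0058)
    About the author:

    LI Hengchao (born 1978), male, professor, Ph.D.; research interest: intelligent remote sensing image processing. E-mail: hcli@home.swjtu.edu.cn

    Corresponding author:

    FENG Bin (born 1979), male, lecturer, M.S.; research interest: crowd counting in sports scenes. E-mail: feng197947@163.com

  • CLC number: TP391.41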

Dense Crowd Counting Network Based on Multi-scale Perception

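  • Abstract:

    To address the diverse target scales and large scale variations found in dense crowd scenes, a dense crowd counting network based on multi-scale perception is proposed. First, because small-scale targets account for a large share of the image, a dilated convolution module is added to a VGG-16 (Visual Geometry Group, 16-layer) backbone to mine image detail. Second, to make full use of multi-scale target information, a new context-aware module is constructed to extract contrast features between different scales. Finally, because target scale varies continuously, a multi-scale feature aggregation module is designed to widen the range of dense scale sampling and strengthen the interaction of multi-scale information, thereby improving network performance. Experimental results show that on the Shanghai Tech (Part_A/Part_B) and UCF_CC_50 datasets, the mean absolute error (MAE) of the proposed method is 62.5, 6.9, and 156.5, respectively, and the root mean square error (RMSE) is 95.7, 11.0, and 223.3, respectively. Compared with the best comparison method, MAE and RMSE are reduced by 1.1% and 4.3% on the UCF_QNRF dataset, and by 8.7% and 13.9% on the NWPU dataset.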
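The MAE and RMSE figures quoted in the abstract can be read against the usual definitions. The sketch below is a minimal illustration, assuming both metrics are computed over per-image predicted and ground-truth counts, as is conventional in crowd counting; the function names and the sample counts are hypothetical and are not taken from the paper.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    n = len(true_counts)
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / n


def rmse(pred_counts, true_counts):
    """Root mean square error between predicted and ground-truth per-image counts."""
    n = len(true_counts)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts)) / n)


# Hypothetical counts for three test images (illustrative only, not paper data).
pred = [105.0, 48.0, 230.0]
true = [100.0, 50.0, 220.0]

print("MAE:", mae(pred, true))
print("RMSE:", rmse(pred, true))
```

Because RMSE squares each error before averaging, it penalizes a few badly miscounted images more heavily than MAE does, which is why the two metrics are usually reported together.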

     

  • Figure 1.  Structure of dense crowd counting network based on multi-scale perception

    Figure 2.  Structure of feature enhancement block

    Figure 3.  Partial visualization results

    Table 1.  Comparison results of different methods on Shanghai Tech, UCF_CC_50, UCF_QNRF, and NWPU datasets

    Model         Shanghai Tech Part_A  Shanghai Tech Part_B  UCF_CC_50             UCF_QNRF              NWPU
                  MAE        RMSE       MAE        RMSE       MAE        RMSE       MAE        RMSE       MAE        RMSE
    MCNN[7]       110.2      173.2      26.4       41.3       377.6      509.1      277.0      426.0      218.5      700.6
    CSRNet[10]    68.2       115.0      10.6       16.0       266.1      397.5      120.3      208.5      104.8      433.4
    PDD-CNN[29]   64.7       99.1       8.8        14.3       205.4      311.7      115.3      190.2      -          -
    TEDNet[18]    64.2       109.1      8.2        12.8       249.4      354.5      113.0      188.0      -          -
    KDMG[30]      63.8       99.2       7.8        12.7       -          -          99.5       173.0      100.5      415.5
    BL[31]        62.8       101.8      7.7        12.7       229.3      308.2      88.7       154.8      93.6       470.3
    CAN[20]       62.3       100.0      7.8        12.2       212.2      243.7      107.0      183.0      93.5       489.9
    MCANet[32]    60.1       100.2      6.8        11.0       181.3      258.6      100.8      185.9      -          -
    SC2Net[33]    58.9       97.7       6.9        11.4       209.4      286.3      98.5       174.5      89.7       348.9
    MSPNet        62.5       95.7       6.9        11.0       156.5      223.3      87.7       148.2      81.9       300.3
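The relative reductions quoted in the abstract can be checked against Table 1. The sketch below assumes that the "best comparison method" is the compared row with the lowest error on each dataset, which is BL[31] on UCF_QNRF and SC2Net[33] on NWPU; `relative_reduction` is an illustrative helper, not something the paper defines.

```python
def relative_reduction(baseline, proposed):
    """Percentage by which `proposed` lowers the error relative to `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# (MAE, RMSE) pairs copied from Table 1.
# Assumption: the baseline is the lowest-error compared method on each dataset.
rows = {
    "UCF_QNRF": {"baseline": (88.7, 154.8), "MSPNet": (87.7, 148.2)},  # baseline: BL[31]
    "NWPU": {"baseline": (89.7, 348.9), "MSPNet": (81.9, 300.3)},      # baseline: SC2Net[33]
}

reductions = {}
for dataset, r in rows.items():
    # One rounded percentage for MAE and one for RMSE.
    reductions[dataset] = tuple(
        round(relative_reduction(b, p), 1)
        for b, p in zip(r["baseline"], r["MSPNet"])
    )
    print(dataset, reductions[dataset])
```

Rounded to one decimal place, the results are 1.1% and 4.3% on UCF_QNRF and 8.7% and 13.9% on NWPU, which agrees with the figures stated in the abstract.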

    Table 2.  Ablation experiments of CAM structure

    Method            MAE     RMSE
    Proposed + PPM    63.6    105.4
    Proposed + CAM    62.5    95.7

    Table 3.  Ablation experiments of different module structures

    Method            MAE     RMSE
    CAM               68.2    118.8
    DCM               66.2    113.0
    DCM+CAM           64.9    109.8
    CAM+MSAM          65.5    111.4
    DCM+MSAM          64.0    111.5
    DCM+CAM+MSAM      62.5    95.7

    Table 4.  Ablation experiments on the number of FEB layers

    FEB layers        MAE     RMSE
    0                 64.9    109.8
    2                 63.5    103.4
    4                 62.5    95.7
    6                 67.1    115.8
  • [1] FAN Z Z, ZHANG H, ZHANG Z, et al. A survey of crowd counting and density estimation based on convolutional neural network[J]. Neurocomputing, 2022, 472: 224-251. doi: 10.1016/j.neucom.2021.02.103
    [2] YU Ying, ZHU Huilin, QIAN Jin, et al. Survey on deep learning based crowd counting[J]. Journal of Computer Research and Development, 2021, 58(12): 2724-2747. doi: 10.7544/issn1000-1239.2021.20200699
    [3] TOPKAYA I S, ERDOGAN H, PORIKLI F. Counting people by clustering person detector outputs[C]//2014 11th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Seoul: IEEE, 2014: 313-318.
    [4] RYAN D, DENMAN S, SRIDHARAN S, et al. An evaluation of crowd counting methods, features and regression models[J]. Computer Vision and Image Understanding, 2015, 130: 1-17. doi: 10.1016/j.cviu.2014.07.008
    [5] LEMPITSKY V, ZISSERMAN A. Learning to count objects in images[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. New York: ACM, 2010: 1324-1332.
    [6] WANG C, ZHANG H, YANG L, et al. Deep people counting in extremely dense crowds[C]//Proceedings of the 23rd ACM international conference on Multimedia. Brisbane: ACM, 2015: 1299-1302.
    [7] ZHANG Y Y, ZHOU D S, CHEN S Q, et al. Single-image crowd counting via multi-column convolutional neural network[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Las Vegas: IEEE, 2016: 589-597.
    [8] SINDAGI V A, PATEL V M. Generating high-quality crowd density maps using contextual pyramid CNNs[C]//2017 IEEE International Conference on Computer Vision (ICCV). Venice: IEEE, 2017: 1879-1888.
    [9] CAO X K, WANG Z P, ZHAO Y Y, et al. Scale aggregation network for accurate and efficient crowd counting[C]//European Conference on Computer Vision. Cham: Springer, 2018: 757-773.
    [10] LI Y H, ZHANG X F, CHEN D M. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1091-1100.
    [11] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[C]//3rd International Conference on Learning Representations. San Diego: [s.n.], 2015: 1-14.
    [12] YU F, KOLTUN V. Multi-scale context aggregation by dilated convolutions[C]//Proceedings of the 4th International Conference on Learning Representations. Puerto Rico: ICLR, 2016: 1-13.
    [13] TIAN Sheng, ZHANG Jianfeng, ZHANG Yutian, et al. Lane detection algorithm based on dilated convolution pyramid network[J]. Journal of Southwest Jiaotong University, 2020, 55(2): 386-392, 416. doi: 10.3969/j.issn.0258-2724.20181026
    [14] ZUO Jing, BA Yulin. Population-depth counting algorithm based on multiscale fusion[J]. Laser & Optoelectronics Progress, 2020, 57(24): 315-323.
    [15] CHEN X Y, BIN Y R, SANG N, et al. Scale pyramid network for crowd counting[C]//2019 IEEE Winter Conference on Applications of Computer Vision (WACV). Waikoloa: IEEE, 2019: 1941-1950.
    [16] SINDAGI V, PATEL V. Multi-level bottom-top and top-bottom feature fusion for crowd counting[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 1002-1012.
    [17] ZHOU Y, YANG J X, LI H R, et al. Adversarial learning for multiscale crowd counting under complex scenes[J]. IEEE Transactions on Cybernetics, 2021, 51(11): 5423-5432. doi: 10.1109/TCYB.2019.2956091
    [18] JIANG X L, XIAO Z H, ZHANG B C, et al. Crowd counting and density estimation by trellis encoder-decoder networks[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 6126-6135.
    [19] TIAN Y K, LEI Y M, ZHANG J P, et al. PaDNet: pan-density crowd counting[J]. IEEE Transactions on Image Processing, 2020, 29: 2714-2727. doi: 10.1109/TIP.2019.2952083
    [20] LIU W Z, SALZMANN M, FUA P. Context-aware crowd counting[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Long Beach: IEEE, 2019: 5094-5103.
    [21] LI H, KONG W H, ZHANG S H. Effective crowd counting using multi-resolution context and image quality assessment-guided training[J]. Computer Vision and Image Understanding, 2020, 201: 103065.1-103065.10.
    [22] IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2547-2554.
    [23] IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds[C]//European Conference on Computer Vision. Cham: Springer, 2018: 544-559.
    [24] WANG Q, GAO J Y, LIN W, et al. NWPU-crowd: a large-scale benchmark for crowd counting and localization[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(6): 2141-2149. doi: 10.1109/TPAMI.2020.3013269
    [25] ZHAO H S, SHI J P, QI X J, et al. Pyramid scene parsing network[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu: IEEE, 2017: 6230-6239.
    [26] ZHANG Z L, ZHANG X Y, PENG C, et al. ExFuse: enhancing feature fusion for semantic segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2018: 273-288.
    [27] WANG Yin, WANG Lide, QIU Ji. Real-time enhancement algorithm based on DenseNet structure for railroad low-light environment[J]. Journal of Southwest Jiaotong University, 2022, 57(6): 1349-1357. doi: 10.3969/j.issn.0258-2724.20210199
    [28] DAI F, LIU H, MA Y K, et al. Dense scale network for crowd counting[C]//Proceedings of the 2021 International Conference on Multimedia Retrieval. Taipei: ACM, 2021: 64-72.
    [29] WANG W X, LIU Q L, WANG W. Pyramid-dilated deep convolutional neural network for crowd counting[J]. Applied Intelligence, 2022, 52(2): 1825-1837. doi: 10.1007/s10489-021-02537-6
    [30] WAN J, WANG Q Z, CHAN A B. Kernel-based density map generation for dense object counting[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44(3): 1357-1370. doi: 10.1109/TPAMI.2020.3022878
    [31] MA Z H, WEI X, HONG X P, et al. Bayesian loss for crowd count estimation with point supervision[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Seoul: IEEE, 2019: 6141-6150.
    [32] WANG X, LV R R, ZHAO Y, et al. Multi-scale context aggregation network with attention-guided for crowd counting[C]//2020 15th IEEE International Conference on Signal Processing (ICSP). Beijing: IEEE, 2020: 240-245.
    [33] LIANG L J, ZHAO H L, ZHOU F B, et al. SC2Net: scale-aware crowd counting network with pyramid dilated convolution[J]. Applied Intelligence, 2023, 53(5): 5146-5159.
Publication history
  • Received:  2022-11-29
  • Revised:  2023-03-08
  • Published online:  2024-07-04
  • Issue date:  2023-03-17
