• ISSN 0258-2724
  • CN 51-1277/U
  • EI Compendex
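  • Indexed in Scopus
  • Chinese Core Journal
  • Source journal for Chinese Scientific and Technical Papers and Citations
  • Source journal of the Chinese Science Citation Database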

Masked Face Detection Model Based on Multi-scale Attention-Driven Faster R-CNN

LI Zechen, LI Hengchao, HU Wenshuai, YANG Jinyu, HUA Zexi

Citation: LI Zechen, LI Hengchao, HU Wenshuai, YANG Jinyu, HUA Zexi. Masked Face Detection Model Based on Multi-scale Attention-Driven Faster R-CNN[J]. Journal of Southwest Jiaotong University, 2021, 56(5): 1002-1010. doi: 10.3969/j.issn.0258-2724.20210017

doi: 10.3969/j.issn.0258-2724.20210017
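Funding: National Natural Science Foundation of China (61871335); Fundamental Research Funds for the Central Universities (2682020XG02, 2682020ZT35); National Key Research and Development Program of China (2020YFB1711902)
Details
    About the author:

    LI Zechen (born 1996), male, PhD candidate; research interests: image processing and pattern recognition. E-mail: Lizc@my.swjtu.edu.cn

    Corresponding author:

    HUA Zexi (born 1968), male, associate professor, PhD; research interests: intelligent operation and maintenance of rail transit, sensors and intelligent detection, and monitoring. E-mail: huazexi@163.com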

  • Chinese Library Classification (CLC) number: TP391.41; TP183

Masked Face Detection Model Based on Multi-scale Attention-Driven Faster R-CNN

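  • Abstract: To address face detection under occlusion, such as when a mask is worn, a multi-scale attention-driven Faster R-CNN (MSAF R-CNN) face detection model is proposed. First, to take full account of the multi-scale information of face targets, the Res2Net grouped residual structure is introduced into the original Faster R-CNN framework to obtain finer-grained feature representations. Second, the Res2Net module is improved with a spatial-channel attention structure, so that the attention mechanism adaptively learns features of the target at different scales. Finally, to learn the global information of the target and reduce overfitting, a weighted spatial pyramid pooling network is embedded at the top of the model, and feature scales are divided in a coarse-to-fine manner. Experimental results on the AIZOO and FMDD face datasets show that the proposed MSAF R-CNN model reaches detection accuracies of 90.37% and 90.11%, respectively, for faces wearing masks, which verifies the feasibility and effectiveness of the model.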
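The grouped residual structure mentioned in the abstract can be illustrated with a small sketch. The code below follows the hierarchical split described in the Res2Net paper (the first group passes through, the second is transformed, and each later group is added to the previous output before being transformed). It is a toy illustration on plain lists, not the authors' implementation: the function name `res2net_split` and the stand-in transform `double` are hypothetical, and a real block would use 3x3 convolutions on feature maps.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def res2net_split(
    channels: Sequence[float],
    scale: int,
    transforms: Sequence[Callable[[Vector], Vector]],
) -> Vector:
    """Toy version of the Res2Net hierarchical residual split.

    The channels are divided into `scale` equal groups x_1..x_s, then
    y_1 = x_1, y_2 = K_2(x_2), and y_i = K_i(x_i + y_{i-1}) for i > 2.
    """
    if len(channels) % scale != 0:
        raise ValueError("channel count must be divisible by scale")
    if len(transforms) != scale - 1:
        raise ValueError("need one transform for each group after the first")

    width = len(channels) // scale
    groups = [list(channels[i * width:(i + 1) * width]) for i in range(scale)]

    outputs: List[Vector] = [groups[0]]  # y_1 is passed through unchanged
    previous: Vector = []
    for i in range(1, scale):
        if i == 1:
            fused = groups[i]  # x_2 has no earlier output to add
        else:
            fused = [a + b for a, b in zip(groups[i], previous)]
        previous = transforms[i - 1](fused)  # stand-in for a 3x3 convolution
        outputs.append(previous)

    # Concatenate all group outputs; a real block would apply a 1x1 convolution next.
    return [value for group in outputs for value in group]


def double(v: Vector) -> Vector:
    """Hypothetical stand-in for the per-group transform K_i."""
    return [2.0 * x for x in v]


print(res2net_split([1.0, 2.0, 3.0, 4.0], 4, [double, double, double]))
# -> [1.0, 4.0, 14.0, 36.0]
```

Each later group accumulates the output of the one before it, which is how a single block mixes several receptive-field sizes. The `scale` argument plays the role of the number of groups varied in Table 1, on the assumption that the paper's grouping parameter matches the Res2Net scale dimension.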

     

  • Figure 1.  Res2Net module

    Figure 2.  Structure of SCA-Res2Net module

    Figure 3.  Structure of WSPP-Net

    Figure 4.  MSAF R-CNN model (BN: batch normalization)

    Figure 5.  Partial images of datasets
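The abstract describes WSPP-Net (Figure 3) as a weighted spatial pyramid pooling network that divides features from coarse to fine. The sketch below shows only the general idea of spatial pyramid pooling with a per-level weight, in pure Python. The function names, the pyramid levels, and the weights are all assumptions made for illustration, since this page does not give the authors' window sizes or weighting scheme.

```python
from typing import List, Sequence

Grid = List[List[float]]


def pool_level(feature: Grid, bins: int) -> List[float]:
    """Max-pool a 2D feature map into a bins x bins grid and flatten it."""
    height, width = len(feature), len(feature[0])
    pooled = []
    for r in range(bins):
        # Floor for the start and ceiling for the end, so bins cover the whole map.
        r0, r1 = (r * height) // bins, -(-(r + 1) * height // bins)
        for c in range(bins):
            c0, c1 = (c * width) // bins, -(-(c + 1) * width // bins)
            pooled.append(
                max(feature[y][x] for y in range(r0, r1) for x in range(c0, c1))
            )
    return pooled


def weighted_spp(
    feature: Grid, levels: Sequence[int], weights: Sequence[float]
) -> List[float]:
    """Concatenate pooled features from coarse to fine, scaling each level.

    `weights` is a hypothetical per-level scale factor; in a trained network
    such weights would be learned rather than fixed by hand.
    """
    if len(levels) != len(weights):
        raise ValueError("one weight is required per pyramid level")
    output: List[float] = []
    for bins, weight in zip(levels, weights):
        output.extend(weight * value for value in pool_level(feature, bins))
    return output


feature_map = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
print(weighted_spp(feature_map, levels=[1, 2], weights=[0.5, 1.0]))
# -> [8.0, 6.0, 8.0, 14.0, 16.0]
```

The output length depends only on the chosen levels (here 1 + 4 values), not on the input size, which is why pyramid pooling can feed a fixed-size layer from feature maps of varying resolution.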

    Table 1.  Experimental results under different numbers of groups (%)

    Dataset  Category  Number of groups
                       2      4      6      8      10
    AIZOO    Face      90.43  90.32  90.11  89.92  90.10
             Mask      89.95  90.37  89.86  90.27  89.50
             mAP       90.19  90.35  89.99  90.10  89.80
    FMDD     Face      86.21  87.27  86.17  86.50  86.17
             Mask      89.99  90.11  90.04  90.21  89.99
             mAP       88.10  88.69  88.10  88.35  88.08
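The mAP rows of Table 1 appear to be the mean of the Face and Mask rows. The short check below recomputes that mean from the table's own values and reports the largest difference from the printed mAP. It assumes mAP here is the unweighted average of the two class scores, which the page does not state explicitly; the helper names are hypothetical.

```python
# Face, Mask, and mAP rows of Table 1 (%), one value per number of groups (2, 4, 6, 8, 10).
TABLE_1 = {
    "AIZOO": {
        "Face": [90.43, 90.32, 90.11, 89.92, 90.10],
        "Mask": [89.95, 90.37, 89.86, 90.27, 89.50],
        "mAP": [90.19, 90.35, 89.99, 90.10, 89.80],
    },
    "FMDD": {
        "Face": [86.21, 87.27, 86.17, 86.50, 86.17],
        "Mask": [89.99, 90.11, 90.04, 90.21, 89.99],
        "mAP": [88.10, 88.69, 88.10, 88.35, 88.08],
    },
}


def mean_of_classes(rows):
    """Unweighted mean of the Face and Mask scores for each column."""
    return [(face + mask) / 2 for face, mask in zip(rows["Face"], rows["Mask"])]


def largest_gap(rows):
    """Largest absolute difference between the recomputed and reported mAP."""
    return max(
        abs(computed - reported)
        for computed, reported in zip(mean_of_classes(rows), rows["mAP"])
    )


for dataset, rows in TABLE_1.items():
    # Gaps of about 0.005 are expected because the table is rounded to two decimals.
    print(dataset, f"largest gap = {largest_gap(rows):.3f}")
```

Every column agrees with the reported mAP to within rounding, which supports reading mAP as the two-class average in this table.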

    Table 2.  Experimental results under different compression ratios (%)

    Dataset  Category  Compression ratio
                       10     12     14     16     18
    AIZOO    Face      90.31  90.39  90.12  90.32  90.41
             Mask      89.79  90.08  90.20  90.37  89.87
             mAP       90.05  90.23  90.16  90.35  90.14
    FMDD     Face      86.98  84.89  86.26  87.27  86.30
             Mask      89.90  89.68  90.25  90.11  89.86
             mAP       88.44  87.29  88.26  88.69  88.08

    Table 3.  Experimental results under different window sizes in WSPP-Net (%)

    Dataset  Category  Window size
                       S1     S2     S3     S4
    AIZOO    Face      90.08  90.32  90.32  90.38
             Mask      90.31  90.37  90.13  90.01
             mAP       90.20  90.35  90.22  90.19
    FMDD     Face      86.51  87.27  86.56  86.45
             Mask      89.76  90.11  89.60  89.99
             mAP       88.14  88.69  88.08  88.22

    Table 4.  Performance of different methods (%)

    Dataset  Category  Model 1  Model 2  Model 3  Model 4  MSAF R-CNN
    AIZOO    Face      87.32    90.42    89.94    90.19    90.32
             Mask      78.15    89.84    89.71    89.99    90.37
             mAP       82.73    90.13    89.82    90.09    90.35
    FMDD     Face      86.01    86.41    84.44    85.05    87.27
             Mask      77.95    90.01    89.94    90.10    90.11
             mAP       81.98    88.21    87.19    87.58    88.69

    Table 5.  Ablation experimental results of feature removal and fusion (%)

    Dataset  Category  Model 5  Model 6  Model 7  MSAF R-CNN
    AIZOO    Face      90.43    89.92    90.40    90.32
             Mask      90.05    90.03    90.00    90.37
             mAP       90.24    89.97    90.20    90.35
    FMDD     Face      85.13    86.01    86.23    87.27
             Mask      89.93    90.00    89.98    90.11
             mAP       87.53    88.01    88.10    88.69
Publication history
  • Received:  2021-01-11
  • Revised:  2021-07-07
  • Published online:  2021-07-16
  • Issue date:  2021-10-15
