Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition

Huilan LUO; Kang TONG

doi:10.11959/j.issn.1000-436x.2019194

Academic Communications | Updated: 2024-06-05

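- Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition
- Journal on Communications, 2019, Vol. 40, No. 10, pp. 189-198
- Author affiliation:
  School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000
- Author biographies:
  Huilan LUO (1974- ), female, born in Shanggao, Jiangxi; Ph.D.; professor at Jiangxi University of Science and Technology. Main research interests: computer vision and pattern recognition.
  Kang TONG (1992- ), male, born in Nanjing, Jiangsu; master's student at Jiangxi University of Science and Technology. Main research interests: computer vision and video action recognition.
- Funding:
  National Natural Science Foundation of China (61862031); Natural Science Foundation of Jiangxi Province (20171BAB202014); "Science and Technology Innovation Talent Plan" fund of Ganzhou, Jiangxi Province
- DOI: 10.11959/j.issn.1000-436x.2019194
  Chinese Library Classification: TP391
- Online publication date: 2019-10
  Print publication date: 2019-10-25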
Huilan LUO, Kang TONG. Spatiotemporal squeeze-and-excitation residual multiplier network for video action recognition[J]. Journal on Communications, 2019, 40(10): 189-198. DOI: 10.11959/j.issn.1000-436x.2019194.

Abstract

To address the limited ability of shallow networks and general deep models in the two-stream architecture to learn spatial and temporal information, a squeeze-and-excitation residual network is proposed for action recognition in both the spatial stream and the temporal stream. Identity mapping kernels are injected into the network as temporal filters to capture long-term temporal dependencies. To further strengthen the interaction between the spatial and temporal information of the squeeze-and-excitation residual networks, spatiotemporal features are fused by multiplication, and the effect of the fusion method, the number of fusions, and the fusion locations on recognition performance is studied. Because the performance of a single model is limited, three different strategies are proposed to generate multiple models, which are then combined by direct averaging and weighted averaging to obtain the final recognition result. Experimental results on the HMDB51 and UCF101 datasets show that the proposed spatiotemporal squeeze-and-excitation residual multiplier network effectively improves action recognition performance.
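The three mechanisms named in the abstract (squeeze-and-excitation channel gating, multiplicative fusion of the two streams, and averaging over several models) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the weight shapes, and the toy list-based feature layout are assumptions made for illustration, the gate follows the standard squeeze-and-excitation formulation, and the abstract does not specify where or how often fusion is applied.

```python
import math


def squeeze_excite(features, w1, w2):
    """Reweight channels with a squeeze-and-excitation gate.

    features: list of channels, each a flat list of activations (assumed layout).
    w1: reduction weights, shape [hidden][channels] (hypothetical).
    w2: expansion weights, shape [channels][hidden] (hypothetical).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in features]
    # Excitation: fully connected -> ReLU -> fully connected -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, features)]


def multiply_fuse(spatial, temporal):
    """Element-wise product of same-shaped spatial and temporal features."""
    return [[a * b for a, b in zip(sc, tc)] for sc, tc in zip(spatial, temporal)]


def ensemble_average(model_scores, weights=None):
    """Combine per-model class scores by direct or weighted averaging.

    model_scores: one list of class scores per model.
    weights: optional per-model weights; None means direct averaging.
    """
    if weights is None:
        weights = [1.0] * len(model_scores)
    total = sum(weights)
    num_classes = len(model_scores[0])
    return [
        sum(w * scores[c] for w, scores in zip(weights, model_scores)) / total
        for c in range(num_classes)
    ]


if __name__ == "__main__":
    spatial = [[2.0, 4.0], [1.0, 3.0]]
    temporal = [[0.5, 0.5], [2.0, 1.0]]
    gated = squeeze_excite(spatial, w1=[[1.0, 1.0]], w2=[[0.0], [0.0]])
    print(multiply_fuse(gated, temporal))
    print(ensemble_average([[0.25, 0.75], [0.75, 0.25]], weights=[3.0, 1.0]))
```

In a real network the features would be 4-D tensors and the gate weights would be learned; the lists here only make the arithmetic of each step visible.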
