Action recognition method based on fusion of skeleton and apparent features

Hongyan WANG (王洪雁); Hai YUAN (袁海)

doi:10.11959/j.issn.1000-436x.2022020

Research article | Updated: 2024-06-05

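- Title: Action recognition method based on fusion of skeleton and apparent features (original Chinese title: 基于骨骼及表观特征融合的动作识别方法)
- Journal: Journal on Communications (通信学报), 2022, vol. 43, no. 1, pp. 138-148
- Author affiliations:
  
  1. School of Information, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018
  2. College of Information Engineering, Dalian University, Dalian, Liaoning 116622
- Author biographies:
  
  Hongyan WANG (born 1979), male, from Nanyang, Henan; Ph.D.; distinguished professor and master's supervisor at Zhejiang Sci-Tech University. Main research interests: array signal processing, machine vision, etc.
  Hai YUAN (born 1996), male, from Jinzhou, Liaoning; master's student at Dalian University. Main research interests: image processing, action recognition, etc.
- Funding:
  
  National Natural Science Foundation of China (61301258, 61271379, 61871164); Key Program of the Natural Science Foundation of Zhejiang Province (LZ21F010002); China Postdoctoral Science Foundation (2016M590218)
- DOI: 10.11959/j.issn.1000-436x.2022020
  Chinese Library Classification (CLC) number: TP391
- Published online: 2022-01
  
  Print publication date: 2022-01-25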
Hongyan WANG, Hai YUAN. Action recognition method based on fusion of skeleton and apparent features[J]. Journal on Communications, 2022, 43(1): 138-148. DOI: 10.11959/j.issn.1000-436x.2022020.

Abstract

Traditional skeletal feature-based action recognition algorithms have difficulty distinguishing similar actions. To address this problem, an action recognition method based on the fusion of deep joint features and handcrafted apparent features is proposed. First, the spatial positions of the joints and their constraints are input into a long short-term memory (LSTM) model equipped with a spatio-temporal attention mechanism to obtain spatio-temporally weighted, highly separable deep joint features. Next, heat maps are introduced to locate the key frames and key joints, and handcrafted apparent features are extracted around the key joints as an effective complement to the deep joint features. Finally, the apparent features and the deep skeleton features are fused frame by frame in a two-stream network to discriminate similar actions effectively. Simulation results show that, compared with state-of-the-art action recognition methods, the proposed method distinguishes similar actions effectively and thereby significantly improves action recognition accuracy.
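The three stages named in the abstract (attention weighting of joints, selection of key frames and key joints, frame-by-frame fusion) can be illustrated with a minimal sketch. This is not the authors' implementation, and every function name here is hypothetical. It makes several assumptions: the attention scores are random placeholders rather than LSTM outputs, the attention weights themselves stand in for the heat map, the appearance descriptors are random vectors rather than real handcrafted features, and fusion is plain per-frame concatenation rather than a trained two-stream network.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighted_joints(joints, spatial_scores, temporal_scores):
    """Weight joint coordinates by spatial (per-joint) and temporal (per-frame) attention.

    joints[t][j] is an (x, y, z) tuple; the scores are raw, unnormalized values.
    Assumption: weights are applied directly to coordinates, not to LSTM states.
    """
    beta = softmax(temporal_scores)                    # one weight per frame
    alpha = [softmax(row) for row in spatial_scores]   # one weight per joint, per frame
    feats = []
    for t, frame in enumerate(joints):
        row = []
        for j, xyz in enumerate(frame):
            w = alpha[t][j] * beta[t]
            row.extend(w * c for c in xyz)
        feats.append(row)
    return feats, alpha, beta


def select_key_joints(alpha, beta, n_frames, n_joints):
    """Pick the highest-weighted frames, then the highest-weighted joints in each."""
    frames = sorted(range(len(beta)), key=lambda t: beta[t], reverse=True)[:n_frames]
    return {
        t: sorted(range(len(alpha[t])), key=lambda j: alpha[t][j], reverse=True)[:n_joints]
        for t in frames
    }


def fuse_framewise(skeleton_feats, appearance_feats):
    """Assumed fusion rule: concatenate the two feature vectors of each frame."""
    return [s + a for s, a in zip(skeleton_feats, appearance_feats)]


random.seed(0)
T, J, A = 5, 4, 8  # frames, joints, appearance feature length (all illustrative)
joints = [[(random.random(), random.random(), random.random()) for _ in range(J)]
          for _ in range(T)]
spatial_scores = [[random.gauss(0, 1) for _ in range(J)] for _ in range(T)]
temporal_scores = [random.gauss(0, 1) for _ in range(T)]
appearance = [[random.random() for _ in range(A)] for _ in range(T)]

skel, alpha, beta = attention_weighted_joints(joints, spatial_scores, temporal_scores)
key_joints = select_key_joints(alpha, beta, n_frames=2, n_joints=3)
fused = fuse_framewise(skel, appearance)
print(len(fused), len(fused[0]))  # 5 20
```

Each fused frame vector has 3 × 4 weighted joint coordinates plus 8 appearance values. In a fuller pipeline the appearance vector for a frame would be computed only from image patches around the joints returned by `select_key_joints`.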
