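School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou 341000, Jiangxi, China
LUO Huilan (1974-), female, born in Shanggao, Jiangxi, Ph.D., is a professor and master's supervisor at Jiangxi University of Science and Technology. Her main research interests are machine learning and pattern recognition.
WANG Chanjuan (1992-), female, born in Poyang, Jiangxi, is a master's student at Jiangxi University of Science and Technology. Her main research interests are computer vision and behavior recognition.
LU Fei (1994-), male, born in Ganzhou, Jiangxi, is a master's student at Jiangxi University of Science and Technology. His main research interests are image processing and machine vision.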
Published online: 2018-06
Published in print: 2018-06-25
Huilan LUO, Chanjuan WANG, Fei LU. Survey of video behavior recognition[J]. Journal on Communications, 2018, 39(6): 169-180. DOI: 10.11959/j.issn.1000-436x.2018107.
Behavior recognition is developing rapidly, and many behavior recognition algorithms that learn features automatically with deep networks have been proposed. Deep learning methods require large amounts of training data and place high demands on computer storage and computing power. After a brief review of the currently popular behavior recognition methods based on deep networks, this survey focuses on traditional behavior recognition methods based on hand-crafted features. Traditional methods usually follow a pipeline of extracting features from video, modeling those features, and predicting a class. The survey divides this pipeline into the following steps: feature sampling, feature descriptor selection, feature pre- and post-processing, descriptor clustering, and vector encoding. The benchmark datasets commonly used to evaluate algorithm performance are also summarized.
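As a rough illustration of the five steps named in the abstract, the following sketch wires them into a toy bag-of-visual-words pipeline. It is not the method of any particular paper covered by the survey. The function names, the temporal-difference descriptor, the patch size, the codebook size, and the synthetic random "videos" are all assumptions chosen to keep the example small and dependency-free. The final classification stage is omitted.

```python
import math
import random

def sample_patches(video, size=4, stride=4):
    """Feature sampling: a dense grid of size x size patches on each frame pair."""
    frames, height, width = len(video), len(video[0]), len(video[0][0])
    for t in range(frames - 1):
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                yield t, y, x

def describe(video, t, y, x, size=4):
    """Feature descriptor: temporal differences inside the patch (a toy motion cue)."""
    return [video[t + 1][y + dy][x + dx] - video[t][y + dy][x + dx]
            for dy in range(size) for dx in range(size)]

def l2_normalize(vec):
    """Pre/post-processing: scale to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest(vec, centers):
    """Index of the closest codeword."""
    return min(range(len(centers)), key=lambda k: sq_dist(vec, centers[k]))

def kmeans(points, k, iters=10, seed=0):
    """Descriptor clustering: plain Lloyd's k-means to build the codebook."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def encode(descriptors, centers):
    """Vector encoding: hard-assignment histogram of codeword counts, L2-normalized."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        hist[nearest(d, centers)] += 1.0
    return l2_normalize(hist)

def video_descriptors(video):
    """Sample, describe, and normalize every patch of one video."""
    return [l2_normalize(describe(video, t, y, x))
            for t, y, x in sample_patches(video)]

def random_video(rng, frames=4, height=8, width=8):
    """Synthetic stand-in for real footage: frames x height x width grey values."""
    return [[[rng.random() for _ in range(width)] for _ in range(height)]
            for _ in range(frames)]

rng = random.Random(0)
videos = [random_video(rng) for _ in range(3)]
all_desc = [d for v in videos for d in video_descriptors(v)]
codebook = kmeans(all_desc, k=5)
codes = [encode(video_descriptors(v), codebook) for v in videos]
```

Each entry of `codes` is a fixed-length vector for one video, which is the form a standard classifier expects as input. Hard-assignment histograms are used here only because they are the simplest encoding to write down. Other encodings would replace `encode` without touching the earlier steps.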