Survey of video behavior recognition

Huilan LUO; Chanjuan WANG; Fei LU

doi:10.11959/j.issn.1000-436x.2018107

Review | Updated: 2024-06-05

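- Survey of video behavior recognition
- Journal on Communications (通信学报), 2018, Vol. 39, No. 6, pp. 169-180
- Author affiliation:
  
  School of Information Engineering, Jiangxi University of Science and Technology, Ganzhou, Jiangxi 341000
- About the authors:
  
  Huilan LUO (1974-), female, from Shanggao, Jiangxi; Ph.D.; professor and master's supervisor at Jiangxi University of Science and Technology. Main research interests: machine learning and pattern recognition.
  Chanjuan WANG (1992-), female, from Poyang, Jiangxi; master's student at Jiangxi University of Science and Technology. Main research interests: computer vision and behavior recognition.
  Fei LU (1994-), male, from Ganzhou, Jiangxi; master's student at Jiangxi University of Science and Technology. Main research interests: image processing and machine vision.
- Funding:
  
  National Natural Science Foundation of China (61105042); National Natural Science Foundation of China (61462035); Natural Science Foundation of Jiangxi Province (20171BAB202014)
- DOI: 10.11959/j.issn.1000-436x.2018107
  CLC number: TP391
- Published online: 2018-06
  
  Published in print: 2018-06-25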
Huilan LUO, Chanjuan WANG, Fei LU. Survey of video behavior recognition[J]. Journal on Communications, 2018, 39(6): 169-180. DOI: 10.11959/j.issn.1000-436x.2018107.

Abstract

Behavior recognition is developing rapidly, and many behavior recognition algorithms that learn features automatically with deep networks have been proposed. Deep learning methods need large amounts of training data and place high demands on computer storage and computing power. After a brief review of the currently popular deep-network-based behavior recognition methods, this survey focuses on traditional behavior recognition methods based on hand-crafted features. Traditional methods usually follow a pipeline of extracting features from the video, modeling them, and predicting a class. Following this pipeline, the recognition process is reviewed step by step: feature sampling, feature descriptor selection, feature pre-/post-processing, descriptor clustering, and vector encoding. The benchmark datasets commonly used to evaluate algorithm performance are also summarized.
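As a rough illustration of the traditional pipeline named in the abstract, the sketch below walks through feature sampling, descriptor computation, pre-processing, descriptor clustering, and vector encoding on synthetic data. Everything in it is an assumption made for illustration and is not taken from the paper: the function names (`dense_sample`, `describe`, `kmeans`, `encode`), the toy four-value descriptor (a stand-in for descriptors such as HOG or HOF), the random "videos", and the choice of a hard-assignment bag-of-visual-words histogram as the encoding. The final classification step is omitted.

```python
import math
import random

def dense_sample(video, size=4, stride=4):
    """Feature sampling: yield cuboids on a regular space-time grid."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    for t in range(0, T - size + 1, stride):
        for y in range(0, H - size + 1, stride):
            for x in range(0, W - size + 1, stride):
                yield [[row[x:x + size] for row in frame[y:y + size]]
                       for frame in video[t:t + size]]

def describe(cuboid):
    """Toy descriptor (hypothetical stand-in for HOG/HOF): mean intensity
    plus mean absolute horizontal, vertical and temporal differences."""
    n = len(cuboid)  # cuboid is n x n x n, indexed [t][y][x]
    total = dx = dy = dt = 0.0
    for t in range(n):
        for y in range(n):
            for x in range(n):
                v = cuboid[t][y][x]
                total += v
                if x + 1 < n:
                    dx += abs(cuboid[t][y][x + 1] - v)
                if y + 1 < n:
                    dy += abs(cuboid[t][y + 1][x] - v)
                if t + 1 < n:
                    dt += abs(cuboid[t + 1][y][x] - v)
    count = float(n ** 3)
    return [total / count, dx / count, dy / count, dt / count]

def l2_normalize(vec):
    """Feature pre-processing: scale a descriptor to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Descriptor clustering: plain k-means that returns k codewords."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centers[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old center if a cluster goes empty
                centers[j] = [sum(col) / len(group) for col in zip(*group)]
    return centers

def encode(descriptors, centers):
    """Vector encoding: hard-assignment histogram, L1-normalized."""
    hist = [0.0] * len(centers)
    for d in descriptors:
        nearest = min(range(len(centers)), key=lambda j: sq_dist(d, centers[j]))
        hist[nearest] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist

def random_video(rng, T=8, H=8, W=8):
    """Synthetic T x H x W clip with random intensities (illustration only)."""
    return [[[rng.random() for _ in range(W)] for _ in range(H)]
            for _ in range(T)]

def video_descriptors(video):
    return [l2_normalize(describe(c)) for c in dense_sample(video)]

rng = random.Random(0)
train = [random_video(rng) for _ in range(3)]
all_desc = [d for v in train for d in video_descriptors(v)]
codebook = kmeans(all_desc, k=4)
codes = [encode(video_descriptors(v), codebook) for v in train]
```

Each entry of `codes` is a fixed-length vector for one clip, which is the kind of input a classifier would then consume. In practice the sampling strategy, descriptor, and encoding would each be replaced by one of the methods the survey reviews.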
