Title: Research on Human Behavior Recognition Based on Video
Authors: Wang Xuewei; Zheng Meng; Liang Wei; Xia Xiaofang
Degree: Master's
Defense Date: 2018-05-17
Degree-Granting Institution: Shenyang Institute of Automation, Chinese Academy of Sciences
Place of Conferral: Shenyang
Supervisor: Xu Fang
Keywords: activity recognition; convolutional neural network; recurrent neural network; skeleton joints; recorded abnormal-behavior dataset
Alternative Title: Research on Human Behavior Recognition Based on Video
Degree Discipline: Control Theory and Control Engineering
Chinese Abstract (translated): This thesis first surveys the state of research on human activity recognition, clarifying its difficulties and future trends. Chapter 2 then studies the basic theory of deep learning. Building on these two parts, the thesis addresses several specific problems, as follows: (1) An end-to-end deep learning model combining a recurrent neural network with a deep convolutional neural network is designed. The model first uses a deep CNN to extract visual features from individual video frames, then feeds these features into an RNN to capture the temporal information between frames. The proposed algorithm achieves 85.68% accuracy on the public UCF101 human action dataset. (2) A two-stream model based on 3D convolution kernels is implemented. The spatial stream captures the appearance information of each frame in the video sequence; the temporal stream captures the temporal information between frames. Both streams are deep CNNs built from 3D convolution kernels, identical in structure but taking RGB images and optical-flow images as input, respectively. In addition, to meet the practical needs of SIASUN Robot & Automation Co., Ltd. (Shenyang), a human activity dataset was recorded from a service robot's viewpoint. The dataset contains six behavior classes (fighting, falling, throwing, kicking, picking up, and standing) with 2,592 samples in total. The proposed model achieves 95.1% recognition accuracy on this dataset. (3) Behavior features are built from human skeleton joints to achieve real-time activity recognition. The 3D joint coordinates captured by Kinect are first preprocessed so that the joint coordinates of all people in a video segment are represented as a two-dimensional matrix. A one-dimensional convolutional neural network with residual connection units is then designed; the 1D convolution kernels slide only along the time dimension, gradually extracting information in both the temporal and spatial dimensions. Finally, the model is trained and tested on NTU-RGBD, currently the largest 3D human action dataset. Compared with the similar Res-TCN model, it reduces the parameter count by 47% with almost no loss of accuracy, greatly lowering model complexity and improving model efficiency.
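The first model in the abstract combines a per-frame CNN with an RNN over the frame sequence. The thesis's exact layer counts and dimensions are not given on this page, so the following PyTorch sketch only illustrates the general CNN-then-LSTM structure; the layer widths (`feat_dim`, `hidden`) and the tiny CNN are illustrative assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    """Sketch of an end-to-end CNN+RNN video classifier (sizes are assumptions)."""
    def __init__(self, num_classes=101, feat_dim=128, hidden=256):
        super().__init__()
        # Small per-frame CNN standing in for the deep ConvNet described in the thesis.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM consumes the per-frame features to capture inter-frame temporal information.
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, clip):                      # clip: (B, T, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))      # run CNN on all frames: (B*T, feat_dim)
        feats = feats.view(b, t, -1)              # regroup into sequences: (B, T, feat_dim)
        out, _ = self.rnn(feats)                  # temporal modeling over frames
        return self.fc(out[:, -1])                # last hidden state -> class logits
```

Because the whole pipeline is differentiable, the CNN and LSTM can be trained jointly from video clips to class labels, which is what "end-to-end" means here.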
English Abstract: This thesis first gives an overview of the development of activity recognition, clarifying the difficulties and future trends of the research. Chapter 2 then studies the basic theory of deep learning. Building on the first two parts, the thesis carries out specific work on different problems in the field of activity recognition, as follows: (1) An end-to-end deep learning model combining a recurrent neural network with a deep convolutional neural network is designed. The model first uses the deep CNN to extract visual features from single frames of the video, then takes the extracted features as input to the recurrent network to obtain the temporal information between frames. The proposed algorithm obtains 85.68% accuracy on the public UCF101 human action dataset. (2) A two-stream model based on 3D convolution kernels is implemented. The spatial stream obtains the appearance information of each frame in the video sequence, and the temporal stream obtains the temporal information between frames. Both streams are deep CNNs composed of 3D convolution kernels, identical in structure but using RGB images and optical-flow images as input, respectively. Furthermore, according to the actual needs of SIASUN Robot & Automation Co., Ltd. (Shenyang), a human activity dataset was recorded from the viewpoint of a service robot. The dataset contains six behavior classes (fighting, falling, throwing, kicking, picking up, and standing) with 2,592 samples. The proposed model obtains 95.1% recognition accuracy on this dataset. (3) Behavior features are constructed from human skeleton joints to realize real-time activity recognition. First, the 3D human joint coordinates captured by Kinect are preprocessed, and the joint coordinates of all people in a video segment are expressed as a two-dimensional matrix. Then, a one-dimensional convolutional neural network with residual connection units is designed; the 1D convolution kernels slide only along the time dimension during convolution, gradually extracting information in both the temporal and spatial dimensions. Finally, the model is trained and tested on NTU-RGBD, the largest 3D human action dataset. Compared with the similar Res-TCN model, the parameter count is reduced by 47% with almost no effect on accuracy, which greatly reduces the complexity of the model and improves its efficiency.
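The skeleton-based model in part (3) convolves only along time while the joint coordinates are flattened into channels. The thesis's actual block count, channel widths, and kernel size are not stated on this page; the sketch below is a minimal illustration of a residual 1-D temporal ConvNet under those assumptions (25 Kinect joints, a common NTU-RGBD configuration, giving 75 input channels).

```python
import torch
import torch.nn as nn

class ResTemporalBlock(nn.Module):
    """Residual unit whose 1-D convolutions slide only along the time axis."""
    def __init__(self, channels, kernel=9):
        super().__init__()
        pad = kernel // 2                          # "same" padding keeps T unchanged
        self.conv1 = nn.Conv1d(channels, channels, kernel, padding=pad)
        self.conv2 = nn.Conv1d(channels, channels, kernel, padding=pad)
        self.relu = nn.ReLU()

    def forward(self, x):                          # x: (B, C, T)
        y = self.conv2(self.relu(self.conv1(x)))
        return self.relu(x + y)                    # residual connection

class SkeletonTCN(nn.Module):
    """Stack of residual temporal blocks over flattened joint coordinates
    (layer sizes are illustrative, not the thesis's configuration)."""
    def __init__(self, num_joints=25, num_classes=60, width=64, blocks=3):
        super().__init__()
        in_ch = num_joints * 3                     # x, y, z per joint -> 75 channels
        self.inp = nn.Conv1d(in_ch, width, 1)      # 1x1 conv mixes the spatial (joint) dimension
        self.body = nn.Sequential(*[ResTemporalBlock(width) for _ in range(blocks)])
        self.fc = nn.Linear(width, num_classes)

    def forward(self, x):                          # x: (B, 75, T)
        h = self.body(self.inp(x))
        return self.fc(h.mean(dim=-1))             # global average pooling over time
```

Each frame contributes one column of the (channels x time) matrix, so stacking blocks widens the temporal receptive field while the 1x1 input convolution handles the spatial mixing of joints, which matches the "slide only on the time dimension" description in the abstract.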
Language: Chinese
Rights Order: 1
Pages: 88
Content Type: Degree thesis
Source URL: http://ir.sia.cn/handle/173321/22049
Collection: Shenyang Institute of Automation, Industrial Control Networks and Systems Research Laboratory
Recommended Citation
GB/T 7714
Wang Xuewei, Zheng Meng, Liang Wei, et al. Research on Human Behavior Recognition Based on Video [D]. Shenyang: Shenyang Institute of Automation, Chinese Academy of Sciences, 2018.