Human Action Recognition Based on Structural Representation

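Title	基于结构化表示的视觉人体行为识别 (Human Action Recognition Based on Structural Representation)
Author	吴保鑫 (Wu Baoxin)
Degree type	Doctor of Engineering
Defense date	2015-05-28
Degree-granting institution	University of Chinese Academy of Sciences
Place conferred	Institute of Automation, Chinese Academy of Sciences
Advisor	胡卫明 (Hu Weiming)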
Keywords	human action recognition; heat kernel structural descriptors; random walk graph kernels; tree-patterns graph matching kernels; generalized multiple kernel learning; motion salient regions; class-specific oriented attributes
Alternative title	Human Action Recognition Based on Structural Representation
Degree discipline	Pattern Recognition and Intelligent Systems
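Chinese abstract (translated)	With the rapid development of multimedia and Internet technology, the explosive growth of video data and the diversification of its content pose new challenges for analyzing and processing video. Human action recognition in video is an important part of intelligent video analysis; its goal is to enable computers to automatically recognize human actions of interest in previously unseen video sequences. It is not only a research hotspot in computer vision and pattern recognition but also has a wide range of applications, such as intelligent video surveillance, patient monitoring, and human-computer interaction. Human action recognition consists of two steps: representing the human action, and classifying the action based on that representation. In recent years, methods based on local spatio-temporal features have dominated the field. In essence, these methods compute zero-order or higher-order statistics of local spatio-temporal features, but they ignore the structural information among those features, which also plays a critical role in human action recognition. This thesis therefore takes the structural representation of human actions as its research topic and carries out in-depth work on the construction of local spatio-temporal features, the structural modeling of sets of local spatio-temporal features, and human action classification based on structural representations. The main work and contributions are as follows. First, we propose a new local spatio-temporal feature descriptor, the heat kernel structural descriptor. Guided by the heat equation, we simulate a discrete heat propagation process inside a local spatio-temporal region and build the descriptor from the heat exchanged between points in the region at different times. Compared with traditional 3D descriptors based on gradients or optical flow, such as 3DSIFT, 3DHoG, and HoG/HoF, this structural descriptor describes the intrinsic geometric structure of a spatio-temporal region at different scales. Several sets of comparative experiments show that the descriptor is highly descriptive and discriminative and improves the accuracy of human action recognition. Second, we propose a human action recognition algorithm based on graph representation. We build a bi-graph model to represent human actions, in which local spatio-temporal features are the nodes of the graph and the spatio-temporal relationships among them are the edges. We also design a family of context-based random walk graph kernels to measure the similarity between graphs. The graph kernel acts as a bridge that connects the structured graph representation to traditional statistical learning methods. Finally, we propose a generalized multiple kernel learning algorithm with an L12 regularization term to effectively fuse the graph kernels corresponding to different step lengths. Third, we propose a tree-pattern graph matching kernel to measure the similarity between two graphs and apply it to human action recognition. The kernel is built on two new kinds of tree patterns, incoming tree patterns and outgoing tree patterns, and captures the local topological structure of a graph well. Specifically, we propose a dynamic programming algorithm that recursively computes the similarity between tree patterns, and we design a quadratic energy function with a sparsity term to select the best-matching tree patterns. We apply this graph matching kernel to human action recognition to measure the similarity between graph-based action representations. Fourth, we propose a human action recognition algorithm based on multi-directional motion analysis, which decomposes human actions along multiple motion directions. We first use a series of 3D Gabor filters to detect salient regions in specific motion directions and propose a low-level region descriptor to represent the resulting salient regions. Based on these descriptors, we then learn a set of oriented attributes for each action class, which reflect the properties of that class in specific motion directions. Projecting a human action into the space of class-specific oriented attributes yields a compact mid-level representation of the action. Finally, we classify human actions based on this mid-level representation.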
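Illustrative note (not part of the original record): the heat-diffusion idea behind the descriptor can be sketched as follows. This is not the thesis's descriptor. It is a minimal sketch that assumes a small space-time block treated as a 6-connected grid, an explicit diffusion step, and a signature made of the heat remaining at the source voxel at a few times. All function names, the block size, and the rate `alpha` are hypothetical.

```python
def grid_neighbors(shape):
    """Map every voxel (t, y, x) of a small block to its 6-connected neighbours."""
    T, H, W = shape
    nbrs = {}
    for t in range(T):
        for y in range(H):
            for x in range(W):
                cand = [(t - 1, y, x), (t + 1, y, x), (t, y - 1, x),
                        (t, y + 1, x), (t, y, x - 1), (t, y, x + 1)]
                nbrs[(t, y, x)] = [(a, b, c) for a, b, c in cand
                                   if 0 <= a < T and 0 <= b < H and 0 <= c < W]
    return nbrs


def diffuse(heat, nbrs, alpha=0.1, steps=1):
    """Explicit diffusion: per step, each edge moves alpha * (heat difference).

    alpha times the largest node degree should stay below 1 for stability.
    """
    for _ in range(steps):
        heat = {p: h + alpha * sum(heat[q] - h for q in nbrs[p])
                for p, h in heat.items()}
    return heat


def heat_signature(shape, source, times, alpha=0.1):
    """Put one unit of heat at `source`; record what remains there at each time."""
    nbrs = grid_neighbors(shape)
    heat = {p: 0.0 for p in nbrs}
    heat[source] = 1.0
    signature, done = [], 0
    for t in sorted(times):
        heat = diffuse(heat, nbrs, alpha, t - done)
        done = t
        signature.append(heat[source])
    return signature, heat


# Hypothetical usage: a 3x3x3 block with the heat source at its centre.
signature, final_heat = heat_signature((3, 3, 3), (1, 1, 1), [1, 5, 20])
print(signature)
```

Short diffusion times reflect only the immediate neighbourhood of a point, while longer times reflect the structure of the whole block, which is the multi-scale behaviour the abstract attributes to the descriptor.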
English abstract	With the rapid development of Internet and multimedia technology, both the amount and the diversity of video data are increasing dramatically, which brings new challenges for processing and understanding video data. Human action recognition is an essential part of intelligent video processing; it aims to have computers automatically recognize and analyze human actions in videos. It is one of the most popular topics in computer vision and pattern recognition, with a wide spectrum of promising applications such as smart surveillance, patient care, and human-computer interaction. The process of human action recognition mainly includes two stages: human action representation and human action classification based on that representation. In recent years, methods based on local spatio-temporal features have been dominant in the field. These methods are based on zero-order or higher-order statistics of local features. However, they usually discard the structural information among local features, which plays a significant role in recognizing human actions. In this thesis, we therefore study the structural representation of human actions, covering the construction of local spatio-temporal features, the structural modeling of local feature sets, and the classification of human actions based on such structural representations. The main contributions of our work are summarized as follows: We propose a new local spatio-temporal feature descriptor, named the heat kernel structural descriptor (HKSD). Guided by the heat equation, we simulate a discrete heat diffusion process in local salient regions and construct the HKSD from the heat exchanged between points in the region at different moments. Unlike traditional descriptors such as 3DSIFT, 3DHoG and HoG/HoF, which are based on gradients or optical flow, HKSD captures the intrinsic structural information of local regions at different scales. Experimental results show that HKSD has sufficient representative and discriminative power to improve the performance of human action recognition. We propose a new approach for human action recognition based on a graph representation of human actions. We construct a bi-graph model for human action representation, in which graph nodes represent local features and edges describe the spatio-temporal relationships among local features. We also design a family of con...
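Illustrative note (not part of the original record): the abstract describes comparing graph representations of actions with random walk graph kernels over several step lengths. The sketch below shows the classic label-matching random walk kernel on a direct product graph, not the context-based kernels of the thesis. It assumes graphs given as an adjacency matrix plus a list of node labels, and it combines step lengths with a fixed `decay` weight where the thesis learns the combination by multiple kernel learning. All names are hypothetical.

```python
def product_adjacency(adj1, labels1, adj2, labels2):
    """Adjacency matrix of the direct product graph.

    Nodes are pairs (i, j) with equal labels; two pairs are connected when
    both graphs have the corresponding edge.
    """
    pairs = [(i, j) for i in range(len(adj1)) for j in range(len(adj2))
             if labels1[i] == labels2[j]]
    size = len(pairs)
    prod = [[0] * size for _ in range(size)]
    for a, (i, j) in enumerate(pairs):
        for b, (k, l) in enumerate(pairs):
            if adj1[i][k] and adj2[j][l]:
                prod[a][b] = 1
    return prod


def walk_counts(prod, max_steps):
    """Count walks of length 0..max_steps in the product graph.

    Each such walk corresponds to a pair of label-matching walks, one in
    each input graph.
    """
    vec = [1] * len(prod)
    counts = [sum(vec)]
    for _ in range(max_steps):
        vec = [sum(row[b] * vec[b] for b in range(len(vec))) for row in prod]
        counts.append(sum(vec))
    return counts


def random_walk_kernel(g1, g2, max_steps=3, decay=0.5):
    """Weighted sum of common-walk counts; `decay` down-weights longer walks."""
    prod = product_adjacency(g1[0], g1[1], g2[0], g2[1])
    return sum(decay ** k * c
               for k, c in enumerate(walk_counts(prod, max_steps)))


# Hypothetical usage: each graph is (adjacency matrix, node labels).
triangle = ([[0, 1, 1], [1, 0, 1], [1, 1, 0]], ["a", "a", "a"])
edge = ([[0, 1], [1, 0]], ["a", "b"])
print(random_walk_kernel(triangle, triangle), random_walk_kernel(triangle, edge))
```

Keeping the per-step counts from `walk_counts` separate gives one similarity value per step length, which is the kind of quantity a multiple kernel learning stage could weight instead of the fixed `decay`.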
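Language	Chinese
Other identifier	201218014628067
Content type	Dissertation
Source URL	http://ir.ia.ac.cn/handle/173211/6722
Collection	Graduates_Doctoral Dissertations
Recommended citation (GB/T 7714)	吴保鑫 (Wu Baoxin). 基于结构化表示的视觉人体行为识别 (Human Action Recognition Based on Structural Representation) [D]. Institute of Automation, Chinese Academy of Sciences. University of Chinese Academy of Sciences. 2015.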
