Research on Manifold Learning and Its Applications in Pattern Classification


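Title	Research on Manifold Learning and Its Applications in Pattern Classification
Author	仲国强 (Zhong Guoqiang)
Degree type	Doctor of Engineering
Defense date	2011-05-31
Degree-granting institution	Graduate University of Chinese Academy of Sciences
Degree-granting location	Institute of Automation, Chinese Academy of Sciences
Advisor	刘成林 (Liu Chenglin)
Keywords	manifold learning; pattern classification; latent variable models; feature extraction; distance metric learning; semi-supervised learning
Alternative title	Research on Manifold Learning and Its Applications in Pattern Classification
Major	Pattern Recognition and Intelligent Systems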
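Abstract (Chinese)	With the development of information technology, acquiring data is no longer difficult; processing it effectively has become the pressing problem. Massive, high-dimensional, unstructured data are a major difficulty in current information processing, and extracting compact and useful knowledge from them is a common challenge for researchers and engineers as well as a main topic of machine learning and data mining research. To explore this problem, this thesis takes manifold learning as its theme, designs practical machine learning algorithms from the perspectives of unsupervised manifold learning, latent variable models, feature extraction, and manifold-regularized low-rank metric learning, and applies these algorithms to practical pattern classification problems. Extensive experimental results demonstrate the feasibility and effectiveness of the proposed methods. The main contributions of the thesis are as follows. We state and prove a local tangent space theorem on data manifolds and, based on it, propose the local tangent space Laplacian eigenmaps (LTSLE) algorithm. LTSLE is an unsupervised manifold learning algorithm: it characterizes the similarity of data points in the observation space by the Euclidean distance between them computed in the local tangent space, and it uses the correspondence between the Laplace-Beltrami operator on the manifold and the graph Laplacian to obtain a low-dimensional embedding of high-dimensional data. LTSLE retains many advantages of the Laplacian eigenmaps (LE) algorithm of Belkin and Niyogi while overcoming LE's weakness of failing when the heat kernel parameter t is set inappropriately. To project new samples effectively into the low-dimensional space, the thesis also gives a linear version of LTSLE, LLTSLE. Experiments on visualization and handwritten digit recognition demonstrate the feasibility of the algorithm. We propose the Gaussian process latent random field (GPLRF) model, a supervised extension of the Gaussian process latent variable model (GPLVM). GPLRF is essentially a probabilistic graphical model: it assumes that the latent variables form a Gaussian Markov random field with respect to a graph built from the supervisory information, and it links the latent variables to the observed variables through a Gaussian process mapping. Compared with the discriminative Gaussian process latent variable model (DGPLVM), GPLRF is more flexible in practice because the dimensionality of its latent space is not limited by the number of classes. Experiments on several datasets show that when the intrinsic dimensionality of the data is no higher than C-1 (where C is the number of classes), GPLRF performs comparably to other well-performing algorithms, and when the intrinsic dimensionality exceeds C-1, GPLRF outperforms DGPLVM and some other algorithms. We propose a feature extraction method based on the error-correcting output codes (ECOC) framework. Its main idea is to take the probabilistic outputs of base classifiers trained according to the ECOC coding matrix as new features and then train a meta-classifier (meta-learner) in the new feature space to perform re-encoding and the subsequent decoding. Unlike traditional ECOC methods, which assign only one codeword to each class, the proposed method can assign multiple codewords to each class through the re-encoding of the meta-classifier, which improves generalization. Experiments on several datasets show that the proposed model has a clear advantage in classification accuracy over traditional ECOC methods and feature extraction methods, and that at comparable accuracy it decodes much faster than the better-performing current ECOC methods. We propose a semi-supervised method, based on manifold regularization, for learning a low-rank Mahalanobis distance function. Based on an approximation of the projection distance from a point to the manifold, we propose a new parameterized manifold regularization method. Unlike earlier learning methods, which generally use only side information, the proposed method can further exploit the intrinsic manifold information in the data and directly learn a low-rank distance metric function, which traditional methods based on L_1-norm regularization cannot do. The resulting learning...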
English abstract	With the development and wide application of information technology, acquiring large volumes of data is no longer a problem. However, processing such data efficiently is becoming a challenge. In particular, how to process massive, high-dimensional, unstructured data and extract compact knowledge and structure from it is a common challenge for scientists and engineers, and it is at the core of machine learning and data mining research. To address this challenge, this thesis studies manifold learning and its applications in pattern classification. We propose novel learning algorithms for unsupervised manifold learning, latent variable models, feature extraction, and manifold-regularization-based low-rank metric learning. Extensive experimental results on real-world applications demonstrate the effectiveness and efficiency of our methods. The main contributions of the thesis are as follows. We propose and prove the local tangent space theorem for data lying on a low-dimensional manifold, and based on this theorem we propose a novel unsupervised manifold learning algorithm called local tangent space Laplacian eigenmaps (LTSLE). LTSLE estimates the similarity between any pair of nearby points using the Euclidean distance calculated in the local tangent space. Based on the correspondence between the Laplace-Beltrami operator on a manifold and the graph Laplacian, it can discover the low-dimensional manifold structure of the data. While LTSLE shares many virtues with the Laplacian Eigenmaps (LE) of Belkin and Niyogi, it overcomes a critical problem of LE, namely that an improper choice of the heat kernel parameter t causes learning to fail. To project unseen data efficiently onto the low-dimensional space, we also give a linear version of LTSLE, LLTSLE. Experimental results on visualization and handwritten digit recognition problems demonstrate the effectiveness of our methods. 
We propose a novel supervised extension of the Gaussian process latent variable model (GPLVM), called the Gaussian process latent random field (GPLRF). GPLRF is essentially a probabilistic graphical model in that it constrains the latent variables to form a Gaussian Markov random field with respect to a graph constructed from the supervisory information, and it connects the latent variables and the observed data using a nonlinear Gaussian process mapping. Compared to another supervised extension of GP...
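For orientation, the standard Laplacian Eigenmaps construction of Belkin and Niyogi that the abstract refers to can be written as below. This is only the baseline LE formulation; the tangent-space weights used by LTSLE are not given in this record and are not reproduced here. The symbols x_i (data points), W, D, L and y are the usual textbook notation and are not taken from the thesis.

```latex
% Standard Laplacian Eigenmaps with heat-kernel weights (baseline LE only).
\begin{align}
W_{ij} &=
  \begin{cases}
    \exp\!\left(-\lVert x_i - x_j \rVert^2 / t\right), & x_i \text{ and } x_j \text{ are neighbors},\\
    0, & \text{otherwise},
  \end{cases} \\
D_{ii} &= \sum_j W_{ij}, \qquad L = D - W, \\
L\,y &= \lambda\, D\,y .
\end{align}
% The embedding uses the eigenvectors y with the smallest nonzero eigenvalues.
```

The dependence on t is visible in the first line: as t approaches 0 the weights between distinct points vanish, and as t grows they all approach 1, so in either extreme the weights carry little distance information. This is one plausible reading of the sensitivity to t that the abstract mentions.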
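Language	Chinese
Other identifier	200718014628083
Content type	Dissertation
Source URL	[http://ir.ia.ac.cn/handle/173211/6380]
Collection	Graduates_Doctoral dissertations
Recommended citation (GB/T 7714)	Zhong Guoqiang. Research on Manifold Learning and Its Applications in Pattern Classification [D]. Institute of Automation, Chinese Academy of Sciences. Graduate University of Chinese Academy of Sciences. 2011.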
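As a small illustration of what a low-rank Mahalanobis distance function (mentioned in the abstract) means, the sketch below evaluates d(x, y) = sqrt((x - y)^T M (x - y)) with M = L^T L for an r x d factor L, so that M has rank at most r. The function name `low_rank_mahalanobis` and the example factor are hypothetical; the thesis's manifold-regularized procedure for learning the metric is not described in enough detail in this record to reproduce and is not shown.

```python
import math


def low_rank_mahalanobis(x, y, factor):
    """Return sqrt((x - y)^T L^T L (x - y)) for an r x d factor L.

    `factor` is a list of r rows, each of length d (hypothetical layout).
    The induced metric M = L^T L has rank at most r.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Project the difference vector into the r-dimensional space: L (x - y).
    projected = [sum(w * v for w, v in zip(row, diff)) for row in factor]
    # The distance is the Euclidean norm of the projected difference.
    return math.sqrt(sum(p * p for p in projected))


# Hypothetical 1 x 3 factor: only the first coordinate contributes,
# so the induced metric has rank 1.
L_factor = [[1.0, 0.0, 0.0]]
d = low_rank_mahalanobis([3.0, 5.0, 7.0], [0.0, 1.0, 2.0], L_factor)
```

Parameterizing the metric through the factor L, rather than through the full d x d matrix M, keeps M positive semidefinite by construction and caps its rank at r; whether the thesis uses this parameterization is an assumption here.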
