Title: Speech Emotion Recognition Based on Feature Analysis and Modeling
Author: Liang Yameng
Degree type: Master of Engineering
Defense date: 2016-05-28
Degree-granting institution: University of Chinese Academy of Sciences
Place of conferral: Beijing
Supervisor: Liu Wenju
Keywords: speech emotion recognition; bag-of-audio-words model; deep neural network; statistical methods; utterance-level feature vector
Abstract (Chinese)

Speech is an important means of human communication. The emotional information carried by speech conveys the speaker's state of mind more vividly and profoundly, emphasizes the spoken content, and makes conversation more natural. Speech emotion recognition has therefore become an important research topic in artificial intelligence.
Humans can perceive emotional changes in sound because the brain is able to perceive and interpret the information in a speech signal that reflects the speaker's emotional state (such as changes in intonation). Speech emotion recognition simulates this human process of emotion perception: acoustic features are first extracted from the audio signal, key features describing the emotional characteristics of the voice are then derived from them, and finally a model relating these key features to emotion categories is built. How to obtain, from information-rich and entangled speech, the key features that express emotion is the central question studied in this thesis.

Building on a careful review of previous work, this thesis starts from feature analysis, investigates more effective strategies for speech emotion recognition, and carries out a series of studies. The main research content and contributions are as follows:

First, a speech emotion recognition method based on an enhanced bag-of-audio-words model is proposed. Through a quantization and statistics procedure, the method converts all frame-level feature vectors of a speech file into a single utterance-level feature vector. A "multi-codeword" counting scheme is introduced into the quantization step, so that the utterance-level feature vector retains rich emotional information and reduces the loss of useful information suffered by the traditional bag-of-audio-words model. The resulting vectors are highly discriminative for emotion recognition, and the method achieves good recognition results.
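
To make the quantization step concrete, the following is a minimal sketch of a multi-codeword bag-of-audio-words encoding. It is not the thesis's implementation: the distance measure, codebook construction, and the number of codewords counted per frame (`n_assign`) are illustrative assumptions.

```python
import numpy as np

def boaw_utterance_vector(frames, codebook, n_assign=3):
    """Sketch: encode one utterance as a multi-codeword BoAW histogram.

    frames   : (n_frames, dim) frame-level acoustic features of one utterance
    codebook : (n_codewords, dim) codewords, e.g. learned by k-means
    n_assign : number of nearest codewords counted per frame
               (the "multi-codeword" idea; the value 3 is an assumption)
    """
    # Euclidean distances between every frame and every codeword
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    # Indices of the n_assign closest codewords for each frame
    nearest = np.argsort(dists, axis=1)[:, :n_assign]
    # Accumulate counts into a histogram over the codebook
    hist = np.zeros(len(codebook))
    np.add.at(hist, nearest.ravel(), 1.0)
    # Normalize so utterances of different lengths are comparable
    return hist / hist.sum()
```

Counting several nearby codewords per frame, rather than only the single nearest one, is what distinguishes this scheme from a standard hard-assignment BoAW histogram and is one way to retain information that hard assignment would discard.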

Second, a speech emotion recognition method based on deep neural networks and global statistics is proposed. The method first uses transfer learning and an autoencoder, respectively, to initialize the deep neural network and obtain a probability distribution matrix for each speech file; it then filters the matrix to enhance the discriminability of the data; finally, global statistics are computed over the probability distribution matrix to extract an utterance-level feature vector rich in emotional information. Experiments verify that the method achieves good recognition results.
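
The sketch below illustrates one plausible form of the filtering-plus-statistics step, assuming the DNN outputs per-frame emotion posteriors. The confidence-based filtering criterion, its threshold, and the particular set of statistics are assumptions for illustration, not details taken from the thesis.

```python
import numpy as np

def utterance_vector_from_posteriors(posteriors, conf_threshold=0.5):
    """Sketch: frame-level probability distribution matrix -> utterance vector.

    posteriors     : (n_frames, n_classes) per-frame emotion posteriors
                     produced by a (pre-initialized, trained) DNN
    conf_threshold : frames whose maximum posterior falls below this value
                     are dropped (one possible "data filtering" rule;
                     both the rule and the value are assumptions)
    """
    # Keep only frames on which the network is reasonably confident
    keep = posteriors.max(axis=1) >= conf_threshold
    filtered = posteriors[keep] if keep.any() else posteriors
    # Global statistics over the remaining frames
    stats = [filtered.mean(axis=0),
             filtered.std(axis=0),
             filtered.max(axis=0),
             filtered.min(axis=0)]
    # Concatenated statistics form the utterance-level feature vector
    return np.concatenate(stats)
```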

Third, to meet the background and requirements of an engineering project, an offline speech emotion recognition application was developed on the MFC platform using the enhanced bag-of-audio-words model. The software consists of four modules: a model training module, an emotion recognition module, a user interface module, and a multi-threading module.

Abstract (English)

Speech is an important way for people to communicate. The emotional information in speech makes conversation more natural, because it helps convey the speaker's emotional state and emphasizes the spoken content. Nowadays, speech emotion recognition (SER) has become an important research topic in artificial intelligence. People can understand emotion in speech because the brain is able to perceive emotional information. SER simulates this perception process. In SER, acoustic features are first extracted from the audio signal, key features related to emotion are then extracted from the acoustic features, and finally the relationship between the key features and the emotion categories is modeled. How to extract these key features from audio signals is the main topic of this dissertation.

Building on previous research, we focus on feature analysis and study more effective recognition strategies. The main contributions can be summarized as follows:

Firstly, a recognition method based on an enhanced bag-of-audio-words (BoAW) model is proposed. The enhanced BoAW model extracts utterance-level feature vectors from the input audio files using vector quantization, where the "multi-codeword" idea helps the utterance-level feature vectors retain sufficient emotional information. The utterance-level feature vectors are more discriminative for classification and yield good recognition results.

Secondly, a recognition method based on a deep neural network (DNN) and statistical methods is explored. In our research, transfer learning and an auto-encoder are used, respectively, to initialize the DNN. Data filtering is then applied to the probability distribution matrix, and finally statistical methods are used to extract the utterance-level feature vectors. The method achieves good recognition results.

Thirdly, an offline SER application is developed on the MFC platform. The software consists of four modules: a training module, a recognition module, an interface module, and a multi-threading module.

Language: Chinese
Content type: Dissertation
Source URL: http://ir.ia.ac.cn/handle/173211/11673
Collection: Graduates_Master's Theses
Author affiliation: Institute of Automation, Chinese Academy of Sciences
Recommended citation
GB/T 7714
Liang Yameng. Speech Emotion Recognition Based on Feature Analysis and Modeling [D]. Beijing: University of Chinese Academy of Sciences, 2016.
 
