GATING RECURRENT MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS

Authors: Wang, Wenfu; Xu, Shuang; Xu, Bo
Date: 2016-03
Conference date: 2016-3-21
Conference location: Shanghai, China
Keywords: Statistical Parametric Speech Synthesis; Gating Units; GRU; Gating Recurrent Mixture Density Network
Pages: 5520-5524
Abstract: Although recurrent neural networks (RNNs) with long short-term memory (LSTM) units can capture long-span dependencies across the linguistic inputs and have achieved state-of-the-art performance in statistical parametric speech synthesis (SPSS), another limitation remains: the mean squared error (MSE) objective function is intrinsically uni-Gaussian. This paper proposes a gating recurrent mixture density network (GRMDN) architecture that addresses both problems jointly in neural-network-based SPSS. In addition, the gated recurrent unit (GRU), which is much simpler than the LSTM and has a more interpretable working mechanism, is investigated as an alternative gating unit for RNN-based acoustic modeling. Experimental results show that the proposed GRMDN architecture can synthesize more natural speech than its MSE-trained counterpart, and that the two gating units (LSTM and GRU) perform comparably.
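To make the abstract's two points concrete, here is a minimal NumPy sketch of a gating recurrent mixture density network forward pass: a single GRU layer maps a sequence of linguistic feature vectors to hidden states, a linear output layer turns each state into the parameters of a diagonal-covariance Gaussian mixture, and the loss is the mixture's negative log-likelihood. The layer sizes, random untrained weights, single-layer topology, the GRU update convention h = (1 - z) * h + z * h_cand (one of two common conventions), and names such as `gru_forward`, `mdn_params` and `mdn_nll` are illustrative assumptions, not the configuration used in the paper. The last lines check the point about MSE: with one component and unit variance, the negative log-likelihood reduces to half the summed squared error plus a constant, which is the uni-Gaussian assumption the MSE objective implies.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_forward(x, params):
    """Run one GRU layer over a (T, I) input sequence and return (T, H) hidden states."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    h = np.zeros(bz.shape[0])
    states = []
    for x_t in x:
        z = sigmoid(Wz @ x_t + Uz @ h + bz)              # update gate
        r = sigmoid(Wr @ x_t + Ur @ h + br)              # reset gate
        h_cand = np.tanh(Wh @ x_t + Uh @ (r * h) + bh)   # candidate state from reset-gated history
        h = (1.0 - z) * h + z * h_cand                   # gated interpolation of old and candidate state
        states.append(h)
    return np.stack(states)


def mdn_params(hidden, W_out, b_out, M, D):
    """Map (T, H) states to mixture weights (T, M), means (T, M, D) and variances (T, M, D)."""
    out = hidden @ W_out.T + b_out                       # (T, M + 2*M*D)
    logits = out[:, :M] - out[:, :M].max(axis=1, keepdims=True)
    weights = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax
    means = out[:, M:M + M * D].reshape(-1, M, D)
    variances = np.exp(out[:, M + M * D:].reshape(-1, M, D))  # exp keeps variances positive
    return weights, means, variances


def mdn_nll(y, weights, means, variances):
    """Total negative log-likelihood of (T, D) targets under per-frame diagonal GMMs."""
    diff = y[:, None, :] - means                         # (T, M, D)
    log_comp = -0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=2)
    log_joint = np.log(weights) + log_comp               # (T, M)
    peak = log_joint.max(axis=1, keepdims=True)          # log-sum-exp over components
    log_lik = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return -log_lik.sum()


# Toy sizes (hypothetical): frames, linguistic-feature dim, hidden units, mixtures, acoustic dim.
T, I, H, M, D = 5, 8, 16, 4, 3
rng = np.random.default_rng(0)
params = [rng.normal(scale=0.1, size=s) for s in [(H, I), (H, H), (H,)] * 3]
W_out = rng.normal(scale=0.1, size=(M + 2 * M * D, H))
b_out = np.zeros(M + 2 * M * D)
x = rng.normal(size=(T, I))   # stand-in linguistic features
y = rng.normal(size=(T, D))   # stand-in acoustic targets

hidden = gru_forward(x, params)
w, mu, var = mdn_params(hidden, W_out, b_out, M, D)
loss = mdn_nll(y, w, mu, var)  # mixture-density objective for this sequence

# MSE as the uni-Gaussian special case: one component with unit variance.
y_hat = mu[:, 0, :]
mse_case = mdn_nll(y, np.ones((T, 1)), y_hat[:, None, :], np.ones((T, 1, D)))
expected = 0.5 * np.sum((y - y_hat) ** 2) + 0.5 * T * D * np.log(2 * np.pi)
print(np.isclose(mse_case, expected))  # True
```

In this sketch, replacing `gru_forward` with an LSTM layer would change only the recurrent part; the mixture output layer and the loss stay the same, which is what allows the two gating units to be compared under one mixture-density objective.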
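Language: English
Content type: Conference paper
Source URL: http://ir.ia.ac.cn/handle/173211/19654
Collection: Research Center for Digital Content Technology and Service / Auditory Models and Cognitive Computing
Author affiliation: Institute of Automation, Chinese Academy of Sciences, Beijing, China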
Recommended citation (GB/T 7714): Wang, Wenfu, Xu, Shuang, Xu, Bo. GATING RECURRENT MIXTURE DENSITY NETWORKS FOR ACOUSTIC MODELING IN STATISTICAL PARAMETRIC SPEECH SYNTHESIS[C]. In:. Shanghai, China. 2016-3-21.