COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS

Wang, Wenfu; Xu, Bo
2017-03
Conference date	2017-3-5
Conference location	New Orleans, USA
Keywords	Statistical Parametric Speech Synthesis; LSTM; Convolutional Output Layer; High-performance; Trajectory Smoother
Pages	5500-5504
Abstract	In this paper, we aim to improve the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS), and we introduce the convolutional neural network (CNN) because of its strong capacity for modelling local structure. We propose a novel model architecture that combines a unidirectional long short-term memory (LSTM) network with a time-domain convolutional output layer (COL), and we apply it to acoustic modelling. The two components complement each other and yield a high-performance synthesis system. Specifically, the unidirectional LSTM learns expressive feature representations from past context, and the COL draws on those representations within a look-ahead window to improve the prediction for each frame. This complementary mechanism significantly improves both the predictive accuracy and the quality of the synthetic speech. In addition, owing to the sliding-window operation of convolution, the COL also acts as an effective smoother of parameter trajectories between consecutive frames. Subjective preference tests show that the proposed architecture can synthesize natural-sounding speech without dynamic features.
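The abstract describes the architecture only at a high level. The following NumPy sketch shows one plausible reading of it: a unidirectional LSTM computes hidden states from past context only, and a time-domain convolutional output layer maps each frame to acoustic features by combining the hidden state at that frame with those of the next `L` frames. The function names, layer sizes, gate ordering, zero padding at the end of the sequence, and the purely forward-looking window are all assumptions made for illustration, not details taken from the paper; the weights are random rather than trained.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(x, W, U, b):
    """Unidirectional LSTM over a (T, D_in) input sequence (illustrative sketch).

    W: (4H, D_in), U: (4H, H), b: (4H,). The gate order (input, forget,
    cell candidate, output) is an assumption. Returns hidden states (T, H).
    """
    T = x.shape[0]
    H = U.shape[1]
    h = np.zeros(H)
    c = np.zeros(H)
    hs = np.zeros((T, H))
    for t in range(T):
        z = W @ x[t] + U @ h + b          # pre-activations for all four gates
        i = sigmoid(z[:H])
        f = sigmoid(z[H:2 * H])
        g = np.tanh(z[2 * H:3 * H])
        o = sigmoid(z[3 * H:])
        c = f * c + i * g                  # cell state accumulates past context only
        h = o * np.tanh(c)
        hs[t] = h
    return hs


def conv_output_layer(h, K, bias):
    """Hypothetical time-domain convolutional output layer with look-ahead.

    K: (L + 1, D_out, H) kernel; output frame t combines h[t], ..., h[t + L].
    Frames beyond the end of the sequence are zero-padded (an assumption).
    """
    T, H = h.shape
    L = K.shape[0] - 1
    padded = np.vstack([h, np.zeros((L, H))])
    y = np.zeros((T, K.shape[1])) + bias
    for k in range(L + 1):
        # Shifted view of the hidden states: row t holds h[t + k]
        y += padded[k:k + T] @ K[k].T
    return y


# Toy dimensions (hypothetical): 20 frames, 8 linguistic inputs,
# 16 LSTM units, 4 acoustic outputs, a look-ahead of 2 frames.
rng = np.random.default_rng(0)
T, D_in, H, D_out, LOOKAHEAD = 20, 8, 16, 4, 2
x = rng.standard_normal((T, D_in))
W = 0.1 * rng.standard_normal((4 * H, D_in))
U = 0.1 * rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
K = 0.1 * rng.standard_normal((LOOKAHEAD + 1, D_out, H))
bias = np.zeros(D_out)

h = lstm_forward(x, W, U, b)
y = conv_output_layer(h, K, bias)
print(y.shape)  # (20, 4)
```

Because the outputs at frames t and t + 1 share `L` of their `L + 1` input frames, a convolutional output layer of this form couples consecutive predictions, which is one way to read the abstract's description of the COL as a trajectory smoother. A real system would train these weights on aligned linguistic and acoustic features, which this sketch omits.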
Language	English
Content type	Conference paper
Source URL	http://ir.ia.ac.cn/handle/173211/19660
Collection	Research Center for Digital Content Technology and Services: Auditory Models and Cognitive Computing
Affiliation	Institute of Automation, Chinese Academy of Sciences, Beijing, China
Recommended citation (GB/T 7714)	Wang, Wenfu, Xu, Bo. COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS[C]. In: . New Orleans, USA. 2017-3-5.