COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS
Wang, Wenfu; Xu, Bo
2017-03
Conference date: 2017-3-5
Conference location: New Orleans, USA
Keywords: Statistical Parametric Speech Synthesis; LSTM; Convolutional Output Layer; High-performance; Trajectory Smoother
Pages: 5500-5504
Abstract: In this paper, we aim to improve the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce a convolutional neural network (CNN) for its strong capacity for locality modelling. A novel model architecture combining a unidirectional long short-term memory (LSTM) network with a time-domain convolutional output layer (COL) is proposed and applied to acoustic modelling. The two components complement each other and yield a high-performance synthesis system. Specifically, the unidirectional LSTM learns expressive feature representations from history context, and the COL absorbs some of these representations within a look-ahead window to advance predictions. This complementary mechanism significantly improves predictive accuracy and the quality of synthetic speech. In addition, the way convolution operates makes the COL a fine parameter trajectory smoother between consecutive frames. Subjective preference tests show that the proposed architecture can synthesize natural-sounding speech without dynamic features.
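The look-ahead mechanism described in the abstract can be illustrated with a minimal sketch in plain Python. The abstract does not give the COL's exact formulation, so everything here is an assumption made for illustration: the function name `conv_output_layer`, the per-tap weight layout, and zero-padding at the end of the utterance are all hypothetical, and the hidden states are simply taken as given rather than computed by an actual LSTM.

```python
from typing import List

Vector = List[float]


def conv_output_layer(
    hidden: List[Vector],
    weights: List[List[Vector]],
    bias: Vector,
) -> List[Vector]:
    """Time-domain convolution over hidden states with a look-ahead window.

    hidden:  T frames, each a vector of size H. Assumed to be the outputs of
             a unidirectional LSTM, which only sees past context.
    weights: one (D x H) matrix per tap; tap k is applied to frame t + k, so
             len(weights) - 1 is the look-ahead in frames (an assumption).
    bias:    output bias of size D.
    """
    num_frames = len(hidden)
    outputs = []
    for t in range(num_frames):
        out = list(bias)
        for k, tap in enumerate(weights):
            if t + k >= num_frames:
                break  # treat frames past the end of the utterance as zeros
            frame = hidden[t + k]
            for d, row in enumerate(tap):
                out[d] += sum(w * h for w, h in zip(row, frame))
        outputs.append(out)
    return outputs


# Toy input: 4 frames of 1-dimensional hidden state, 2 frames of look-ahead.
hidden_states = [[1.0], [2.0], [3.0], [4.0]]
taps = [[[0.5]], [[0.25]], [[0.25]]]
predicted = conv_output_layer(hidden_states, taps, bias=[0.0])
```

Because each output frame is a weighted combination of the current and the next few hidden frames, neighbouring outputs share most of their inputs, which is one plausible reading of why the abstract describes the COL as a trajectory smoother.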
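Language: English
Content type: Conference paper
Source URL: [http://ir.ia.ac.cn/handle/173211/19660]
Collection: Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing
Author affiliation: Institute of Automation, Chinese Academy of Sciences, Beijing, China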
Recommended citation
GB/T 7714
Wang, Wenfu, Xu, Bo. COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS[C]. In:. New Orleans, USA. 2017-3-5.