Title | COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS |
Authors | Wang, Wenfu; Xu, Bo |
Date | 2017-03 |
Conference date | 2017-3-5 |
Conference location | New Orleans, USA |
Keywords | Statistical Parametric Speech Synthesis; LSTM; Convolutional Output Layer; High-performance; Trajectory Smoother |
Pages | 5500-5504 |
Abstract | In this paper, we target improving the accuracy of acoustic modelling for statistical parametric speech synthesis (SPSS) and introduce a convolutional neural network (CNN) for its strong capacity in locality modelling. A novel model architecture combining a unidirectional long short-term memory (LSTM) network and a time-domain convolutional output layer (COL) is proposed and applied to acoustic modelling. The two components complement each other and result in a high-performance synthesis system. Specifically, the unidirectional LSTM learns expressive feature representations from the history context, and the COL absorbs some of these representations within a look-ahead window to improve its predictions. This complementary mechanism significantly improves the predictive accuracy and the quality of synthetic speech. In addition, the way convolution operates makes the COL a fine parameter trajectory smoother between consecutive frames. Subjective preference tests show that the proposed architecture can synthesize natural-sounding speech without dynamic features. |
Language | English |
Content type | Conference paper |
Source URL | [http://ir.ia.ac.cn/handle/173211/19660] |
Collection | Research Center for Digital Content Technology and Services_Auditory Models and Cognitive Computing |
Author affiliation | Institute of Automation, Chinese Academy of Sciences, Beijing, China |
Recommended citation (GB/T 7714) | Wang, Wenfu,Xu, Bo. COMBINING UNIDIRECTIONAL LONG SHORT-TERM MEMORY WITH CONVOLUTIONAL OUTPUT LAYER FOR HIGH-PERFORMANCE SPEECH SYNTHESIS[C]. In:. New Orleans, USA. 2017-3-5. |
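
The abstract describes a time-domain convolutional output layer (COL) that reads the LSTM's per-frame representations within a look-ahead window. The sketch below illustrates that idea only and is not the authors' implementation: the function name `conv_output_layer`, the window length, the zero treatment of frames past the end of the utterance, and the toy weights are all assumptions, since the record does not give those details.

```python
def conv_output_layer(hidden, weights, bias):
    """Illustrative look-ahead convolution over per-frame features.

    hidden:  T frames, each a list of H floats (stand-in for LSTM outputs)
    weights: K taps; weights[k] is a D x H matrix applied to frame t + k
    bias:    D floats
    Frames beyond the end of the sequence contribute nothing (assumption).
    """
    num_frames = len(hidden)
    out_dim = len(bias)
    outputs = []
    for t in range(num_frames):
        y = list(bias)  # start each output frame from the bias
        for k, w_k in enumerate(weights):
            if t + k >= num_frames:
                break  # look-ahead window runs past the utterance end
            h = hidden[t + k]
            for d in range(out_dim):
                y[d] += sum(w * x for w, x in zip(w_k[d], h))
        outputs.append(y)
    return outputs


# Toy example: 4 frames, 1 hidden unit, 1 output dimension, window of 2 frames.
frames = [[1.0], [2.0], [4.0], [8.0]]
taps = [[[0.5]], [[0.5]]]  # average the current frame and the next frame
smoothed = conv_output_layer(frames, taps, [0.0])
print(smoothed)
```

With these toy averaging weights, each output frame mixes the current and the next hidden frame, which is one way to picture the trajectory-smoothing effect between consecutive frames that the abstract mentions. In a trained model the tap weights would be learned rather than fixed.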