The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network based Speech Synthesis | |
Wen ZQ (温正棋)1; Li Y (李雅)1; Tao JH (陶建华)1,2 | |
2016-09 | |
Conference dates | Sep 8-12, 2016 |
Conference location | San Francisco, USA |
Keywords | Phoneme Embedded Vector; Word Embedding; Speech Synthesis; BLSTM-RNN |
Abstract | In speech synthesis systems, the phoneme identity feature, which indicates the pronunciation unit, is influenced by external context such as neighboring words and phonemes. This paper proposes to encode this relatedness and parameterize the pronunciation of the phoneme identity feature as a continuous real-valued vector. The vector, composed of a phoneme embedded vector (PEV) and a word embedded vector (WEV), replaces the one-hot binary vector. It is realized in a word embedding model with a joint training structure in which the PEV and WEV are learned together. The effectiveness of the proposed technique was evaluated by comparing it with the binary vector in speech synthesis systems based on bidirectional long short-term memory recurrent neural networks (BLSTM-RNNs). The proposed system improved the quality of the synthesized speech, demonstrating the effectiveness of replacing the binary vector with the continuous real-valued vector to describe the phoneme identity feature. |
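The abstract contrasts a one-hot phoneme identity feature with a continuous vector built from a PEV and a WEV. The Python sketch below illustrates only that input representation, under loudly stated assumptions: the phoneme and word inventories, the vector sizes, and the helper names (`one_hot_phoneme`, `continuous_phoneme`) are hypothetical; the continuous vector is assumed to be a plain concatenation of the phoneme's PEV with the WEV of the word containing it; and the lookup tables hold random numbers rather than embeddings learned by the joint training described in the paper.

```python
import random
from typing import List

# Hypothetical toy inventories (pinyin-like units); not the paper's actual sets.
PHONEMES = ["sil", "n", "i", "h", "ao", "in"]
WORDS = ["<sil>", "ni", "hao", "nin"]
PEV_DIM, WEV_DIM = 4, 6  # assumed embedding sizes


def one_hot_phoneme(phoneme: str) -> List[float]:
    """Binary phoneme identity feature: a single 1.0 at the phoneme's index."""
    vec = [0.0] * len(PHONEMES)
    vec[PHONEMES.index(phoneme)] = 1.0
    return vec


# Placeholder lookup tables. In the paper the PEV and WEV are learned jointly in a
# word embedding model; here they are random Gaussian values for illustration only.
_rng = random.Random(0)
PEV_TABLE = [[_rng.gauss(0.0, 1.0) for _ in range(PEV_DIM)] for _ in PHONEMES]
WEV_TABLE = [[_rng.gauss(0.0, 1.0) for _ in range(WEV_DIM)] for _ in WORDS]


def continuous_phoneme(phoneme: str, word: str) -> List[float]:
    """Continuous phoneme identity feature, assumed to be [PEV ; WEV of containing word]."""
    pev = PEV_TABLE[PHONEMES.index(phoneme)]
    wev = WEV_TABLE[WORDS.index(word)]
    return pev + wev  # list concatenation builds a new (PEV_DIM + WEV_DIM)-dim vector


if __name__ == "__main__":
    print(one_hot_phoneme("n"))
    v_ni = continuous_phoneme("n", "ni")
    v_nin = continuous_phoneme("n", "nin")
    print(len(v_ni))
    # The PEV half is shared; the WEV half depends on the containing word.
    print(v_ni[:PEV_DIM] == v_nin[:PEV_DIM], v_ni[PEV_DIM:] == v_nin[PEV_DIM:])
```

With one-hot coding, `n` receives the same input whichever word it appears in. With the concatenated vector, the PEV half stays the same while the WEV half differs between `ni` and `nin`, which is the kind of word-level context the abstract says the binary vector cannot express. In a real system these vectors would feed the BLSTM-RNN acoustic model in place of the one-hot block of the input features.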
Proceedings | INTERSPEECH |
Content type | Conference paper |
Source URL | [http://ir.ia.ac.cn/handle/173211/12476] |
Collection | Institute of Automation / National Laboratory of Pattern Recognition / Human-Computer Speech Interaction Group |
Corresponding author | Wen, Zhengqi |
Affiliations | 1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; 2. CAS Center for Excellence in Brain Science and Intelligence Technology |
Recommended citation (GB/T 7714) | Wen ZQ, Li Y, Tao JH, et al. The Parameterized Phoneme Identity Feature as a Continuous Real-Valued Vector for Neural Network based Speech Synthesis[C]. In:. San Francisco, USA. Sep 8-12, 2016. |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.