Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network
Peng, Zhichao (1); Zeng, Hua (1); Li, Yongwei (2); Du, Yegang (3); Dang, Jianwu (4,5)
Journal: ELECTRONICS
Publication Date: 2023-11-01
Volume: 12; Issue: 22; Pages: 15
Keywords: modulation-filtered cochleagram; parallel attention recurrent neural network; dimensional emotion recognition; auditory signal processing; noise-robust
DOI: 10.3390/electronics12224620
Corresponding Authors: Peng, Zhichao (zcpeng@tju.edu.cn); Dang, Jianwu (jdang@jaist.ac.jp)
Abstract: Dimensional emotion describes rich, fine-grained emotional states better than categorical emotion. In human-robot interaction, continuously recognizing dimensional emotions from speech lets robots track the temporal dynamics of a speaker's emotional state and adjust their interaction strategies in real time. In this study, we present an approach that enhances dimensional emotion recognition through a modulation-filtered cochleagram and a parallel attention recurrent neural network (PA-net). First, the multi-resolution modulation-filtered cochleagram is derived from speech signals through auditory signal processing. The PA-net then establishes multi-temporal dependencies across feature scales, enabling it to track dynamic variations of dimensional emotion within auditory modulation sequences. Experiments on the RECOLA dataset show that, at the feature level, the modulation-filtered cochleagram outperforms the other assessed features in predicting valence and arousal, with a particularly pronounced advantage at high signal-to-noise ratios. At the model level, the PA-net attains the highest predictive performance for both valence and arousal, clearly outperforming alternative regression models. Experiments on the SEWA dataset further confirm substantial improvements in valence and arousal prediction. Together, these results demonstrate the effectiveness of the proposed approach for dimensional speech emotion recognition.
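As an illustration of the feature front end described in the abstract, here is a minimal Python sketch of a multi-resolution modulation-filtered cochleagram. It is an assumption-laden simplification, not the authors' implementation: an ERB-spaced Butterworth filterbank stands in for a gammatone filterbank, envelopes come from the Hilbert transform, and the modulation bands, frame length, and hop are illustrative choices.

```python
# Minimal sketch of a multi-resolution modulation-filtered cochleagram.
# Assumptions: ERB-spaced Butterworth bands approximate a gammatone
# filterbank; Hilbert envelopes; illustrative modulation bands and framing.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def erb_space(low_hz, high_hz, n_bands):
    """ERB-rate-spaced centre frequencies (Glasberg & Moore)."""
    erb = lambda f: 21.4 * np.log10(1.0 + 0.00437 * f)
    inv = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    return inv(np.linspace(erb(low_hz), erb(high_hz), n_bands))

def bandpass(x, lo, hi, fs, order=2):
    """Zero-phase Butterworth band-pass filter."""
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def modulation_cochleagram(x, fs, n_bands=32,
                           mod_bands=((2, 4), (4, 8), (8, 16))):
    """Return one (n_bands, n_frames) map per modulation band."""
    nyq = fs / 2.0
    cfs = erb_space(80.0, 0.9 * nyq, n_bands)
    # 1) Cochlear-style filterbank + temporal envelope extraction.
    envelopes = []
    for cf in cfs:
        lo = cf / 2 ** 0.25                      # ~half-octave band
        hi = min(cf * 2 ** 0.25, 0.99 * nyq)
        band = bandpass(x, lo, hi, fs, order=4)
        envelopes.append(np.abs(hilbert(band)))  # temporal envelope
    env = np.stack(envelopes)                    # (n_bands, n_samples)
    # 2) Modulation filtering at several temporal resolutions.
    frame, hop = int(0.040 * fs), int(0.010 * fs)  # 40 ms / 10 ms (assumed)
    maps = []
    for mlo, mhi in mod_bands:
        filt = np.stack([bandpass(e, mlo, mhi, fs) for e in env])
        n_frames = 1 + (filt.shape[1] - frame) // hop
        feat = np.stack([np.sqrt(np.mean(filt[:, i*hop:i*hop+frame] ** 2, axis=1))
                         for i in range(n_frames)], axis=1)
        maps.append(feat)                        # (n_bands, n_frames)
    return maps

# Example: three modulation-scale maps from one second of 16 kHz audio.
maps = modulation_cochleagram(np.random.randn(16000), 16000)
```

A correspondingly hedged PyTorch sketch of a parallel attention recurrent network follows: one GRU branch per modulation-scale map, additive attention pooling per branch, and a fused linear head regressing valence and arousal. Branch count, hidden sizes, and utterance-level pooling are illustrative; the published PA-net may differ (for continuous annotation as on RECOLA, it would emit frame-level rather than pooled predictions).

```python
# Hedged sketch of a PA-net-like parallel attention recurrent network.
# Assumptions: one GRU branch per modulation scale, additive attention
# pooling, utterance-level valence/arousal regression; sizes illustrative.
import torch
import torch.nn as nn

class AttnGRUBranch(nn.Module):
    def __init__(self, n_bands, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_bands, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, frames, n_bands)
        h, _ = self.gru(x)                 # (batch, frames, hidden)
        w = torch.softmax(self.attn(h), dim=1)
        return (w * h).sum(dim=1)          # attention-pooled summary

class PANet(nn.Module):
    def __init__(self, n_bands=32, n_scales=3, hidden=64):
        super().__init__()
        self.branches = nn.ModuleList(
            [AttnGRUBranch(n_bands, hidden) for _ in range(n_scales)])
        self.head = nn.Linear(n_scales * hidden, 2)   # valence, arousal

    def forward(self, scale_maps):         # list of (batch, frames, n_bands)
        z = torch.cat([b(m) for b, m in zip(self.branches, scale_maps)],
                      dim=-1)
        return self.head(z)

# Usage: transpose each (n_bands, n_frames) map to (1, n_frames, n_bands).
model = PANet()
inputs = [torch.randn(1, 100, 32) for _ in range(3)]
print(model(inputs).shape)                 # torch.Size([1, 2])
```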
Funding Project: Hunan Provincial Natural Science Foundation of China
WOS Keywords: SPEAKER INDIVIDUALITY; TEMPORAL ENVELOPE; VOCAL-EMOTION; PERCEPTION; FEATURES
WOS Research Areas: Computer Science; Engineering; Physics
Language: English
Publisher: MDPI
WOS Record: WOS:001118263900001
Content Type: Journal Article
Source URL: http://ir.ia.ac.cn/handle/173211/55066
Collection: National Laboratory of Pattern Recognition, Intelligent Interaction
Author Affiliations:
1. Hunan Univ Humanities Sci & Technol, Sch Informat, Loudi 417000, Peoples R China
2.Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100045, Peoples R China
3.Waseda Univ, Future Robot Org, Tokyo 1698050, Japan
4.Tianjin Univ, Coll Intelligence & Comp, Tianjin 300072, Peoples R China
5.Pengcheng Lab, Shenzhen 518066, Peoples R China
Recommended Citation
GB/T 7714
Peng, Zhichao, Zeng, Hua, Li, Yongwei, et al. Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network[J]. ELECTRONICS, 2023, 12(22): 15.
APA Peng, Zhichao, Zeng, Hua, Li, Yongwei, Du, Yegang, & Dang, Jianwu. (2023). Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network. ELECTRONICS, 12(22), 15.
MLA Peng, Zhichao, et al. "Enhancing Dimensional Emotion Recognition from Speech through Modulation-Filtered Cochleagram and Parallel Attention Recurrent Network." ELECTRONICS 12.22 (2023): 15.