A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval
Cheng, Wenlong (2,3); Tang, Wei (2,3); Huang, Yan (2,3); Luo, Yiwen (1); Wang, Liang (2,3,4)
Journal: IEEE Transactions on Multimedia
2022
Pages: 14
Ownership ranking: 1
Document subtype: International journal
Abstract

Speech-image retrieval aims to learn the relevance between images and speech. Prior approaches are mainly based on bi-modal contrastive learning, which cannot adequately alleviate the cross-modal heterogeneity between the visual and acoustic modalities. To address this issue, we propose a visual-acoustic-semantic embedding (VASE) method. First, we propose a tri-modal ranking loss that exploits the semantic information corresponding to the acoustic data, introducing an auxiliary alignment that strengthens the alignment between image and speech. Second, we introduce a cycle-consistency loss based on feature reconstruction, which further alleviates the heterogeneity between different pairs of modalities (e.g., visual-acoustic, visual-textual, and acoustic-textual). Extensive experiments demonstrate the effectiveness of the proposed method. In addition, our VASE model achieves state-of-the-art performance on the speech-image retrieval task on the Flickr8K and Places datasets.
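
The abstract names two loss terms but does not give their formulas. The sketch below is only an illustration of the general idea, not the authors' implementation. It assumes a hinge-based ranking loss over cosine similarities, applied with equal weight to the image-speech, image-text, and speech-text pairs, and a cycle-consistency term measured as the squared error after mapping a feature to another modality and back. All function names, the margin of 0.2, and the equal weighting are hypothetical choices made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def ranking_loss(xs, ys, margin=0.2):
    """Bidirectional hinge ranking loss; xs[i] and ys[i] form a matched pair.

    Assumption: every other item in the batch serves as a negative.
    """
    loss = 0.0
    for i in range(len(xs)):
        pos = cosine(xs[i], ys[i])
        for j in range(len(xs)):
            if j == i:
                continue
            # Anchor xs[i] against the mismatched item ys[j].
            loss += max(0.0, margin - pos + cosine(xs[i], ys[j]))
            # Anchor ys[i] against the mismatched item xs[j].
            loss += max(0.0, margin - pos + cosine(xs[j], ys[i]))
    return loss


def tri_modal_ranking_loss(img, spe, txt, margin=0.2):
    """Sum of pairwise ranking losses over the three modality pairs.

    Assumption: equal weights; the text pairs act as the auxiliary alignment.
    """
    return (ranking_loss(img, spe, margin)
            + ranking_loss(img, txt, margin)
            + ranking_loss(spe, txt, margin))


def cycle_consistency_loss(feats, forward, backward):
    """Mean squared reconstruction error after a round trip between modalities.

    `forward` and `backward` are hypothetical mapping functions, e.g. learned
    projections from one modality's feature space to another and back.
    """
    total = 0.0
    for x in feats:
        x_rec = backward(forward(x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_rec))
    return total / len(feats)


if __name__ == "__main__":
    img = [[1.0, 0.0], [0.0, 1.0]]
    spe = [[0.0, 1.0], [1.0, 0.0]]  # deliberately mismatched with img
    txt = [[1.0, 0.0], [0.0, 1.0]]
    print(tri_modal_ranking_loss(img, img, txt))  # aligned embeddings
    print(tri_modal_ranking_loss(img, spe, txt))  # misaligned speech
```

In this toy setup the loss vanishes when all three modalities share the same embeddings and grows when the speech embeddings are swapped, which is the behaviour a ranking objective of this kind is meant to have.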

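Language: English
Content type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/48532]
Collection: Institute of Automation_Center for Research on Intelligent Perception and Computing
Corresponding author: Wang, Liang
Author affiliations: 1. Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics
2. Institute of Automation, Chinese Academy of Sciences, Center for Research on Intelligent Perception and Computing
3. University of Chinese Academy of Sciences
4. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences
Recommended citation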
GB/T 7714
Cheng, Wenlong,Tang, Wei,Huang, Yan,et al. A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval[J]. IEEE Transactions on Multimedia,2022:14.
APA Cheng, Wenlong,Tang, Wei,Huang, Yan,Luo, Yiwen,&Wang, Liang.(2022).A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval.IEEE Transactions on Multimedia,14.
MLA Cheng, Wenlong,et al."A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval".IEEE Transactions on Multimedia (2022):14.
