A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval | |
Cheng, Wenlong2,3; Tang, Wei2,3; Huang, Yan2,3; Luo, Yiwen1; Wang, Liang2,3,4 | |
Journal | IEEE Transactions on Multimedia |
2022 | |
Pages | 14 |
Ownership rank | 1 |
Document subtype | International journal |
Abstract | Speech-image retrieval aims at learning the relevance between image and speech. Prior approaches are mainly based on bi-modal contrastive learning, which cannot alleviate the cross-modal heterogeneous issue between visual and acoustic modalities well. To address this issue, we propose a visual-acoustic-semantic embedding (VASE) method. First, we propose a tri-modal ranking loss by taking advantage of semantic information corresponding to the acoustic data, which introduces the auxiliary alignment to enhance the alignment between image and speech. Second, we introduce a cycle-consistency loss based on feature reconstruction. It can further alleviate the heterogeneous issue between different data modalities (e.g., visual-acoustic, visual-textual and acoustic-textual). Extensive experiments have demonstrated the effectiveness of our proposed method. In addition, our VASE model achieves state-of-the-art performance on the speech-image retrieval task on the Flickr8K and Places datasets. |
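Language | English |
Content type | Journal article |
Source URL | [http://ir.ia.ac.cn/handle/173211/48532] |
Collection | Institute of Automation_Center for Research on Intelligent Perception and Computing |
Corresponding author | Wang, Liang |
Author affiliations | 1. Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University 2. Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences 3. University of Chinese Academy of Sciences 4. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences |
Recommended citation GB/T 7714 | Cheng, Wenlong,Tang, Wei,Huang, Yan,et al. A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval[J]. IEEE Transactions on Multimedia,2022:14. |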
APA | Cheng, Wenlong,Tang, Wei,Huang, Yan,Luo, Yiwen,&Wang, Liang.(2022).A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval.IEEE Transactions on Multimedia,14. |
MLA | Cheng, Wenlong,et al."A Reconstruction-based Visual-Acoustic-Semantic Embedding Method for Speech-Image Retrieval".IEEE Transactions on Multimedia (2022):14. |
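
The abstract describes a tri-modal ranking loss that adds image-text and speech-text alignment as auxiliary terms alongside the image-speech term. The record gives no formula, so the following is only a minimal sketch of that idea under stated assumptions: cosine similarity, a hinge loss summed over all in-batch negatives, equal weighting of the three pairs, and a margin of 0.2. The function names (`cosine`, `ranking_loss`, `tri_modal_ranking_loss`) are hypothetical, and the paper's actual formulation may differ. The cycle-consistency loss is not sketched here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(xs, ys, margin=0.2):
    """Bidirectional hinge ranking loss for one modality pair.

    xs[i] and ys[i] are assumed to be a matched pair; every other
    item in the batch is treated as a negative (an assumption).
    """
    total = 0.0
    for i in range(len(xs)):
        pos = cosine(xs[i], ys[i])
        for j in range(len(xs)):
            if j == i:
                continue
            # xs[i] as the query: a mismatched ys[j] should score lower than the match.
            total += max(0.0, margin - pos + cosine(xs[i], ys[j]))
            # ys[i] as the query: a mismatched xs[j] should score lower than the match.
            total += max(0.0, margin - pos + cosine(xs[j], ys[i]))
    return total


def tri_modal_ranking_loss(image, speech, text, margin=0.2):
    """Sum of the pairwise ranking losses over the three modality pairs.

    Equal weighting of the three terms is an assumption of this sketch.
    """
    return (ranking_loss(image, speech, margin)
            + ranking_loss(image, text, margin)
            + ranking_loss(speech, text, margin))


# Toy embeddings: two items, two dimensions per modality.
image = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0]]
speech_aligned = [[1.0, 0.0], [0.0, 1.0]]
speech_swapped = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = tri_modal_ranking_loss(image, speech_aligned, text)
loss_swapped = tri_modal_ranking_loss(image, speech_swapped, text)
```

In this toy setup the loss is zero when all three modalities agree and becomes positive when the speech embeddings are swapped, since both the image-speech and speech-text terms are then violated.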