CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation
Wangli Hao1,2; Zhaoxiang Zhang1,2,3; He Guan1
2018
Conference date: 2018.2.1
Conference venue: Hilton New Orleans Riverside, USA
Abstract (English)

Visual and audio are two symbiotic modalities underlying videos, and they carry both common and complementary information. If they can be mined and fused sufficiently, the performance of related video tasks can be significantly enhanced. However, owing to environmental interference or sensor faults, sometimes only one modality is available while the other is discarded or missing. Recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, can bring substantial gains to various vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks: audio-to-visual, visual-to-audio, audio-to-audio and visual-to-visual subnetworks, which are organized in a cycle architecture. CMCGAN has several notable advantages. First, CMCGAN unifies visual-audio mutual generation into a common framework through a joint corresponding adversarial loss. Second, by introducing a latent vector with a Gaussian distribution, CMCGAN can effectively handle the dimension and structure asymmetry between the visual and audio modalities. Third, CMCGAN can be trained end-to-end, which makes it more convenient to use. Benefiting from CMCGAN, we develop a dynamic multimodal classification network to handle the modality missing problem. Extensive experiments validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality achieves effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of our proposed method.

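Language: English
Content type: Conference paper
Source URL: [http://ir.ia.ac.cn/handle/173211/23880]
Collections: Institute of Automation_National Laboratory of Pattern Recognition
Institute of Automation_Center for Research on Intelligent Perception and Computing
Corresponding author: Zhaoxiang Zhang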
Author affiliations: 1. Center of Research on Intelligent Perception and Computing
2. Institute of Automation, University of Chinese Academy of Sciences
3. Center for Excellence in Brain Science and Intelligence Technology (CEBSIT)
4. CAS Center for Excellence in Brain Science and Intelligence
Recommended citation
GB/T 7714
Wangli Hao, Zhaoxiang Zhang, He Guan. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C]. In:. Hilton New Orleans Riverside, USA. 2018.2.1.