CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation

Authors	Wangli Hao (1,2); Zhaoxiang Zhang (1,2,3); He Guan (1)
Year	2018
Conference date	2018.2.1
Conference venue	Hilton New Orleans Riverside, USA
Abstract	Visual and audio are two symbiotic modalities underlying videos, and they contain both common and complementary information. If this information can be sufficiently mined and fused, the performance of related video tasks can be significantly enhanced. However, because of environmental interference or sensor faults, sometimes only one modality is available while the other is discarded or missing. Recovering the missing modality from the existing one, based on the common information shared between them and the prior information of the specific modality, can substantially benefit a variety of vision tasks. In this paper, we propose a Cross-Modal Cycle Generative Adversarial Network (CMCGAN) to handle cross-modal visual-audio mutual generation. Specifically, CMCGAN is composed of four kinds of subnetworks, namely audio-to-visual, visual-to-audio, audio-to-audio and visual-to-visual subnetworks, which are organized in a cycle architecture. CMCGAN has several notable advantages. First, CMCGAN unifies visual-audio mutual generation in a common framework through a joint corresponding adversarial loss. Second, by introducing a latent vector drawn from a Gaussian distribution, CMCGAN effectively handles the dimension and structure asymmetry between the visual and audio modalities. Third, CMCGAN can be trained end-to-end, which makes it more convenient to use. Building on CMCGAN, we develop a dynamic multimodal classification network to handle the missing-modality problem. Extensive experiments validate that CMCGAN obtains state-of-the-art cross-modal visual-audio generation results. Furthermore, the generated modality is shown to achieve effects comparable to those of the original modality, which demonstrates the effectiveness and advantages of the proposed method.
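The abstract gives no implementation details, so the following is only a toy Python sketch of the data flow it describes, not the authors' code. Every name here (`dense`, `a2v`, `v2a`, `head`, `sample_z`, `classify`, the dimensions) is hypothetical, and several choices are assumptions: each subnetwork is a random linear map; the Gaussian latent vector is concatenated to the source features at the input of each cross-modal generator; the audio-to-audio and visual-to-visual subnetworks are modeled as the two round trips through the cycle; the reconstruction loss is a plain mean squared error; and the adversarial losses and discriminators are omitted entirely. Unequal `AUDIO_DIM` and `VISUAL_DIM` stand in for the dimension asymmetry the abstract mentions.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]

# Hypothetical toy sizes; the real networks operate on images and spectrograms.
AUDIO_DIM, VISUAL_DIM, LATENT_DIM, NUM_CLASSES = 4, 6, 2, 3


def dense(rows: int, cols: int, rng: random.Random) -> Callable[[Vector], Vector]:
    """Toy stand-in for a subnetwork: a random linear map R^cols -> R^rows (no bias)."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    def apply(x: Vector) -> Vector:
        assert len(x) == cols, "input size mismatch"
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return apply


rng = random.Random(0)
# Assumed: cross-modal generators take [source features ++ Gaussian latent z].
a2v = dense(VISUAL_DIM, AUDIO_DIM + LATENT_DIM, rng)
v2a = dense(AUDIO_DIM, VISUAL_DIM + LATENT_DIM, rng)
head = dense(NUM_CLASSES, AUDIO_DIM + VISUAL_DIM, rng)  # classifier on fused features


def sample_z(r: random.Random) -> Vector:
    """Latent vector drawn from a standard Gaussian."""
    return [r.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]


def audio_to_audio(audio: Vector, r: random.Random) -> Vector:
    """Assumed audio -> visual -> audio round trip (one reading of the cycle)."""
    visual = a2v(audio + sample_z(r))
    return v2a(visual + sample_z(r))


def visual_to_visual(visual: Vector, r: random.Random) -> Vector:
    """Assumed visual -> audio -> visual round trip."""
    audio = v2a(visual + sample_z(r))
    return a2v(audio + sample_z(r))


def cycle_loss(x: Vector, x_rec: Vector) -> float:
    """Mean squared reconstruction error (assumed, CycleGAN-style)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def classify(audio: Optional[Vector], visual: Optional[Vector], r: random.Random) -> int:
    """Generate a missing modality from the present one, then classify the fusion."""
    if audio is None and visual is None:
        raise ValueError("at least one modality is required")
    if visual is None:
        visual = a2v(audio + sample_z(r))
    if audio is None:
        audio = v2a(visual + sample_z(r))
    scores = head(audio + visual)  # fusion by simple concatenation (assumed)
    return max(range(NUM_CLASSES), key=lambda k: scores[k])


if __name__ == "__main__":
    audio = [rng.gauss(0.0, 1.0) for _ in range(AUDIO_DIM)]
    rec = audio_to_audio(audio, rng)
    print(len(rec), cycle_loss(audio, rec) >= 0.0)  # 4 True
    print(classify(audio, None, rng) in range(NUM_CLASSES))  # True
```

The point of the sketch is the control flow of the "dynamic" classifier: when one modality is absent, the corresponding generator fills it in, so the downstream classifier always sees both inputs. Concatenation is used for fusion only because it is the simplest choice; the abstract does not say how the modalities are fused.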
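Language	English
Content type	Conference paper
Source URL	[http://ir.ia.ac.cn/handle/173211/23880]
Collection	Institute of Automation / National Laboratory of Pattern Recognition; Institute of Automation / Center for Research on Intelligent Perception and Computing
Corresponding author	Zhaoxiang Zhang
Author affiliations	1. Center of Research on Intelligent Perception and Computing; 2. Institute of Automation, University of Chinese Academy of Sciences; 3. Center for Excellence in Brain Science and Intelligence Technology (CEBSIT); 4. CAS Center for Excellence in Brain Science and Intelligence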
Recommended citation (GB/T 7714)	Wangli Hao, Zhaoxiang Zhang, He Guan. CMCGAN: A Uniform Framework for Cross-Modal Visual-Audio Mutual Generation[C]. In:. Hilton New Orleans Riverside, USA. 2018.2.1.