Latent Structure Mining With Contrastive Modality Fusion for Multimedia Recommendation
Zhang, Jinghao (1,3); Zhu, Yanqiao (2); Liu, Qiang (1,3); Zhang, Mengqi (1,3); Wu, Shu (1,3); Wang, Liang (1,3)
Journal: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
Publication date: 2023-09-01
Volume: 35; Issue: 9; Pages: 9154-9167
Keywords: Multimedia recommendation; graph structure learning; contrastive learning
ISSN: 1041-4347
DOI: 10.1109/TKDE.2022.3221949
Corresponding author: Wu, Shu (shu.wu@nlpr.ia.ac.cn)
Abstract: Multimedia content is predominant in the modern Web era. Recent years have witnessed growing research interest in multimedia recommendation, which aims to predict whether a user will interact with an item that has multimodal content. Most previous studies focus on modeling user-item interactions, with multimodal features included as side information. However, this scheme is not well suited to multimedia recommendation. First, only collaborative item-item relationships are modeled, and only implicitly, through high-order item-user-item co-occurrences. Considering that items are associated with rich content in multiple modalities, we argue that the latent semantic item-item structures underlying this multimodal content could help learn better item representations and assist recommender models in comprehensively discovering candidate items. Second, although previous studies consider multiple modalities, their ways of fusing them, by linear combination or concatenation, are insufficient to fully capture item content information and item relationships. To address these deficiencies, we propose a latent structure MIning with ContRastive mOdality fusion model, which we term MICRO for brevity. Specifically, we devise a novel modality-aware structure learning module, which learns item-item relationships for each modality. Based on the learned modality-aware latent item relationships, we perform graph convolutions to explicitly inject item affinities into modality-aware item representations. Additionally, we design a novel multimodal contrastive framework to facilitate item-level multimodal fusion by mining both modality-shared and modality-specific information. Finally, the item representations are plugged into existing collaborative filtering methods to make accurate recommendations.
Extensive experiments on three real-world datasets demonstrate the superiority of our method over state-of-the-art methods and justify the design choices of our work.
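The abstract names three steps: a per-modality item-item graph learned from content features, graph convolution over that graph, and contrastive fusion of the resulting modality-aware representations. The following is a minimal sketch of those steps under simplifying assumptions, not the authors' implementation: each modality graph is assumed to be a binary, symmetrized cosine-similarity kNN graph (parameter `k`); propagation is assumed to be a parameter-free, symmetrically normalized graph convolution over a hypothetical shared item embedding matrix `item_emb`; fusion is a plain average standing in for the paper's learned fusion; and the contrastive objective is a standard InfoNCE loss with a hypothetical temperature `tau`. All function names, variable names, and toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def cosine_sim(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def knn_item_graph(features: Matrix, k: int) -> Matrix:
    """Binary, undirected item-item graph linking each item to its k most similar items.

    Assumption: edges are 0/1 and symmetrized. Self-similarity (1.0) usually places
    each item in its own top-k, which gives it a self-loop.
    """
    n = len(features)
    sim = [[cosine_sim(features[i], features[j]) for j in range(n)] for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in sorted(range(n), key=lambda c: sim[i][c], reverse=True)[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def normalize(adj: Matrix) -> Matrix:
    """Symmetric normalization D^-1/2 A D^-1/2; every degree is >= k (assuming 1 <= k <= n)."""
    deg = [sum(row) for row in adj]
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def propagate(norm_adj: Matrix, h: Matrix, layers: int = 1) -> Matrix:
    """Parameter-free graph convolution h <- A_hat @ h, repeated `layers` times."""
    n, dim = len(h), len(h[0])
    for _ in range(layers):
        h = [[sum(norm_adj[i][j] * h[j][d] for j in range(n)) for d in range(dim)]
             for i in range(n)]
    return h


def info_nce(anchor: Matrix, positive: Matrix, tau: float = 0.2) -> float:
    """Mean InfoNCE loss: row i of `anchor` should match row i of `positive`,
    with every other row of `positive` acting as a negative."""
    n = len(anchor)
    total = 0.0
    for i in range(n):
        logits = [cosine_sim(anchor[i], positive[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / n


# Hypothetical toy data: 4 items, a 3-d visual and a 2-d textual feature per item.
visual = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.2]]
textual = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]]
# Hypothetical item embeddings (in practice these would be trained with the recommender).
item_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]

k = 2
graphs = {name: normalize(knn_item_graph(feats, k))
          for name, feats in [("visual", visual), ("textual", textual)]}
modal_reps = {name: propagate(g, item_emb) for name, g in graphs.items()}

# Plain average as a stand-in for the paper's multimodal fusion.
fused = [[sum(vals) / len(vals) for vals in zip(*rows)]
         for rows in zip(*modal_reps.values())]
loss = sum(info_nce(rep, fused) for rep in modal_reps.values()) / len(modal_reps)
print(f"contrastive loss: {loss:.4f}")
```

On this toy data the visual features pair items {0, 1} and {2, 3}, while the textual features pair {0, 2} and {1, 3}, so the two modality graphs inject different affinities into the same item embeddings. In a full model the fused representations would then feed a collaborative filtering backbone, and `k`, `tau`, and the number of propagation layers would need tuning.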
Funding: National Natural Science Foundation of China [62141608]; National Natural Science Foundation of China [62236010]; National Natural Science Foundation of China [62206291]
WOS research areas: Computer Science; Engineering
Language: English
Publisher: IEEE COMPUTER SOC
WOS accession number: WOS:001045704800034
Funding agency: National Natural Science Foundation of China
Content type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/53942]
Collection: State Key Laboratory of Multimodal Artificial Intelligence Systems
Corresponding author: Wu, Shu
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Percept & Comp, Beijing 100045, Peoples R China
2. Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Recommended citation
GB/T 7714
Zhang, Jinghao, Zhu, Yanqiao, Liu, Qiang, et al. Latent Structure Mining With Contrastive Modality Fusion for Multimedia Recommendation[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35(9): 9154-9167.
APA: Zhang, Jinghao, Zhu, Yanqiao, Liu, Qiang, Zhang, Mengqi, Wu, Shu, & Wang, Liang. (2023). Latent Structure Mining With Contrastive Modality Fusion for Multimedia Recommendation. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 35(9), 9154-9167.
MLA: Zhang, Jinghao, et al. "Latent Structure Mining With Contrastive Modality Fusion for Multimedia Recommendation." IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING 35.9 (2023): 9154-9167.