Joint Token and Feature Alignment Framework for Text-Based Person Search
Li, Shangze (3); Lu, Andong (3); Huang, Yan (1); Li, Chenglong (2); Wang, Liang (1)
Journal: IEEE SIGNAL PROCESSING LETTERS
Year: 2022
Volume: 29; Pages: 2238-2242
Keywords: Feature extraction; Visualization; Representation learning; Logic gates; Image reconstruction; Transformers; Training; Cross-modal generation; feature alignment; text-based person search; token alignment; transformer
ISSN: 1070-9908
DOI: 10.1109/LSP.2022.3217682
Corresponding author: Li, Chenglong (lcl1314@foxmail.com)
Abstract: Text-based person search is a challenging cross-modal retrieval task. Existing works reduce the inter-modality and intra-class gaps by aligning local features extracted from the image and text modalities, which easily leads to mismatching problems due to the lack of annotation information. Moreover, reducing both gaps simultaneously in the same feature space is sub-optimal. This work proposes a novel joint token and feature alignment framework that reduces the inter-modality and intra-class gaps progressively. Specifically, we first build a dual-path feature learning network to extract features and conduct feature alignment to reduce the inter-modality gap. Second, we design a text generation module that generates token sequences from visual features, and then perform token alignment to reduce the intra-class gap. Finally, a fusion interaction module is introduced to further eliminate the modality heterogeneity through multi-stage feature fusion. Extensive experiments on the CUHK-PEDES dataset demonstrate the effectiveness of our model, which significantly outperforms previous state-of-the-art methods.
Funding projects: National Natural Science Foundation of China [61976003]; National Natural Science Foundation of China [62076003]; Anhui Provincial Key Research and Development Program [202104d07020008]; Open Project Program of the National Laboratory of Pattern Recognition (NLPR); Gaofeng Discipline Construction Project (Computer Science and Technology) [Z010111016]
WOS research area: Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:000880641600004
Funding organizations: National Natural Science Foundation of China; Anhui Provincial Key Research and Development Program; Open Project Program of the National Laboratory of Pattern Recognition (NLPR); Gaofeng Discipline Construction Project (Computer Science and Technology)
Content type: Journal article
Source URL: http://ir.ia.ac.cn/handle/173211/50678
Collection: Institute of Automation_Center for Research on Intelligent Perception and Computing
Corresponding author: Li, Chenglong
Author affiliations:
1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
2. Anhui Univ, Sch Artificial Intelligence, Informat Mat & Intelligent Sensing Lab Anhui Prov, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
3. Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
Recommended citation
GB/T 7714
Li, Shangze, Lu, Andong, Huang, Yan, et al. Joint Token and Feature Alignment Framework for Text-Based Person Search[J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29: 2238-2242.
APA
Li, Shangze, Lu, Andong, Huang, Yan, Li, Chenglong, & Wang, Liang. (2022). Joint Token and Feature Alignment Framework for Text-Based Person Search. IEEE SIGNAL PROCESSING LETTERS, 29, 2238-2242.
MLA
Li, Shangze, et al. "Joint Token and Feature Alignment Framework for Text-Based Person Search". IEEE SIGNAL PROCESSING LETTERS 29 (2022): 2238-2242.