Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification
Ji, Ruyi1,5; Li, Jiaying4; Zhang, Libo1; Liu, Jing2,3; Wu, Yanjun1
Journal: IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY
Publication date: 2023-09-01
Volume: 33, Issue: 9, Pages: 5009-5021
Keywords: Transformer; multi-grained assembly; fine-grained visual classification
ISSN: 1051-8215
DOI: 10.1109/TCSVT.2023.3248791
Corresponding author: Zhang, Libo (libo@iscas.ac.cn)
Abstract: Fine-grained visual classification (FGVC) requires distinguishing sub-categories within the same super-category, a task made difficult by small inter-class and large intra-class variances. To improve FGVC performance, we propose a novel dual Transformer framework (coined Dual-TR) with multi-grained assembly. Dual-TR encodes fine-grained objects with two parallel hierarchies, which makes it amenable to capturing subtle yet discriminative cues via the self-attention mechanism in ViT. Specifically, we perform orthogonal multi-grained assembly within the Transformer structure for a more robust representation, i.e., intra-layer and inter-layer assembly. The former explores the informative features in the various self-attention heads within a Transformer layer. The latter focuses on token assembly across Transformer layers. Meanwhile, we introduce a center loss constraint to improve the compactness of intra-class samples and push inter-class samples apart. Extensive experiments show that Dual-TR performs on par with state-of-the-art methods on four public benchmarks: CUB-200-2011, NABirds, iNaturalist2017, and Stanford Dogs. Comprehensive ablation studies further demonstrate the effectiveness of the architectural design choices.
Funding project: Key Research Program of Frontier Sciences, CAS [ZDBSLY-JSC038]; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS [2020111]
WOS research area: Engineering
Language: English
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
WOS accession number: WOS:001063316800042
Funding organization: Key Research Program of Frontier Sciences, CAS; CAAI-Huawei MindSpore Open Fund and Youth Innovation Promotion Association, CAS
Content type: Journal article
Source URL: [http://ir.ia.ac.cn/handle/173211/53131]
Collection: Zidong Taichu Large Model Research Center (紫东太初大模型研究中心)
Author affiliations:
1. Chinese Acad Sci, State Key Lab Comp Sci, Inst Software, Beijing 100190, Peoples R China
2. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
3. Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101400, Peoples R China
4. Beijing Informat Sci & Technol Univ, Sch Comp Sci, Beijing 100192, Peoples R China
5. Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing 101400, Peoples R China
Recommended citation:
GB/T 7714: Ji, Ruyi, Li, Jiaying, Zhang, Libo, et al. Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification[J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33(9): 5009-5021.
APA: Ji, Ruyi, Li, Jiaying, Zhang, Libo, Liu, Jing, & Wu, Yanjun. (2023). Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 33(9), 5009-5021.
MLA: Ji, Ruyi, et al. "Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification". IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY 33.9 (2023): 5009-5021.