Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection | |
Zhang, Zhaoxiang [1,2,4,5]; Pan, Cong [1,4,5]; Peng, Junran [3] | |
Journal | INTERNATIONAL JOURNAL OF COMPUTER VISION |
2022-04-01 | |
Volume | 130; Issue: 4; Pages: 970-989 |
Keywords | Computer vision; Object detection; Effective receptive fields; Hardware acceleration |
ISSN | 0920-5691 |
DOI | 10.1007/s11263-021-01573-6 |
Corresponding author | Peng, Junran (pengjunran@huawei.com) |
Abstract | Scale-sensitive object detection remains a challenging task, where most of the existing methods could not learn it explicitly and are not robust. Besides, they are less efficient during training or slow during inference, which is not friendly to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on the analysis of the connection between dilation rate and effective receptive field. Our method firstly predicts a global continuous scale, which is shared by all positions, for each convolution filter of each network stage. Secondly, we average the spatial features and distill the scale from channels to effectively learn the scale. Thirdly, for fast-deployment, we propose a scale decomposition method that transfers the robust fractional scale into the combination of fixed integral scales for each convolution filter, which exploits the dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we illustrate the orthogonality role of our method for sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare them with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategy. During inference, we reduce the latency of the proposed method by using the hardware accelerator TensorRT without extra operation. On the COCO test-dev, our model achieves 41.7% mAP on one-stage detector and 42.5% mAP on two-stage detector based on ResNet-101, and outperforms baselines by 3.2% and 3.1% mAP, respectively. |
Funding projects | Major Project for New Generation of AI[2018AAA0100400] ; National Natural Science Foundation of China[61836014] ; National Natural Science Foundation of China[U21B 2042] |
WOS research area | Computer Science |
Language | English |
Publisher | SPRINGER |
WOS accession number | WOS:000759289300002 |
Funding organizations | Major Project for New Generation of AI ; National Natural Science Foundation of China |
Content type | Journal article |
Source URL | [http://ir.ia.ac.cn/handle/173211/47954] |
Collection | Institute of Automation_Center for Research on Intelligent Perception and Computing |
Corresponding author | Peng, Junran |
Author affiliations | 1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China; 2. Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China; 3. Huawei Cloud & AI, Beijing, Peoples R China; 4. Univ Chinese Acad Sci, Sch Future Technol, Beijing, Peoples R China; 5. Ctr Res Intelligent Percept & Comp, Beijing, Peoples R China |
Recommended citation (GB/T 7714) | Zhang, Zhaoxiang,Pan, Cong,Peng, Junran. Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION,2022,130(4):970-989. |
APA | Zhang, Zhaoxiang,Pan, Cong,&Peng, Junran.(2022).Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection.INTERNATIONAL JOURNAL OF COMPUTER VISION,130(4),970-989. |
MLA | Zhang, Zhaoxiang,et al."Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection".INTERNATIONAL JOURNAL OF COMPUTER VISION 130.4(2022):970-989. |