Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection

doi:10.1007/s11263-021-01573-6


Authors	Zhang, Zhaoxiang 1,2,4,5; Pan, Cong 1,4,5; Peng, Junran 3
Journal	INTERNATIONAL JOURNAL OF COMPUTER VISION
Date	2022-04-01
Volume	130; Issue 4; Pages 970-989
Keywords	Computer vision; Object detection; Effective receptive fields; Hardware acceleration
ISSN	0920-5691
DOI	10.1007/s11263-021-01573-6
Corresponding author	Peng, Junran (pengjunran@huawei.com)
Abstract	Scale-sensitive object detection remains a challenging task: most existing methods cannot learn scale explicitly and are not robust to it. In addition, they are inefficient during training or slow during inference, which makes them ill-suited to real-time applications. In this paper, we propose a scale-transferrable architecture for practical object detection based on an analysis of the connection between dilation rate and effective receptive field. First, our method predicts a global continuous scale, shared by all positions, for each convolution filter of each network stage. Second, we average the spatial features and distill the scale from the channels so that the scale is learned effectively. Third, for fast deployment, we propose a scale decomposition method that transfers the robust fractional scale of each convolution filter into a combination of fixed integral scales, exploiting dilated convolution. Moreover, to overcome the shortcomings of our method for large-scale object detection, we modify the Feature Pyramid Network structure. Finally, we show that our method is orthogonal to the choice of sampling strategy. We demonstrate the effectiveness of our method on one-stage and two-stage algorithms under different configurations and compare it with different dilated convolution blocks. For practical applications, the training strategy of our method is simple and efficient, avoiding complex data sampling or optimization strategies. During inference, we reduce the latency of the proposed method with the hardware accelerator TensorRT, without extra operations. On COCO test-dev, our model achieves 41.7% mAP with a one-stage detector and 42.5% mAP with a two-stage detector based on ResNet-101, outperforming the baselines by 3.2% and 3.1% mAP, respectively.
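The abstract names two technical ingredients without giving formulas: the link between dilation rate and receptive field, and the decomposition of a learned fractional scale into fixed integral dilations. The Python sketch that follows illustrates both under stated assumptions. It relies on the standard fact that a k-tap kernel with dilation d spans (k - 1)d + 1 inputs. The decomposition itself, which linearly interpolates between the two nearest integer dilation rates while sharing one set of kernel weights, is an illustrative assumption and not necessarily the authors' formulation. All function names (`receptive_field`, `decompose_scale`, `dilated_conv1d`, `fractional_dilated_conv1d`) are hypothetical, and a 1-D signal stands in for 2-D feature maps.

```python
import math
from typing import List, Sequence, Tuple


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span (in input samples) covered by one dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def decompose_scale(scale: float) -> List[Tuple[int, float]]:
    """Split a fractional dilation rate into (integer dilation, weight) pairs.

    ASSUMPTION (illustrative only): linear interpolation between floor(scale)
    and floor(scale) + 1; the paper's exact decomposition may differ.
    """
    if scale < 1.0:
        raise ValueError("scale must be >= 1")
    lower = math.floor(scale)
    frac = scale - lower
    if frac == 0.0:
        # Integral scale: a single ordinary dilated convolution suffices.
        return [(lower, 1.0)]
    return [(lower, 1.0 - frac), (lower + 1, frac)]


def dilated_conv1d(x: Sequence[float], w: Sequence[float], dilation: int) -> List[float]:
    """1-D 'same'-size dilated convolution (cross-correlation, as in CNN libraries).

    Uses zero padding and assumes an odd kernel length so the kernel is centred.
    """
    half = (len(w) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, wj in enumerate(w):
            idx = i + (j - half) * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(x):
                acc += wj * x[idx]
        out.append(acc)
    return out


def fractional_dilated_conv1d(x: Sequence[float], w: Sequence[float], scale: float) -> List[float]:
    """Emulate a fractional dilation as a weighted sum of integer-dilation
    convolutions that all share the same kernel weights (hypothetical helper)."""
    out = [0.0] * len(x)
    for dilation, alpha in decompose_scale(scale):
        y = dilated_conv1d(x, w, dilation)
        out = [o + alpha * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    impulse = [0.0] * 4 + [1.0] + [0.0] * 4
    kernel = [1.0, 1.0, 1.0]
    print(decompose_scale(2.5))                              # [(2, 0.5), (3, 0.5)]
    print(receptive_field(3, 2), receptive_field(3, 3))      # 5 7
    print(fractional_dilated_conv1d(impulse, kernel, 2.5))   # [0.0, 0.5, 0.5, 0.0, 1.0, 0.0, 0.5, 0.5, 0.0]
```

The impulse response shows the blended kernel touching the taps of both the dilation-2 and dilation-3 branches. Because every branch uses an ordinary integer dilation, a model decomposed this way contains only standard dilated convolutions, which is consistent with the abstract's claim that TensorRT acceleration needs no extra operations.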
Funding	Major Project for New Generation of AI [2018AAA0100400]; National Natural Science Foundation of China [61836014]; National Natural Science Foundation of China [U21B2042]
WOS research area	Computer Science
Language	English
Publisher	SPRINGER
WOS record number	WOS:000759289300002
Funding agencies	Major Project for New Generation of AI; National Natural Science Foundation of China
Content type	Journal article
Source URL	http://ir.ia.ac.cn/handle/173211/47954
Collection	Institute of Automation / Center for Research on Intelligent Perception and Computing
Author affiliations	1. Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing, Peoples R China; 2. Chinese Acad Sci, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China; 3. Huawei Cloud & AI, Beijing, Peoples R China; 4. Univ Chinese Acad Sci, Sch Future Technol, Beijing, Peoples R China; 5. Ctr Res Intelligent Percept & Comp, Beijing, Peoples R China
Recommended citation (GB/T 7714)	Zhang, Zhaoxiang, Pan, Cong, Peng, Junran. Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection[J]. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2022, 130(4): 970-989.
APA	Zhang, Zhaoxiang, Pan, Cong, & Peng, Junran. (2022). Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection. INTERNATIONAL JOURNAL OF COMPUTER VISION, 130(4), 970-989.
MLA	Zhang, Zhaoxiang, et al. "Delving into the Effectiveness of Receptive Fields: Learning Scale-Transferrable Architectures for Practical Object Detection." INTERNATIONAL JOURNAL OF COMPUTER VISION 130.4 (2022): 970-989.