Crowd counting, which requires estimating crowd
density from an image, remains a challenging task in computer
vision. Most current methods focus on the large
scale variation of people and ignore the large differences in
crowd distribution. To tackle these two problems together,
we propose a novel framework named Spatial Normalization
Network (SNNet). We normalize multi-scale features from
parallel subnetworks to a particular scale and then fuse them to
acquire rich spatial information for final accurate density map
predictions. Furthermore, we propose a novel normalization
layer called Spatial Group Normalization (SGN), which first
splits feature maps along the spatial dimension and then performs
group-wise normalization. SGN helps address the statistical shift
problems caused by the large distribution differences in crowd
counting. Moreover, SGN can be naturally plugged into existing
solutions and brings significant improvement in crowd counting.
Our proposed SNNet achieves state-of-the-art performance on
four challenging crowd counting datasets (ShanghaiTech, UCF-QNRF,
GCC, and TRANCOS), which demonstrates
the effectiveness and robust feature learning capability of our
methods.
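
The idea behind SGN, splitting feature maps spatially and then normalizing each group, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `spatial_group_norm`, the `grid` parameter, the choice to pool statistics over all channels within each spatial cell, and the absence of learnable scale and shift parameters are all assumptions made for illustration.

```python
from statistics import fmean


def spatial_group_norm(fmap, grid=(2, 2), eps=1e-5):
    """Normalize a feature map per spatial cell (illustrative sketch only).

    fmap: nested list indexed as [channel][row][col].
    grid: (rows, cols) of spatial groups; a hypothetical parameter.
    Assumption: mean and variance are pooled over all channels in a cell.
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    gh, gw = grid
    if H % gh or W % gw:
        raise ValueError("H and W must be divisible by the grid size")
    ph, pw = H // gh, W // gw  # height and width of one spatial cell

    out = [[[0.0] * W for _ in range(H)] for _ in range(C)]
    for i in range(gh):
        for j in range(gw):
            rows = range(i * ph, (i + 1) * ph)
            cols = range(j * pw, (j + 1) * pw)
            # Collect every activation that falls inside this spatial cell.
            vals = [fmap[c][y][x] for c in range(C) for y in rows for x in cols]
            mean = fmean(vals)
            var = fmean([(v - mean) ** 2 for v in vals])
            scale = (var + eps) ** -0.5
            # Standardize the cell with its own statistics.
            for c in range(C):
                for y in rows:
                    for x in cols:
                        out[c][y][x] = (fmap[c][y][x] - mean) * scale
    return out


# Toy input: a sparse left half and a dense right half of one channel.
fmap = [[[0.0, 1.0, 100.0, 300.0],
         [1.0, 2.0, 200.0, 400.0]]]
out = spatial_group_norm(fmap, grid=(1, 2))
```

In this toy input the two halves have very different magnitudes, yet after the call each half has approximately zero mean and unit variance, which is the kind of region-wise statistical shift the abstract says SGN is meant to reduce.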