CrowdNet: A Deep Convolutional Network for Dense Crowd Counting

Hassam Hussain Khan

Our work proposes a novel deep learning framework for estimating crowd density from static images of highly dense crowds. We use a combination of deep and shallow fully convolutional networks to predict the density map for a given crowd image. This combination effectively captures both the high-level semantic information (face/body detectors) and the low-level features (blob detectors) that are necessary for crowd counting under large scale variations. As most crowd datasets have limited training samples (<100 images) and deep learning based approaches require large amounts of training data, we perform multi-scale data augmentation. Augmenting the training samples in this manner helps guide the CNN to learn scale-invariant representations. Our method is tested on the challenging UCF_CC_50 dataset and is shown to outperform the state-of-the-art methods.

Figures (5)

Figure 1: Crowd images with head annotations marked using red dots and their corresponding estimated crowd density maps.

R. Venkatesh Babu, Video Analytics Lab, Indian Institute of Science, Bangalore, INDIA - 560012, venky@cds.iisc.ac.in

Srinivas S S Kruthiventi, Video Analytics Lab, Indian Institute of Science, Bangalore, INDIA - 560012, kssaisrinivas@qmail.com

Figure 3: Our network is designed to be robust to scale variations by training it with patches cropped from a multi-scale image pyramid.
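The multi-scale patch cropping described in this caption can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the scale factors, patch size, stride, and the helper names `resize_nearest` and `multiscale_patches` are all assumptions chosen for the example, and nearest-neighbour resizing is used only to keep the sketch dependency-free.

```python
import numpy as np


def resize_nearest(image, scale):
    """Rescale an H x W (or H x W x C) array with nearest-neighbour sampling."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    # Map each output row/column back to its nearest source index.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return image[rows][:, cols]


def multiscale_patches(image, scales=(0.5, 0.75, 1.0), patch=64, stride=32):
    """Crop fixed-size patches from every level of an image pyramid.

    The scales, patch size, and stride are illustrative assumptions,
    not values taken from the text above.
    """
    patches = []
    for scale in scales:
        scaled = resize_nearest(image, scale)
        h, w = scaled.shape[:2]
        # Slide a window over the rescaled image; levels smaller than
        # the patch size contribute no patches.
        for top in range(0, h - patch + 1, stride):
            for left in range(0, w - patch + 1, stride):
                patches.append(scaled[top:top + patch, left:left + patch])
    return patches
```

Because the patch size is fixed while the image is rescaled, the same head appears at several apparent sizes across the training patches, which is the intuition behind learning scale-invariant representations. In a real training pipeline the ground-truth density map would need to be cropped consistently with each image patch.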

We use Mean Absolute Error (MAE) to quantify the performance of our method. MAE computes the mean of the absolute difference between the actual count and the predicted count over all the images in the dataset. The results of the proposed approach along with other recent methods are shown in Table 4.1. The results shown do not include any post-processing methods. The results illustrate that our approach achieves state-of-the-art performance in crowd counting.
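As a small illustration of the metric, MAE over per-image counts can be computed as below. The function name `mean_absolute_error` and the sample counts are hypothetical and are not results from the paper.

```python
import numpy as np


def mean_absolute_error(actual_counts, predicted_counts):
    """Mean of |actual - predicted| over all images."""
    actual = np.asarray(actual_counts, dtype=float)
    predicted = np.asarray(predicted_counts, dtype=float)
    if actual.shape != predicted.shape:
        raise ValueError("actual and predicted counts must have the same shape")
    return float(np.mean(np.abs(actual - predicted)))


# Made-up counts for three images, for illustration only.
mae = mean_absolute_error([100, 200, 300], [110, 180, 300])
print(mae)  # 10.0
```

Since the errors are not normalised by crowd size, a few images with very large crowds can dominate the average.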

We also show the predicted count for each image in the dataset along with its actual count in Fig. 4. For most of the images, the predicted count lies close to the actual count. However, we observe that the proposed approach tends to underestimate the count for images with more than 2500 people. This estimation error could possibly be a consequence of the insufficient number of training images with such large crowds in the dataset.

Figure 4: Actual count vs. predicted count for each of the 50 images in the UCF_CC_50 dataset.