2009 Digital Image Computing: Techniques and Applications

Crowd Counting using Multiple Local Features

David Ryan, Simon Denman, Clinton Fookes, Sridha Sridharan

Image and Video Laboratory
Queensland University of Technology
Brisbane, Australia

978-0-7695-3866-2/09 $26.00 © 2009 IEEE. DOI 10.1109/DICTA.2009.22

Abstract—In public venues, crowd size is a key indicator of crowd safety and stability. Crowding levels can be detected using holistic image features, however this requires a large amount of training data to capture the wide variations in crowd distribution. If a crowd counting algorithm is to be deployed across a large number of cameras, such a large and burdensome training requirement is far from ideal. In this paper we propose an approach that uses local features to count the number of people in each foreground blob segment, so that the total crowd estimate is the sum of the group sizes. This results in an approach that is scalable to crowd volumes not seen in the training data, and can be trained on a very small data set. As a local approach is used, the proposed algorithm can easily be used to estimate crowd density throughout different regions of the scene and be used in a multi-camera environment. A unique localised approach to ground truth annotation, which reduces the required training data, is also presented, as a localised approach to crowd counting has different training requirements to a holistic one. Testing on a large pedestrian database compares the proposed technique to existing holistic techniques and demonstrates improved accuracy, and superior performance when test conditions are unseen in the training set, or a minimal training set is used.

Keywords-Crowd Counting, Crowd Density, Local Features, Foreground Segmentation

I. INTRODUCTION

In large public places, it is often impossible to monitor every person for suspicious behaviour. The threats posed in crowded environments are of a different nature to those posed by an individual, and arise from the crowd’s collective properties: “a crowd is something other than the sum of its parts” [6]. These threats include fighting, rioting, violent protest, mass panic and excitement. The most common indicator of such behaviour is crowd size, which may also be an indicator of congestion, delay or other abnormality. As crowd size is a holistic description of the scene, the majority of crowd counting techniques have utilised holistic features to estimate crowd size. However, due to the wide variability in crowd behaviours, distribution, density and overall size, holistic systems require a very large training set. In a facility containing numerous cameras, it is not practical to supply hundreds of frames of ground truth for potentially hundreds of cameras.

In this paper we propose a novel approach that uses local features, defined here as features which are specific to an individual or small group within an image. While existing techniques have used similar local features such as foreground pixels, they are analysed at a holistic level. Local features are used here to estimate the number of people within each group, so that the total crowd estimate is the sum of all group sizes. As local features are used, training data must also be annotated with local information. To provide appropriate training data, a unique method of localised ground truth annotation is proposed which greatly reduces the required training data.

As well as the reduced training requirement, a localised approach also enables the estimation of crowd densities at different locations within the scene (unlike holistic systems, which can only provide a density for the whole scene), and allows for a simple extension to a multi-camera environment. The ability to determine local crowd densities greatly improves the system’s ability to detect abnormalities in a scene. While the overall number of people in a scene may be considered normal, there may be a very high concentration of people in a small area. Holistic systems are unable to detect such an abnormality, however the proposed local approach can easily detect such an occurrence.

The proposed system is tested on a 2000-frame database [4] featuring crowds of size 11-45 people. The proposed technique is compared to two holistic techniques, and is shown to outperform holistic techniques in terms of accuracy, scalability and practicality. The system is shown to be highly scalable, as it is capable of extrapolating to count crowds which are larger or smaller than those encountered during training; and highly practical, as it is able to count crowds when trained on as few as 10 frames of training data.

The remainder of the paper is structured as follows: Section II provides an overview of existing crowd counting techniques, Section III outlines the proposed algorithm, Section IV describes the proposed ground truth annotation method, Section V presents experimental results and Section VI presents conclusions and possible directions for future work.

II. EXISTING WORK

The task of crowd counting has been approached from a number of angles, but the techniques share a common
framework: feature extraction using image processing, followed by crowd counting using classification. The output of the classifier is a measure of crowding, which is a holistic description of a scene. Therefore it is logical to use holistic features which are indicative of larger crowds. Local features, however, provide more detailed information about a scene. As computing power has increased, techniques based on local features have become more popular.

Holistic features, such as textural information [12], Minkowski Fractal Dimension [11], and Translation Invariant Orthonormal Chebyshev Moments [15] have been used to measure crowd density. Holistic features such as these are highly sensitive to external changes (such as lighting conditions), and it has been shown that for outdoor environments, the natural fluctuations in lighting between morning and afternoon reduce system performance [15].

More recent crowd counting algorithms have utilised specific features which are indicative of crowding, such as edge and foreground pixels. While these features are local to points of interest in an image, they are considered at a holistic level. Many techniques [6], [14], [9] have used foreground segmentation to determine the crowd count. The relationship between the total number of foreground pixels and the number of people in the scene has been shown to be approximately linear [6]. However, local nonlinearities arise due to the effects of perspective and occlusion.

Paragios [14] proposed the use of a geometric factor to weight each pixel according to its location on the ground plane, to overcome the problem of perspective. Occlusions have been addressed using blob size histograms [9], or by using more features [4]. The blob size histogram captures the range of blob sizes present in an image (compared to a foreground pixel count), and enables the classifier to distinguish between groups of people and individuals. By contrast, Chan et al. [4] extract features in a greater quantity, however additional features greatly increase the quantity of training data required.

Local features are specific to an individual or small group of people within an image. For example, head detection has been proposed to estimate crowd sizes [10]. Tracking [13] and blob segmentation [16] have been employed, however these approaches are best suited to situations where crowds are small. Celik [3] assumed linearity between blob size and group size, and Kilambi [8] used an elliptical cylinder model and tracking to estimate group size. While these systems all employ local features, they often rest on specific assumptions, including image quality. When presented with low-quality video and poor segmentation, it is difficult to classify or track the local features unless ground truth is also annotated on a local level.

Local features have, however, been applied to other crowd-related problems, such as crowd detection [2] (detection of human-like objects and repeating structures) and analysis of crowd stability [1] (using optical flow over time). However, neither of these algorithms is concerned with the overall size of the crowd.

III. CROWD COUNTING USING MULTIPLE LOCAL FEATURES

A. System Description

A crowd counting system is proposed which uses local rather than holistic features. These features are ‘local’ with respect to the blob segments in a foreground mask, obtained using a foreground segmentation technique [7]. A crowd estimate is obtained for each blob in an image, so that the total estimate for the scene is the sum of the estimates for each individual blob. In order to train the system, ground truth annotation is performed after the first stage of image processing, once the foreground is extracted. The group size is manually counted for each blob in an image, therefore each frame provides several instances of ground truth.

This approach is built on the assumption that it is easier for a system to estimate the number of people in each group than to estimate the entire crowd at once. It is possible for a crowd of 20 people to be distributed as two large groups or as ten pairs (for example). Viewed from a holistic perspective, these various crowd distributions can give rise to vastly different image features. Existing techniques cope by extracting a larger quantity of holistic features (29 features are used in [4]), necessitating more training data and/or intensive classification strategies. We hypothesise that the relationship between image features and group size is more reliable and consistent on a local scale.

B. Perspective Normalisation

To account for perspective, a density map is calculated using the relative sizes of two reference persons. This is calculated in the same manner as [4]. The weight applied at pixel (i, j) to a two-dimensional feature is W(i, j). For one-dimensional features, such as edges, the square root of the weight is applied.

C. Feature Extraction

Several features are extracted from each blob segment in order to estimate the number of people in the group. The features extracted are similar to those used in [9] and [4], taken locally. These features are:

• Area: The total pixel count for the blob segment, each pixel weighted by its value in the density map:
$$B_{size} = \sum_{(i,j) \in B} W(i, j)$$
where the sum runs over all pixels (i, j) ∈ B, and $B_{size}$ is the calculated area of blob B.
• Perimeter: The total pixel count for the blob’s perimeter, each pixel weighted by the square root of its value in the density map.
• Perimeter-Area Ratio: The ratio of perimeter to area, a measure of shape complexity [4].
• Edges: The total pixel count of edges within the blob, extracted from the image using Canny edge detection. Each pixel is weighted by the square root of its value in the density map.
• Edge Angle Histogram: The histogram of edge angles, obtained from the edge detection. Six histogram bins are used in the range 0° to 180° [9]. Each pixel’s contribution to a histogram bin is the square root of its value in the density map.
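As an illustration of how the perspective weights enter each feature, the list above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `blob_features` and its array inputs are hypothetical, the edge map and edge orientations are assumed to be computed beforehand (for example by a Canny detector), and a perimeter pixel is assumed to be a blob pixel with at least one 4-connected neighbour outside the blob.

```python
import numpy as np


def blob_features(blob, edges, angles, W, n_bins=6):
    """Perspective-weighted features for one blob (illustrative sketch).

    blob   : bool array, True inside the blob segment
    edges  : bool array, True at edge pixels (assumed precomputed)
    angles : float array, edge orientation in degrees, in [0, 180)
    W      : float array, density-map weight W(i, j) for every pixel
    """
    root_w = np.sqrt(W)

    # Area: sum of W(i, j) over all pixels in the blob.
    area = W[blob].sum()

    # Perimeter: blob pixels with at least one 4-connected neighbour
    # outside the blob (an assumed definition), each weighted by sqrt(W).
    padded = np.pad(blob, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1]
                & padded[1:-1, :-2] & padded[1:-1, 2:])
    boundary = blob & ~interior
    perimeter = root_w[boundary].sum()

    # Edges: edge pixels inside the blob, each weighted by sqrt(W).
    blob_edges = blob & edges
    edge_count = root_w[blob_edges].sum()

    # Edge angle histogram: six bins over 0-180 degrees, sqrt(W) weights.
    hist, _ = np.histogram(angles[blob_edges], bins=n_bins,
                           range=(0.0, 180.0), weights=root_w[blob_edges])

    ratio = perimeter / area if area > 0 else 0.0
    return np.concatenate(([area, perimeter, ratio, edge_count], hist))
```

With six histogram bins this yields a ten-element feature vector per blob. The ordering of the elements is arbitrary and chosen only for this sketch.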
Figure 1. A frame from the testing database: (a) frame 1280; (b) foreground mask; (c) region of interest.

Figure 2. Typical errors in foreground extraction: (a) correct extraction of individuals, with additional noise (i.e. small bar near centre); (b) person (top, centre) is fragmented into two blobs, one of which is merged with nearby blob(s); (c) person (left) is fragmented into two blobs; (d) person (top, centre) blends into background leaving few foreground pixels. This person is barely visible to the human eye.

D. Crowd Counting

The features extracted from each blob serve as inputs to a classifier. The output of the classifier is $g_i$, the group size estimate for the ith blob. A neural network was used to perform classification, as this has proven successful in previous research [12], [9]. In order to test whether local features can be classified using simpler strategies, a basic linear model was also tested:
$$g_i = w_0 + \sum_{n=1}^{N_F} w_n f_n \tag{1}$$
where $w_n$ is the weight assigned to feature $f_n$, given $N_F$ features. The weights are calculated using least squares regression. The total crowd estimate for a frame containing $N_B$ blobs is then calculated:
$$C = \sum_{i=1}^{N_B} g_i \tag{2}$$

The estimate will vary from frame to frame as pedestrians enter and exit a scene simultaneously. A rapidly fluctuating estimate is not usable or accurate. A median filter provides smoothness and stability to the estimate, as well as making it robust against outlier estimates.

A median filter of length 2n + 1 will select the median estimate from n frames either side of the frame in question. This is a non-causal filter which, implemented in practice, will introduce a delay of n frames. For this application we use a median filter of length 41 (n = 20). At a frame rate of 10 fps, the delay is 2.0 seconds.
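The linear model of Equation (1), the per-frame sum of Equation (2) and the median filter described in Section III-D can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, `numpy.linalg.lstsq` stands in for whichever least squares solver was actually used, and truncating the filter window at the start and end of the sequence is an assumption, since the text does not describe boundary handling.

```python
import numpy as np


def fit_linear_model(F, g):
    """Least-squares weights [w0, w1, ..., wNF] for Equation (1).

    F : (num_blobs, NF) feature matrix from annotated training blobs
    g : (num_blobs,) ground-truth group size of each blob
    """
    A = np.hstack([np.ones((F.shape[0], 1)), F])  # column of ones gives w0
    w, *_ = np.linalg.lstsq(A, g, rcond=None)
    return w


def frame_estimate(F_frame, w):
    """Equation (2): sum of the per-blob estimates g_i for one frame."""
    g_i = w[0] + F_frame @ w[1:]
    return float(g_i.sum())


def median_filter(estimates, n=20):
    """Non-causal median filter of length 2n + 1.

    Near the ends of the sequence the window is truncated to the frames
    that exist (an assumption made for this sketch).
    """
    x = np.asarray(estimates, dtype=float)
    return np.array([np.median(x[max(0, k - n):k + n + 1])
                     for k in range(len(x))])
```

In a live system the filtered value for frame k only becomes available once frame k + n has been processed, which is the n-frame delay noted above.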

IV. ALGORITHM TRAINING AND GROUND TRUTH ANNOTATION

The proposed algorithm is trained and tested on the data used in [4]. This database contains 2000 frames of pedestrian traffic moving in two directions. The video has been downsampled to 238×158 pixels and 10 fps, grayscale. An example frame is shown in Figure 1.

As the proposed algorithm calculates crowd size by determining the number of people in each blob, the ground truth annotation must specify a person count for each blob. As such, ground truth annotation is performed after foreground segmentation. A GUI was written which enables the operator to do this.

Ideally, a single blob will correspond to a whole number of people as shown in Figure 2(a). However, foreground segmentation on a low resolution grayscale image is prone to errors, examples of which are shown in Figure 2. There are three types of segmentation errors that can occur:

1) A single person is split into multiple foreground blobs (Figure 2(c)). In this case, the contribution of the person is split across multiple blobs, in direct proportion to the number of pixels contained in each blob (e.g. for a person fragmented into three blobs representing the upper body and each leg, the blobs may receive weights of 0.6, 0.2 and 0.2 for the upper body and each
leg respectively). The assignment of these weights is made by the computer according to the blob sizes.
2) Part of a person is split in isolation from the group they are with (Figure 2(b)). In this case, the contribution of the person is split across multiple (n) blobs equally (1/n to each). Proportional contributions would not be suitable, because some fragments are merged with neighbouring blobs.
3) The motion detection fails to detect a person (Figure 2(d)). In this case, no assignment is made because the person has blended completely into the background so that very few, if any, foreground pixels are present. If this is a common occurrence, then the problem must be addressed at the segmentation stage (if possible). (In the database used there are only a small number of instances where this occurs, and these only occur in one part of the scene where the background is dark.) Assuming it is a rare occurrence, no contribution is assigned to the faded person. The reason for this is that assigning a large weight to a tiny blob may lead to misclassification at other locations in the scene, where tiny blobs are merely products of noise, such as in Figure 2(a).

The correspondences between pedestrians and foreground blobs are entered via the GUI. The above scenarios and the methods for handling them are used throughout the ground truth process to ensure that labelling is performed in a consistent manner.

V. EVALUATION AND RESULTS

A. Testing Criteria

The performance of the proposed system is assessed using three criteria:
1) Accuracy,
2) Scalability,
3) Practicality.

Accuracy is measured by comparing the detected number of pedestrians with the number annotated in the ground truth. Scalability is evaluated by using training and testing sets such that the types of crowds seen in testing are not present in the training set. Practicality is evaluated through the use of reduced training sets.

1) Accuracy: Although this system is trained on the basis of individual blobs, the testing still takes place on a holistic level. The accuracy of a system can be any measure of how closely the estimate follows the ground truth. The ground truth for the holistic crowd count was taken as the number of (x, y) person coordinates which lay within the region of interest. However, the exact point in time at which a person is deemed to have entered or exited a frame is never clearly defined. It may take several seconds between a pedestrian reaching the border of the region of interest, and being fully inside or outside of it.

Figure 3. Ground truth for frames 1270 to 1310 (number of people plotted against frame).

Figure 3 shows the number of people inside the region of interest over 40 frames (4.0 seconds). Based on the number of increments and decrements in this graph, there are at least 13 instances of pedestrians either entering or exiting the scene in this time. An example frame from this sequence is shown in Figure 1(a). The pedestrian at the bottom left in this sequence takes more than 30 frames to fully enter the scene. With groups entering and exiting the scene at this rate, yet taking several frames to do so, it would be difficult even for a human to estimate the exact crowd size, and impossible for them to remain consistent in their definition of what constitutes being ‘in’ or ‘out’ of the scene. In a scene such as this, where crowd size varies between 11 and 45 people, it is suggested that an estimate within 3 of the ground truth is acceptable. For testing purposes we consider the following measures of accuracy:

• Error: The mean value of the absolute difference between the crowd estimate and the ground truth.
• MSE: The mean value of the error squared.
• Acceptability: The percentage of frames for which the absolute error may be deemed ‘acceptable’, that is, less than or equal to 3.

2) Scalability: Ideally, the training data must cover a wide range of scenarios, similar to those which are expected to be found during operation. In the case of crowd counting, however, we may not have access to video footage of all possible scenarios. Excessive levels of over- or under-crowding may not be present in the training data because these events are abnormal, and this is the reason we wish to detect them. A system which cannot extrapolate in this context is of little practical use. We test the scalability of this system using two methods:

• Downscaling: The system is trained on large crowds, and tested on smaller crowds.
• Upscaling: The system is trained on small crowds, and tested on larger crowds.

3) Practicality: For a crowd counting system to be practical, it must be relatively easy to deploy. For real world deployment where the algorithm may be required to run on several hundred different cameras within a single installation, being able to use a reduced training set is highly desirable. When training crowd counting algorithms, each
training frame requires ground truth to be supplied. If several hundred training frames are needed for each camera ([6] uses 150 frames, each taken 10 seconds apart for training; [4] uses 800 consecutive frames for training), then the process of training becomes very tedious and time consuming. To assess practicality, systems are evaluated using reduced training sets.

B. Systems Tested

Three crowd counting techniques are evaluated:

• Proposed: The system described here, in which local features are extracted for each blob and ground truth annotation is performed on a local level.
• Equivalent Holistic System: This is a system which utilises the same features as the proposed system, taken on a holistic rather than local level. Ground truth is also annotated on a holistic level.
• Kong: Blobs are sorted into a blob size histogram with six bins of width 1500, as described in [9]. An edge angle histogram is also calculated, for which we use six histogram bins between 0° and 180°. This is also a holistic system.

For each system, two classifiers are tested: a neural network and a linear model.

The results provided by [4] for this database cannot be compared directly, as their estimate was calculated for pedestrians walking in either direction, rather than a total count. If the segmentation algorithm were changed from dynamic textures [5] to background subtraction, then the total count could be calculated. This would somewhat resemble the Equivalent Holistic System above, differentiated by the number of features.

C. Experimental Results

1) Accuracy: The accuracy of each system listed in Section V-B is tested. Frames 605, 610, ..., 1400 were designated for training (160 total) and testing was performed on frames 1-600 and 1401-2000. Those in the training set were annotated with ground truth counts for each blob, which were used to train the classifier. Neural network results differ slightly from test to test, therefore in order to determine a typical result for each system, the networks were retrained five consecutive times. The test which returned the median MSE for the filtered output was taken.

Results are tabulated in Table I. Results across the whole testing data set using the linear classifier are plotted in Figure 4.

By all three measures of accuracy, the proposed system significantly outperforms Kong and the equivalent holistic system. The mean error of the filtered estimate is 1.353 and the estimate is acceptable (within 3 of ground truth) 95.67% of the time (for the linear classifier). The linear classifier performs slightly better than the neural network, though similar performance trends are observed with the proposed system outperforming the other evaluated systems for a neural network classifier. The poorer performance of the neural network classifier can be attributed to the training data used. It is expected that for a larger training set, performance would equal or exceed that of the linear classifier.

2) Scalability: Scalability is tested in two steps, downscaling and upscaling. To test downscaling, frames 1205, 1210, ..., 1600 are designated for training (80 total), featuring crowds of size 30-45. These frames contain a mixture of large and small blobs. Testing is performed on frames 1-1200 and 1601-2000 (crowd sizes 11-40).

Due to the neural network’s poor extrapolation capabilities, the holistic methods were unable to provide any meaningful results, as shown in Figure 5. The proposed system, trained on blobs of various sizes, was able to count smaller crowds.

The linear model is capable of superior extrapolation. The results in Table II indicate that all three systems can extrapolate downwards when linear fitting is used, however the proposed system is most accurate.

To evaluate upscaling, frames 805, 810, ..., 1100 were designated for training (60 total), featuring crowds of size 11-27 (see footnote 1). Testing was performed on frames 1-800 and 1101-2000 (crowds 11-45). The blobs in the test set were larger than those in the training set, therefore all systems were unable to extrapolate when neural network classification was employed. As a result, evaluation results for the neural network classifier are not presented.

The linear model, however, is capable of extrapolation. Table III and Figure 6 illustrate the ability of the system to count crowds that are larger than those seen in the training set. It can be seen that the proposed algorithm is better equipped to deal with conditions that are unseen in the training set.

The superior performance on unseen conditions can be attributed to the manner in which the proposed algorithm counts crowds. As each blob is considered individually, the proposed algorithm only needs to have seen similar blobs in the training data. The holistic approaches, however, need to have seen a similar number of people overall in both training and testing.

3) Practicality: The fewer training frames required of a system, the greater its practicality. While a neural network requires a large range of training data, the linear model can be calculated with very little. Given this, only a linear classifier is used in evaluating the system’s practicality. The robustness of the proposed system is evaluated by testing the systems using only 10 training frames (640, 720, ..., 1360). For Kong [9], in order to supply all of the histogram bins with sufficient data, it was necessary to train the algorithm on 40 frames (620, 640, ..., 1400). Testing was performed on frames 1-600 and 1401-2000.

Footnote 1: The training range was widened for Kong ({805, 810, ..., 1300}), so that the training data contained blobs large enough to contribute to each of the blob size histogram bins.

Figure 4. Accuracy testing results. Estimate is rounded and median filtered, and is shown for the test set only. (Plot of number of people against frame for the ground truth and the Proposed, Holistic and Kong estimates.)

System    Classifier  Raw Error  Raw MSE  Raw Accept.  Filtered Error  Filtered MSE  Filtered Accept.
Proposed  NN          1.889      5.646    86.75%       1.558           3.850         95.08%
Kong      NN          2.976      15.158   68.08%       2.043           6.492         85.00%
Holistic  NN          2.570      9.962    74.08%       2.296           7.116         82.67%
Proposed  Linear      1.525      3.666    88.00%       1.353           3.065         95.67%
Kong      Linear      2.072      6.079    78.00%       2.013           5.447         88.83%
Holistic  Linear      1.798      4.720    84.42%       1.662           4.028         94.00%

Table I. ACCURACY TESTING RESULTS. ("Raw" is the raw estimate; "Filtered" is the median filtered estimate.)
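The three columns reported in the results tables (Error, MSE and Acceptability, as defined in Section V-A) can be computed from a sequence of per-frame estimates as in the sketch below. The function name `accuracy_measures` and the `tolerance` parameter are illustrative. The default of 3 follows the acceptability threshold suggested in the text.

```python
import numpy as np


def accuracy_measures(estimate, truth, tolerance=3):
    """Return (Error, MSE, Acceptability in percent) over a test sequence."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(truth, dtype=float)
    abs_err = np.abs(diff)
    error = abs_err.mean()                  # mean absolute difference
    mse = (diff ** 2).mean()                # mean squared error
    acceptability = 100.0 * (abs_err <= tolerance).mean()
    return error, mse, acceptability
```

For example, estimates of 10, 14, 20 and 25 against ground truth counts of 10, 12, 24 and 25 give an Error of 1.5, an MSE of 5.0 and an Acceptability of 75%.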

Figure 5. Downscaling testing results using neural network. Estimate has been rounded but not filtered. (Three panels: Proposed System using NN, Kong’s System using NN and Holistic System using NN, each plotting ground truth and estimate as number of people against frame.)

System    Classifier  Raw Error  Raw MSE  Raw Accept.  Filtered Error  Filtered MSE  Filtered Accept.
Proposed  NN          2.086      6.701    82.56%       1.881           5.532         86.63%
Kong      NN          System failed. See Figure 5.
Holistic  NN          System failed. See Figure 5.
Proposed  Linear      1.635      4.186    86.75%       1.537           3.674         92.81%
Kong      Linear      2.659      10.074   59.31%       2.559           8.839         72.31%
Holistic  Linear      2.341      8.787    71.69%       2.194           7.938         80.44%

Table II. DOWNSCALING TESTING RESULTS USING LINEAR FITTING. ("Raw" is the raw estimate; "Filtered" is the median filtered estimate.)

System    Training Set Size  Raw Error  Raw MSE  Raw Accept.  Filtered Error  Filtered MSE  Filtered Accept.
Proposed  60                 1.838      4.976    81.41%       1.654           4.075         93.65%
Kong      100 (footnote 1)   2.779      10.068   60.00%       2.749           9.34          73.53%
Holistic  60                 2.524      8.581    63.47%       2.448           7.842         78.88%

Table III. UPSCALING TESTING RESULTS USING LINEAR FITTING. ("Raw" is the raw estimate; "Filtered" is the median filtered estimate.)

Figure 6. Downscaling and upscaling testing results (Linear Classifier Only). (Two panels, Downscaling Testing Results and Upscaling Testing Results, each plotting number of people against frame for the ground truth and the Proposed, Holistic and Kong estimates.)

Results are shown in Table IV. The proposed system outperforms the holistic systems using a limited training set, and achieves better results than when using a larger training set. The superior generalisation is likely due to the wider spacing of the training frames. These results indicate that the proposed system is highly practical, with accurate results obtained from as few as 10 frames of training data.

System    Training Set (frames)  Raw Error  Raw MSE  Raw Accept.  Filtered Error  Filtered MSE  Filtered Accept.
Proposed  640, 720, ..., 1360    1.306      2.684    93.17%       1.047           1.902         99.25%
Kong      620, 640, ..., 1400    1.710      4.642    84.25%       1.352           3.200         93.75%
Holistic  640, 720, ..., 1360    4.462      31.24    41.58%       3.538           17.788        57.83%

Table IV. PRACTICALITY TESTING RESULTS. ("Raw" is the raw estimate; "Filtered" is the median filtered estimate.)

VI. CONCLUSIONS AND FUTURE WORK

In this paper we have proposed the use of multiple local features for crowd counting. This approach reduces the task of crowd counting to the group level, so that the crowd estimate is the sum of its parts. By three standards (accuracy, scalability and practicality), the proposed system outperforms existing holistic methods of crowd counting. The proposed system is capable of extrapolating outside of the training range, and can also count crowds with minimal training (10 frames), demonstrating practicality. The ability to train the system from as few as 10 frames means it can be deployed in a real world setting consisting of a large number (possibly hundreds) of cameras with much greater ease than holistic approaches.

The use of local features also makes it possible to estimate local crowd density across the scene, and to perform crowd counting across a network of multiple overlapping cameras. Analysing crowd densities at specific locations in a scene will enable the detection of local abnormalities. For example, a high-density crowd concentrated at one location may require attention, even if the holistic count for the scene is at a safe level. The use of multiple cameras will enable larger environments to be covered and monitored, as well as increasing accuracy in areas of overlap (due to the observations from multiple view points). Both these extensions will be investigated in the future. In addition, future work will also focus on capturing additional data for further testing, and evaluating the proposed algorithm in conditions where there is poor segmentation performance, reduced image resolution, and erroneous ground truth labelling.

REFERENCES

Vision and Pattern Recognition (CVPR 2001), pages 1034–1040, Dec. 2001.
[15] H. Rahmalan, M. Nixon, and J. Carter. On crowd density estimation for surveillance. Crime and Security, 2006. The Institution of Engineering and Technology Conference on, pages 540–545, June 2006.
[16] T. Zhao and R. Nevatia. Bayesian human segmentation in crowded situations. Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on, 2:II–459–66 vol.2, June 2003.
[1] S. Ali and M. Shah. A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on, pages 1–6, 2007.
[2] O. Arandjelović. Crowd detection from still images. In Proc.
British Machine Vision Conference, 1:523–532, 2008.
[3] H. Celik, A. Hanjalic, and E. Hendriks. Towards a robust
solution to people counting. Image Processing, 2006 IEEE
International Conference on, pages 2401–2404, Oct. 2006.
[4] A. Chan, Z.-S. Liang, and N. Vasconcelos. Privacy preserving
crowd monitoring: Counting people without people models or
tracking. CVPR 2008, pages 1–7, June 2008.
[5] A. Chan and N. Vasconcelos. Modeling, clustering, and
segmenting video with mixtures of dynamic textures. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
30(5):909–926, May 2008.
[6] A. Davies, J. H. Yin, and S. Velastin. Crowd monitoring using
image processing. Electronics & Communication Engineering
Journal, 7(1):37–47, Feb 1995.
[7] S. Denman, V. Chandran, and S. Sridharan. An adaptive
optical flow technique for person tracking systems. Elsivier
Pattern Recognition Letters, 28(10):1232–1239, 2007.
[8] P. Kilambi, E. Ribnick, A. J. Joshi, O. Masoud, and N. Pa-
panikolopoulos. Estimating pedestrian counts in groups.
Comput. Vis. Image Underst., 110(1):43–59, 2008.
[9] D. Kong, D. Gray, and H. Tao. A viewpoint invariant
approach for crowd counting. Pattern Recognition, 2006.
ICPR 2006. 18th International Conference on, 3:1187–1190,
2006.
[10] S.-F. Lin, J.-Y. Chen, and H.-X. Chao. Estimation of number
of people in crowded scenes using perspective transformation.
Systems, Man and Cybernetics, Part A, IEEE Transactions on,
31(6):645–654, Nov 2001.
[11] A. Marana, L. Da Fontoura Costa, R. Lotufo, and S. Velastin.
Estimating crowd density with minkowski fractal dimension.
ICASSP ’99, 6:3521–3524 vol.6, Mar 1999.
[12] A. Marana, S. Velastin, L. Costa, and R. Lotufo. Estimation
of crowd density using image processing. Image Processing
for Security Applications (Digest No.: 1997/074), IEE Collo-
quium on, pages 11/1–11/8, Mar 1997.
[13] O. Masoud and N. Papanikolopoulos. A novel method
for tracking and counting pedestrians in real-time using a
single camera. Vehicular Technology, IEEE Transactions on,
50(5):1267–1278, Sep 2001.
[14] N. Paragios and V. Ramesh. A mrf-based approach for real-
time subway monitoring. In 2001 Conference on Computer


