Dan Liu
Science and Technology on Communication Security Laboratory, Chengdu, China
E-mail: [email protected]
Abstract: Background subtraction is a crucial step in many automatic video content analysis applications. While numerous acceptable techniques have been proposed so far for background extraction, there is still a need to produce more efficient algorithms in terms of computational efficiency. Complex algorithms usually give better results, but are too complex to run at real-time speed.

1 Introduction

Many background subtraction techniques have been proposed [2]. The detection accuracy remains the researchers' main focus [3], but even though it can be improved, this is done at the expense of increased computational power.

Graphics Processing Units (GPUs) have evolved to support general-purpose applications from different market domains [4]. They are suitable for data-intensive applications with high-throughput requirements and single-instruction, multiple-data (SIMD) style parallelism. Since background subtraction is an embarrassingly parallel application in which individual pixels are processed independently, it is a natural candidate for GPU mapping.

In this paper, we first discuss a novel samples-based background subtraction algorithm called ViBe [5][6] (for "VIsual Background Extractor"), which produces good results in various environments. We then present a fast processing version of ViBe using the Compute Unified Device Architecture (CUDA).

2 ViBe: Background Subtraction Algorithm

Background subtraction techniques always have three parts: (1) the model principle, (2) model initialization, and (3) model update. These parts are described in the three subsections of this section.

2.1 Model Principle

Each pixel x is modeled by a collection of N background sample values {p_1, p_2, ..., p_N}. To classify a new pixel value p_t(x), we define a sphere S_R(p_t(x)) of radius R centered on p_t(x) and count the number of model samples that fall inside this sphere; the pixel is classified as background if this count is at least a given threshold #min. More formally, we compare #min to

    #{ S_R(p_t(x)) ∩ {p_1, p_2, ..., p_N} }    (1)

Two parameters determining the accuracy of our model are the radius R of the sphere and the minimal cardinality #min. Experiments have shown that a unique radius R and a cardinality of 2 offer excellent performance. There is no need to adapt these parameters during the background subtraction, nor do we need to change them for different pixel locations within the image. Note that since the number of samples N and #min are chosen to be fixed and impact the same decision, the sensitivity of the model can be adjusted using the ratio #min/N.
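As a concrete illustration, the test of equation (1) amounts to a few lines of device code. The following CUDA sketch assumes a grayscale model; the cardinality of 2 matches the text above, while N = 20 and R = 20 are defaults reported in the original ViBe papers [5][6], and all identifiers are our own.

    #define N_SAMPLES       20   // N, number of samples per pixel
    #define RADIUS          20   // R, radius of the sphere S_R
    #define MIN_CARDINALITY 2    // #min

    // Returns true if pixel value v is classified as background, i.e. if
    // #{ S_R(v) ∩ {p_1, ..., p_N} } >= #min, as in equation (1).
    __device__ bool isBackground(unsigned char v, const unsigned char* samples)
    {
        int count = 0;
        for (int i = 0; i < N_SAMPLES && count < MIN_CARDINALITY; ++i) {
            // |v - p_i| < R  <=>  sample p_i lies inside the sphere S_R(v)
            if (abs((int)v - (int)samples[i]) < RADIUS)
                ++count;
        }
        return count >= MIN_CARDINALITY;
    }

Note that the loop can stop as soon as #min matching samples have been found, which keeps the average cost per pixel well below N comparisons.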
2.2 Model Initialization

Although the model could easily recover from any type of initialization, for example from a set of random values, it is convenient to obtain an accurate background estimate as soon as possible. We therefore populate the pixel models with values found in the spatial neighborhood of each pixel. More precisely, we fill them with values taken randomly from their neighborhood in the first frame. From our experiments, selecting samples randomly in the 8-connected neighborhood of each pixel has proved to be satisfactory for images of 640 x 480 pixels. This strategy proved to be successful: the background estimate is valid from the second frame.
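A minimal initialization kernel along these lines might look as follows; the strided sample layout, the cuRAND-based sampling, and all identifiers are our own illustrative choices, not details taken from the paper. N_SAMPLES is reused from the sketch above.

    #include <curand_kernel.h>

    // Fill each of the N samples of a pixel with the value of a randomly
    // chosen 8-connected neighbor in the first frame.
    __global__ void initModel(const unsigned char* firstFrame,
                              unsigned char* samples,  // N_SAMPLES planes of width*height
                              int width, int height,
                              unsigned long long seed)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int idx = y * width + x;
        curandState rng;
        curand_init(seed, idx, 0, &rng);   // one RNG stream per pixel

        // The eight offsets of the 8-connected neighborhood.
        const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
        const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};

        for (int i = 0; i < N_SAMPLES; ++i) {
            int k  = curand(&rng) % 8;
            int nx = min(max(x + dx[k], 0), width - 1);   // clamp to image
            int ny = min(max(y + dy[k], 0), height - 1);
            samples[i * width * height + idx] = firstFrame[ny * width + nx];
        }
    }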
2.3 Model Update

To keep the computational cost low, it is not necessary to update each background model for each new frame. We manage this by choosing, randomly, which sample to replace when updating a pixel model. Mathematically, one can show that, according to this updating mechanism, the probability for a pixel sample present at a time t0 to still be present at a later time t1 is

    ((N - 1) / N)^(t1 - t0)
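A sketch of this random-replacement update, under the same assumed sample layout as the initialization sketch above:

    // When a pixel is classified as background, one of its N samples,
    // chosen uniformly at random, is overwritten with the current value v.
    // The memoryless choice of the slot is what yields the
    // ((N-1)/N)^(t1-t0) retention probability derived above.
    __device__ void updateModel(unsigned char v, unsigned char* samples,
                                int idx, int planeSize, curandState* rng)
    {
        int j = curand(rng) % N_SAMPLES;   // uniform slot in [0, N)
        samples[j * planeSize + idx] = v;  // replace that sample
    }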
Spatial diffusion: To ensure the spatial consistency of the whole image model and to handle practical situations such as small camera movements or slowly evolving background objects, we adopt a technique similar to the one used in the updating process, in which we choose at random and update a pixel model in the neighborhood of the current pixel. Let us denote by N_G(x) and p(x) respectively the spatial neighborhood of a pixel x and its value. Assume that it was decided to update the set of samples of x by inserting p(x). Then we also use this value p(x) to update the set of samples of one of the pixels in the neighborhood N_G(x), chosen at random.
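The diffusion step differs from the in-place update only in its write target. A hedged sketch, reusing the assumed layout and neighborhood table from the earlier sketches:

    // Insert the value p(x) just written into the model of x into the model
    // of a randomly chosen 8-connected neighbor in N_G(x). Concurrent writes
    // from neighboring threads are tolerated: at worst a sample slot is
    // overwritten twice, which does not invalidate the model.
    __device__ void diffuseToNeighbor(unsigned char v, int x, int y,
                                      unsigned char* samples,
                                      int width, int height, curandState* rng)
    {
        const int dx[8] = {-1, 0, 1, -1, 1, -1, 0, 1};
        const int dy[8] = {-1, -1, -1, 0, 0, 1, 1, 1};
        int k  = curand(rng) % 8;
        int nx = min(max(x + dx[k], 0), width - 1);   // clamp to image
        int ny = min(max(y + dy[k], 0), height - 1);
        int j  = curand(rng) % N_SAMPLES;             // random sample slot
        samples[j * width * height + ny * width + nx] = v;
    }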
3 GPU and CUDA Basics

3.1 Graphics Processing Unit

The GPU is specialized for compute-intensive, highly parallel computation and is therefore designed such that more transistors are devoted to data processing than to data caching and flow control. More specifically, the GPU is especially well suited to problems that can be expressed as data-parallel computations. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control. Many applications that process large data sets can use a data-parallel programming model to speed up their computations [7].

3.2 CUDA Basics on GPU

The CUDA programming model provides a simple interface for programming the GPU. The CUDA architecture is massively parallel in that a kernel is executed by thousands of threads [8]. Threads are organized in a way that makes them easy for the device to launch and keep under control. Figure 2 illustrates the thread hierarchy. First there is the grid, a two-dimensional structure containing blocks. Then there are the blocks, three-dimensional structures containing a defined number of threads. An entire grid is handled by a single GPU chip. The GPU chip is organized as a collection of multiprocessors, with each multiprocessor responsible for handling one or more blocks of a grid; a block is never divided across multiple multiprocessors. Each multiprocessor is further divided into a number of stream processors, with each stream processor handling one or more threads of a block. Finally, there are the threads themselves, to which we assign the tasks we want to launch.
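For instance, a one-thread-per-pixel segmentation kernel (such as the one sketched in Section 4 below) would typically be launched with a two-dimensional grid of two-dimensional blocks; the 16 x 16 block size here is a common choice, not a value taken from the paper.

    // Hypothetical host-side launch illustrating the grid/block hierarchy.
    void launchSegmentation(const unsigned char* d_frame,
                            const unsigned char* d_samples,
                            unsigned char* d_mask, int width, int height)
    {
        dim3 block(16, 16);                              // threads per block
        dim3 grid((width  + block.x - 1) / block.x,      // ceil(width / 16)
                  (height + block.y - 1) / block.y);     // ceil(height / 16)
        segmentKernel<<<grid, block>>>(d_frame, d_samples, d_mask,
                                       width, height);
    }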
4 Implementation on GPU Using CUDA

In GPU implementations of background subtraction algorithms, each pixel is processed independently of the others, so the work can be parallelized by devoting a thread to each pixel. The number of pixels per thread is determined automatically by the algorithm right before the first frame is processed. When a new frame arrives, the algorithm is run and the background model is updated. Pixels in the output image are classified as either foreground or background depending on their intensity values. The high-level data flow of the CUDA implementation is given in Figure 3. The input frames, output frames and background model are stored in global memory, while the configuration parameters are stored in constant memory to ensure fast access from the threads. Although shared memory is much faster than global memory, its capacity is not large enough to store the input and output data for all threads in a block.
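A compact sketch of this memory placement, reusing the names and layout assumed in the earlier sketches (all identifiers are ours, not the paper's):

    // Algorithm parameters live in constant memory, set once from the host
    // with cudaMemcpyToSymbol; frames and the sample model live in global
    // memory.
    __constant__ int c_radius;          // R
    __constant__ int c_minCardinality;  // #min

    __global__ void segmentKernel(const unsigned char* frame,    // global
                                  const unsigned char* samples,  // global
                                  unsigned char* mask,           // global
                                  int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;   // guard partial blocks

        int idx = y * width + x;
        int count = 0;
        // Classification test of equation (1): count samples inside S_R(p_t(x)).
        for (int i = 0; i < N_SAMPLES && count < c_minCardinality; ++i)
            if (abs((int)frame[idx]
                    - (int)samples[i * width * height + idx]) < c_radius)
                ++count;

        mask[idx] = (count >= c_minCardinality) ? 0 : 255;  // 0 = background
    }

On the host, the parameters would be uploaded once before processing starts, e.g. cudaMemcpyToSymbol(c_radius, &radius, sizeof(int)).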
[Figure 3: High-level data flow of the CUDA implementation. The input/output frames and the background model reside in global memory; the algorithm parameters reside in constant memory.]

[Figure 4: Timeline for asynchronous execution]