Proceedings of ICCT 2015

GPU Accelerated Background Subtraction

Dan Liu
Science and Technology on Communication Security Laboratory, Chengdu, China
E-mail: [email protected]

Abstract: Background subtraction is a crucial step in many automatic video content analysis applications. While numerous acceptable techniques have been proposed so far for background extraction, there is still a need to produce more efficient algorithms in terms of computational efficiency. Complex algorithms usually give better results, but are too slow to be applied in real-time systems. This paper describes the ViBe algorithm for background subtraction and presents an implementation of it using the CUDA technology for parallel processing. The experiments indicate that our implementation outperforms the majority of earlier state-of-the-art algorithms not only in terms of accuracy, but also in terms of processing speed.

Keywords: Background subtraction; CUDA; Parallel computing; Computer vision; GPU

1 Introduction

Background subtraction is a process that separates the foreground objects from the background of a video [1]. The foreground objects are defined as the parts of the video frame that change, while the background is made up of the pixels that stay relatively constant. Example applications utilizing background subtraction include video surveillance, industrial vision, and patient monitoring systems. In real-life scenarios, however, many factors can affect the video: illumination changes, moving objects in the background, and camera motion. In the past years, many background subtraction algorithms have been proposed [2]. Detection accuracy remains the researchers' main focus [3], but even though accuracy can be improved, this comes at the expense of increased computational power.

Graphics Processing Units (GPUs) have evolved to support general-purpose applications from different market domains [4]. They are suitable for data-intensive applications with high-throughput requirements and single-instruction, multiple-data (SIMD) style parallelism. Since background subtraction is an embarrassingly parallel application in which individual pixels are processed independently, it is a suitable candidate for GPU mapping. In this paper, we first discuss a novel samples-based background subtraction algorithm called ViBe [5][6] (for "VIsual Background Extractor"), which produces good results in various environments. We then present a fast processing version of ViBe using the Compute Unified Device Architecture (CUDA).

2 ViBe: Background Subtraction Algorithm

Background subtraction techniques always have three parts: (1) the model principle, (2) model initialization, and (3) model update. These parts are described in the three subsections of this section.

Figure 1 ViBe algorithm pixel classification mechanism (the current pixel value pt(x) compared against the samples p1, ..., p6 lying within a sphere of radius R)

2.1 Model Principle

Assume pt(x) is the value of the pixel x at time t. To classify pt(x), we count the number of samples contained in the sphere S_R(pt(x)) of radius R centered on pt(x). A pixel value is then classified as background if the cardinality, denoted ♯, of the intersection of this sphere and the set of samples {p1, p2, ..., pn} (see Figure 1) is at least a given threshold ♯min. More formally, we compare ♯min to

♯{ S_R(pt(x)) ∩ {p1, p2, ..., pn} }    (1)

Two parameters determining the accuracy of our model are the radius R of the sphere and the minimal cardinality ♯min. Experiments have shown that a unique radius R and a cardinality of 2 offer excellent performance. There is no need to adapt these parameters during the background subtraction, nor do we need to change them for different pixel locations within the image. Note that, since the number of samples n and ♯min are chosen to be fixed and impact the same decision, the sensitivity of the model can be adjusted using the ♯min/n ratio.

2.2 Model Initialization

Although the model could easily recover from any type of initialization, for example by choosing a set of random values,
978-1-4673-7005-9 /15/$31.00 ©2015 IEEE


it is convenient to get an accurate background estimate as soon as possible. We populate the pixel models with values found in the spatial neighborhood of each pixel. More precisely, we fill them with values taken at random from their neighborhood in the first frame. From our experiments, selecting samples randomly in the 8-connected neighborhood of each pixel has proved to be satisfactory for images of 640 x 480 pixels. This strategy proved to be successful: the background estimate is valid from the second frame.
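The classification rule of Eq. (1) and the neighborhood-based initialization can be condensed into a short CPU reference sketch. This is an illustration, not the paper's CUDA code: pixels are single-channel intensities, the function names are ours, and R = 20 and n = 20 are assumed values (the text fixes only the cardinality ♯min = 2).

```python
import random

# Assumed parameters: R = 20 and n = 20 are common ViBe choices, not
# values stated in the text; the minimal cardinality of 2 is.
R = 20        # radius of the sphere S_R(pt(x))
HASH_MIN = 2  # minimal cardinality (the threshold "#min")
N = 20        # number of samples per pixel model

def is_background(pixel_value, samples, radius=R, hash_min=HASH_MIN):
    """Eq. (1): a pixel is background if at least #min samples lie
    within distance R of the current pixel value."""
    close = sum(1 for s in samples if abs(s - pixel_value) < radius)
    return close >= hash_min

def init_model(frame, x, y, n=N, rng=random):
    """Section 2.2: fill the model of pixel (x, y) with n values drawn
    at random from its 8-connected neighborhood in the first frame."""
    h, w = len(frame), len(frame[0])
    neighborhood = [frame[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
    return [rng.choice(neighborhood) for _ in range(n)]
```

For example, a pixel of value 100 with samples {98, 103, 180} is classified as background, since two samples (98 and 103) fall within R = 20 of the current value.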

2.3 Model Update


In this section, we describe how to continuously update the background model with each new frame. To achieve accurate results over time and to handle new objects that appear in a scene, the model has to be updated regularly.
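The update mechanism detailed in the remainder of this subsection (random in-place replacement of a sample, applied only to a random subsample of the frames) can be sketched as follows. The subsampling factor φ = 16 is a commonly used ViBe value and is an assumption here, as are the function names; the survival probability mirrors the formula given below.

```python
import random

# Assumed time-subsampling factor: a background pixel updates its
# model only 1 time in PHI on average. PHI = 16 is our assumption,
# not a value stated in the text.
PHI = 16

def update_model(samples, value, rng=random, phi=PHI):
    """With probability 1/phi, overwrite a randomly chosen sample of
    the model with the current pixel value (random-replacement,
    time-subsampled update)."""
    if rng.randrange(phi) == 0:
        samples[rng.randrange(len(samples))] = value
    return samples

def survival_probability(n, t0, t1):
    """Probability that a sample present at time t0 is still present
    at time t1: ((n - 1) / n) ** (t1 - t0)."""
    return ((n - 1) / n) ** (t1 - t0)
```

Because each of the n samples is equally likely to be replaced, the model's memory decays smoothly rather than by first-in first-out aging.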

Time subsampling: In order to extend the size of the time window even more, we resort to random time subsampling. The main idea is that the background varies slowly, except for sudden global illumination changes or scene cuts, which need proper handling at the frame level. Therefore, it is not necessary to update each background model for each new frame. We manage this by choosing at random which sample to replace when updating a pixel model. Mathematically, one can show that, according to this updating mechanism, the probability for a pixel sample present at time t0 to still be present at a later time t1 is ((n - 1)/n)^(t1 - t0).

Spatial diffusion: To ensure the spatial consistency of the whole image model and to handle practical situations such as small camera movements or slowly evolving background objects, we adopt a technique similar to that of the updating process, in which we choose at random and update a pixel model in the neighborhood of the current pixel. Let us denote by N_G(x) and p(x), respectively, the spatial neighborhood of a pixel x and its value. Assume that it was decided to update the set of samples of x by inserting p(x). Then we also use this value p(x) to update the set of samples of one of the pixels in the neighborhood N_G(x), chosen at random.

3 GPU and CUDA Basics

3.1 Graphics Processing Unit

The GPU is specialized for compute-intensive, highly parallel computation and is therefore designed such that more transistors are devoted to data processing rather than to data caching and flow control. More specifically, the GPU is especially well suited to address problems that can be expressed as data-parallel computations. Because the same program is executed for each data element, there is a lower requirement for sophisticated flow control. Many applications that process large data sets can use a data-parallel programming model to speed up the computations [7].

Figure 2 CUDA thread layout

3.2 CUDA Basics on GPU

The CUDA programming model provides a simple interface to program on the GPU. The CUDA architecture is massively parallel in that a kernel is executed by thousands of threads [8]. Threads are organized in a way that makes them easy for the device to launch and keep under control. From Figure 2 we can identify the thread hierarchy. First is the grid, a two-dimensional structure containing blocks. Then come the blocks, three-dimensional structures containing a defined number of threads. An entire grid is handled by a single GPU chip. The GPU chip is organized as a collection of multiprocessors, with each multiprocessor responsible for handling one or more blocks in a grid. A block is never divided across multiple multiprocessors. Each multiprocessor is further divided into a number of stream processors, with each stream processor handling one or more threads in a block. Finally, there are the threads themselves, to which we assign the tasks we want to launch.

4 Implementation on GPU Using CUDA

In GPU implementations of background subtraction algorithms, each pixel is processed independently of the others, which can be parallelized by devoting a thread to each pixel. The number of pixels for each thread is automatically determined by the algorithm right before the first frame is processed. When a new frame arrives, the algorithm is run and the background model is updated. Pixels in the output image are classified as either foreground or background depending on their intensity value. The high-level data flow of the CUDA implementation is given in Figure 3. The input and output frames and the background model are stored in global memory, while the configuration parameters are stored in constant memory to ensure fast access from the threads. Although shared memory is much faster than global memory, its capacity is not enough to store the input and output data for all threads in a block.
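The one-thread-per-pixel mapping above can be sketched by the launch-configuration arithmetic a CUDA host program would perform for its dim3 parameters. The 16x16 block size is an assumption for illustration, not a value reported in this paper.

```python
# Assumed 2-D block shape; a real launch would tune this per device.
BLOCK_X, BLOCK_Y = 16, 16

def grid_dims(width, height, block_x=BLOCK_X, block_y=BLOCK_Y):
    """Ceiling division so that every pixel of a width x height frame
    is covered by exactly one thread (some border threads are idle)."""
    return ((width + block_x - 1) // block_x,
            (height + block_y - 1) // block_y)

def pixel_of_thread(block_idx, thread_idx, block_x=BLOCK_X, block_y=BLOCK_Y):
    """Python analogue of x = blockIdx.x * blockDim.x + threadIdx.x
    (and likewise for y) inside a CUDA kernel."""
    bx, by = block_idx
    tx, ty = thread_idx
    return (bx * block_x + tx, by * block_y + ty)
```

For a 320x240 frame this yields a 20x15 grid of 16x16 blocks, i.e. one thread per pixel with no remainder.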


Figure 3 Data Flow of the CUDA Implementation (the host streams input video frames to the device; input/output frames and foreground pixels reside in global memory, per-block parameters in shared memory, and algorithm parameters in constant memory)

Figure 4 Timeline for asynchronous execution (input-frame and output-frame copies on the copy stream overlap with kernel execution of the previous frame)

Table 1 Comparison of Frame Rate

Image pixels   CPU       GPU       Speed-up
320x240        155 fps   914 fps   5.8X
768x576        27 fps    182 fps   6.7X
1500x800       10 fps    42 fps    4.2X

4.1 Memory Coalescing

The most important performance consideration in programming for the CUDA architecture is the coalescing of global memory accesses. Global memory loads and stores by the threads of a warp are coalesced by the device into as few as one transaction when certain access requirements are met [9]. In order to take advantage of CUDA memory coalescing, we convert the input images into four-channel (RGBA) mode, which uses 4 bytes per pixel. In this way, each thread can access its pixels using uchar4 structures. In our implementation, a single 128-byte L1 transaction services that memory access.
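The RGBA conversion of Section 4.1 can be illustrated on the host side. The sketch below (with our own illustrative function names) packs 3-byte RGB pixels into the 4-byte-aligned layout that a uchar4 load expects; with 32 threads per warp each reading 4 bytes, one warp's accesses then fit a single 128-byte transaction.

```python
import struct

def rgb_to_rgba(rgb_bytes):
    """Expand packed 3-byte RGB pixels to 4-byte RGBA (alpha = 255),
    so that pixel i starts at byte offset 4*i -- the alignment that
    lets a warp's loads coalesce."""
    out = bytearray()
    for i in range(0, len(rgb_bytes), 3):
        out += rgb_bytes[i:i + 3] + b"\xff"
    return bytes(out)

def pixel(rgba_bytes, i):
    """Fetch pixel i as an (r, g, b, a) tuple, the uchar4 analogue."""
    return struct.unpack_from("4B", rgba_bytes, 4 * i)
```

With the 3-byte layout, by contrast, consecutive pixels straddle 4-byte boundaries and a warp's loads split into extra transactions, which is what the conversion avoids.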

4.2 Asynchronous Execution

Page-locked or pinned memory transfers attain the highest bandwidth between the host and the device, but excessive use can reduce overall system performance because pinned memory is a scarce resource. A common optimization for GPUs is overlapping data transfer with kernel execution [10][11][12]. Data transfer between the host and the device using the cudaMemcpyAsync() function is a non-blocking variant of cudaMemcpy() in which control is returned immediately to the host thread. Using asynchronous transfers and pinned memory, we are able to interleave the CPU code execution with the kernel launches and memory transfers, as described in Figure 4. While the kernel is busy processing the image data in the first buffer, the copy stream keeps transferring the next frame into the second buffer. When the kernel finishes its processing, the buffer pointers are swapped and the kernel is launched again.

5 Experimental Results

In our GPU implementation, to analyze the performance of the algorithm we used three video sequences of 320x240 pixels, 768x576 pixels, and 1500x800 pixels from the PETS 2009 dataset [13]. We compared the experimental results with the results of the original algorithm [6]. All experiments were performed on a system with a dual-core 2.79 GHz CPU (4 GB of RAM) and an NVIDIA K20 GPU.

Figure 5 CPU background subtraction result

Figure 6 GPU background subtraction result

The CPU implementation performs quite well on the low-resolution sequences, but for sequences larger than 768x576 its frame rates are lower than 30 fps. We have shown how the algorithm can be implemented on a GPU while achieving a higher processing rate. Figure 5 and Figure 6 show that the GPU implementation has nearly the same robustness as the original implementation. It can be concluded that our proposed implementation not only achieves real-time performance, but also improves the accuracy of background subtraction. We have observed that data-parallel architectures can help to increase the performance of this algorithm significantly.
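The ping-pong buffering of Section 4.2 reduces, on the host side, to the following bookkeeping. The sketch is sequential Python standing in for the two CUDA streams, so only the buffer-swapping logic of the description above is modeled, not the actual asynchronous overlap.

```python
def process_stream(frames, kernel):
    """Double-buffered pipeline sketch: while the 'kernel' works on
    buffers[0], the 'copy stream' fills buffers[1] with the next
    frame; the buffer pointers are then swapped for the next round."""
    results = []
    buffers = [frames[0] if frames else None, None]  # two pinned buffers
    for t in range(len(frames)):
        if t + 1 < len(frames):
            buffers[1] = frames[t + 1]   # copy stream: prefetch frame t+1
        results.append(kernel(buffers[0]))  # kernel: process current buffer
        buffers[0], buffers[1] = buffers[1], buffers[0]  # swap pointers
    return results
```

In the real implementation the prefetch and the kernel run concurrently on separate streams, so the copy cost is hidden behind computation rather than added to it.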


References

[1] A. McIvor, "Background subtraction techniques," in Proc. of Image and Vision Computing, Auckland, New Zealand, November 2000.
[2] M. Piccardi, "Background subtraction techniques: a review," in Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, 2004, vol. 4, pp. 3099-3104.
[3] R. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, "Image change detection algorithms: A systematic survey," IEEE Transactions on Image Processing, vol. 14, pp. 294-307, March 2005.
[4] A. R. Brodtkorb, C. Dyken, T. R. Hagen, J. M. Hjelmervik, and O. O. Storaasli, "State-of-the-art in heterogeneous computing," Sci. Program., vol. 18, no. 1, pp. 1-33, Jan. 2010. [Online]. Available: http://dl.acm.org/citation.cfm?id=1804799.1804800
[5] O. Barnich and M. Van Droogenbroeck, "ViBe: A universal background subtraction algorithm for video sequences," IEEE Transactions on Image Processing, vol. 20, no. 6, pp. 1709-1724, June 2011.
[6] O. Barnich and M. Van Droogenbroeck, "ViBe: a powerful random technique to estimate the background in video sequences," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 945-948, April 2009.
[7] J. Fung, "Advances in GPU-based Image Processing and Computer Vision," in SIGGRAPH, 2009.
[8] NVIDIA Corporation, "CUDA C Best Practices Guide," 2010.
[9] NVIDIA Corporation, "NVIDIA CUDA Programming Guide," 2007.
[10] P. Kumar, A. Singhal, S. Mehta, and A. Mittal, "Real-time moving object detection algorithm on high-resolution videos using GPUs," Journal of Real-Time Image Processing, pp. 1-17, 2013. [Online]. Available: http://dx.doi.org/10.1007/s11554-012-0309-y
[11] Y. Li, G. Wang, and X. Lin, "Three-level GPU accelerated Gaussian mixture model for background subtraction," in Society of Photo-Optical Instrumentation Engineers (SPIE) Conference Series, vol. 8295, Feb. 2012.
[12] V. Pham, P. Vo, V. T. Hung, and L. H. Bac, "GPU implementation of extended Gaussian mixture model for background subtraction," in Computing and Communication Technologies, Research, Innovation, and Vision for the Future (RIVF), 2010 IEEE RIVF International Conference on, 2010, pp. 1-4.
[13] "PETS 2009 Benchmark Data," 2009. [Online]. Available: http://www.cvg.rdg.ac.uk/PETS2009/a.html
