Implementation of the CA-CFAR Algorithm for Pulsed-Doppler Radar on a GPU Architecture
Abstract — The Cell-Averaging Constant False-Alarm Rate (CA-CFAR) algorithm was implemented and optimized in software on the NVIDIA Tesla C1060 GPU architecture for application in pulsed-Doppler radar signal processors. A systematic approach was followed to gradually explore opportunities for parallel execution and optimization by implementing the algorithm first in MATLAB (CPU), followed by native C (CPU) and finally NVIDIA CUDA (GPU) environments. Three techniques for implementing the CA-CFAR in software were identified and implemented, namely a naïve technique, a sliding window technique and a new variant which employs the Summed-Area Table (SAT) algorithm. The naïve technique performed best on the GPU architecture. The SAT technique shows potential, especially for cases where very large CFAR windows are required. However, the results do not justify using the GPU architecture instead of the CPU architecture for this application when data transfer to and from the GPU is taken into consideration.

Keywords — CFAR; GPU; Doppler; radar; signal processing

The authors would like to express their gratitude to the National Program for Electronics, Communications and Photonics at the King Abdulaziz City for Science and Technology for supporting this work.

I. INTRODUCTION

A Constant False-Alarm Rate (CFAR) detector in a radar system detects targets in varying interference by adjusting detection thresholds dynamically. The computational workload of calculating the interference estimate for every radar data sample in order to adjust the detection threshold can be very high. High throughput and low latency are required for radar applications, where the latter is especially important in tracking systems which need to maintain a tight tracking loop.

CFAR processing in modern radar signal processors is typically performed with Field-Programmable Gate Arrays (FPGAs) or Digital Signal Processors (DSPs) due to the processing and I/O performance that they provide. Current generation many-core GPUs can deliver massive computational performance on a single chip. The memory bandwidth between a GPU chip and its own high-speed memory is also very high as a result of very wide data bus widths and high memory clock rates. Although GPUs were originally designed to perform graphics processing, recent generations make provision for General Purpose Computing on the GPU (GPGPU). GPUs could therefore potentially be used to perform CFAR processing in pulsed-Doppler radar systems.

An algorithmic optimization of the Cell-Averaging CFAR (CA-CFAR) algorithm was performed in an attempt to find an efficient algorithmic structure for software implementation on the GPU architecture. Three techniques were identified, implemented, optimized and evaluated using a systematic approach to move from high-level computing languages to the target GPU architecture. The GPU results were compared to results of our own implementations in native C, as well as another CA-CFAR software reference implementation.

The results show that, in terms of kernel throughput and latency, the GPU implementations generally perform better than comparative CPU implementations for larger input sizes. However, in terms of total throughput and latency, which includes data transfer to and from the GPU, the performance results do not justify using the GPU architecture for the range of input sizes that were evaluated.

II. CA-CFAR FOR PULSED-DOPPLER RADAR

A 2D data matrix is generally used to represent the sampled data for pulsed-Doppler radar systems with a single receiver front-end element. The 2D data matrix is called a burst and it has range and Doppler dimensions. The range dimension contains "fast-time" samples of the echo of a single received pulse. The Doppler dimension contains "slow-time" samples which represent the echo received from successive transmitted pulses in the burst.

A Constant False-Alarm Rate (CFAR) detector adjusts the detector threshold in real-time in order to maintain a constant, design-specific false-alarm rate with corresponding probability $P_{fa}$. The threshold is determined by estimating the interference around the target. A Cell-Averaging CFAR (CA-CFAR) detector estimates the interference power for a Cell Under Test (CUT) by averaging the sample values of adjacent cells, under the assumption that interference statistics are homogeneous in a localized area.

A. CA-CFAR Algorithm

The maximum likelihood estimate for the interference power, $\widehat{\beta^2}$, is obtained from the average of $N$ samples in the vicinity of the CUT [1] as shown in equation 1.

$\widehat{\beta^2} = \frac{1}{N} \sum_{i=1}^{N} x_i$   (1)
When target energy extends into the cells immediately adjacent to the CUT, those cells do not represent interference alone and will cause the detector threshold to be raised artificially if included in the interference estimate. Guard cells immediately adjacent to the CUT are therefore typically excluded from the CFAR window.

The detection threshold $T_b$ is obtained by scaling the interference estimate with the CFAR constant $\alpha$, which is determined by the design false-alarm probability $P_{fa}$ and the number of reference cells $N$:

$T_b = \alpha \widehat{\beta^2}$   (2)

$\alpha = N \left( P_{fa}^{-1/N} - 1 \right)$   (3)

[Figure 1. Calculation of $\widehat{\beta^2}_{CUT}$ from the CFAR window for the naïve technique (range-Doppler map with the CFAR window cells surrounding the CUT).]
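To make equations (1) to (3) concrete, the following sketch in C (our own illustration, not code from the paper; cfar_alpha and cfar_threshold are hypothetical names) computes the detection threshold for a single CUT from its $N$ reference cells:

    #include <math.h>

    /* CFAR constant alpha for N reference cells and a design
       false-alarm probability Pfa, per equation (3). */
    static double cfar_alpha(int n, double pfa)
    {
        return n * (pow(pfa, -1.0 / n) - 1.0);
    }

    /* CA-CFAR threshold for one CUT: estimate the interference power
       as the average of the N reference-cell values (equation 1) and
       scale it by alpha (equation 2). */
    static double cfar_threshold(const float *ref_cells, int n, double pfa)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += ref_cells[i];
        double beta2_hat = sum / n;   /* interference power estimate */
        return cfar_alpha(n, pfa) * beta2_hat;
    }

For example, $N = 40$ reference cells and a design $P_{fa}$ of $10^{-6}$ give $\alpha \approx 16.5$, placing the threshold roughly 12 dB above the estimated interference power.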
B. Existing CA-CFAR Software Implementations

The High Performance Embedded Computing (HPEC) Challenge Benchmark Suite [2] includes a CFAR benchmark for CPUs. The assessment in [3] compared their CA-CFAR GPU implementation to the HPEC CFAR benchmark CPU implementation. The implementation in [3] favours parallelism by minimizing communication and synchronization between elements. The problem effectively turns into an embarrassingly parallel problem using this approach.

A speedup factor of between 2.6 and 166 was achieved with the GPU implementation in [3] over the HPEC CPU reference implementation [4]. The strict kernel latency metric, as described in the HPEC Challenge benchmark guidelines [2], is used. The conclusion in [3] is that the overall run time is limited by global memory bandwidth.

The HPEC Challenge benchmark code for CPUs is publicly available, which allowed us to compile and run the code on the same platforms that were used to obtain our own results. No code is available for the other implementations that were found and we could therefore not do comparisons with these implementations on the target hardware platforms.

III. HIGH-LEVEL ALGORITHMIC OPTIMIZATION

Performance optimization is typically performed on different levels, ranging from high-level algorithmic optimization, to medium-level implementation optimization, down to low-level instruction and other architecture-specific optimizations. The structure of an algorithm that is implemented in software can have a major impact on its performance, based on how well-suited the chosen structure is for the target hardware architecture.

A process of optimizing the algorithmic structure of the CA-CFAR algorithm for software implementation on the GPU architecture was followed. The aim was to optimize the computational performance on the GPU architecture in order to increase throughput and reduce latency. Three techniques with different algorithmic structures were identified, namely naïve, sliding window and Summed-Area Table (SAT).

A. Naïve Technique

One of the simplest and most intuitive ways of implementing the CA-CFAR is to simply sum the values for all cells in the CFAR window for each CUT, given by

$\widehat{\beta^2}(i,j) = \frac{1}{N_{cfarWin}} \sum_{k=1}^{N_{cfarWin}} CW(i,j,k),$   (4)

where $CW(i,j,k)$ is the $k$th CFAR window cell of cell $(i,j)$ as shown in figure 1. We refer to this technique as the naïve technique due to the potential drawbacks associated with using such a simple, brute-force approach. Redundant summation will be performed, since there is a large overlap in the CFAR window cells for adjacent cells under test in the range-Doppler map. The computational complexity increases as the dimensions of either the input data or the CFAR window increase. The naïve technique provides an embarrassingly parallel structure, which means that each cell can be evaluated independently without requiring synchronization with any other cells.

The time complexity for the naïve technique with a range-Doppler map of $N_r$ range bins by $N_d$ Doppler bins and with leading and lagging CFAR windows of $N_{rwin}$ by $N_{dwin}$ cells each, is given by

$O(N_{rd} \times N_{cfarWin}),$   (5)

where $N_{rd} = N_r \times N_d$ and $N_{cfarWin} = 2 \times N_{rwin} \times N_{dwin}$.
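To illustrate the embarrassingly parallel structure, a naïve CA-CFAR kernel can assign one CUDA thread per CUT. The sketch below is our own simplified illustration, not the paper's kernel: naive_cacfar_kernel is a hypothetical name, window edges are clipped rather than wrapped, a square guard region around the CUT is assumed, and the map is stored Doppler-major.

    // One thread per cell under test; each thread sums its own CFAR
    // window from global memory, so no inter-thread synchronization
    // is needed.
    __global__ void naive_cacfar_kernel(const float *rd_map, float *threshold,
                                        int n_range, int n_doppler,
                                        int win_r, int win_d, int guard,
                                        float alpha)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;  // range index
        int d = blockIdx.y * blockDim.y + threadIdx.y;  // Doppler index
        if (r >= n_range || d >= n_doppler) return;

        float sum = 0.0f;
        int count = 0;
        for (int dd = -win_d; dd <= win_d; dd++) {
            for (int dr = -win_r; dr <= win_r; dr++) {
                if (abs(dr) <= guard && abs(dd) <= guard)
                    continue;                      // skip CUT and guard cells
                int rr = r + dr;
                int dx = d + dd;
                if (rr < 0 || rr >= n_range || dx < 0 || dx >= n_doppler)
                    continue;                      // clip at the map edges
                sum += rd_map[dx * n_range + rr];  // Doppler-major layout
                count++;
            }
        }
        threshold[d * n_range + r] = alpha * (sum / count);
    }

Each thread re-reads its whole window from global memory, which is what makes the basic naïve implementation sensitive to the CFAR window size, as discussed in the results.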
B. Sliding Window Technique

The CA-CFAR can also be implemented in software using a sliding window technique, which capitalizes on the redundancy in the calculation of the local noise estimate for adjacent cells [2], as given by

$\widehat{\beta^2}(i+1,j) = \widehat{\beta^2}(i,j) + \Delta,$   (6)

where $\Delta$ is calculated as shown in figure 2. It is more computationally efficient to slide the CFAR window by one cell at a time and only account for the difference in the local interference estimate, instead of naïvely performing redundant summation. Computational complexity is reduced at the cost of reducing the parallelism of the algorithm, as a result of the inherent data dependencies that are created.
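The delta update of equation (6) can be illustrated in one dimension. This is our own sketch under simplified assumptions (a single range line, no guard cells, interior CUTs only):

    /* Running-sum CA-CFAR estimate along one range line. half_win
       reference cells are taken on each side of the CUT. */
    void sliding_cacfar_line(const float *line, float *est, int n, int half_win)
    {
        int n_win = 2 * half_win;
        float sum = 0.0f;
        /* Build the first full window around CUT index half_win. */
        for (int k = 0; k < half_win; k++)
            sum += line[k] + line[half_win + 1 + k];
        est[half_win] = sum / n_win;
        /* Slide one cell at a time: apply the delta instead of
           re-summing the whole window (equation 6). */
        for (int i = half_win; i + 1 < n - half_win; i++) {
            sum += line[i] + line[i + 1 + half_win]   /* cells entering */
                 - line[i - half_win] - line[i + 1];  /* cells leaving  */
            est[i + 1] = sum / n_win;
        }
    }

The serial dependence of each sum on the previous one is exactly the data dependency that reduces parallelism relative to the naïve technique.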
[Figure 2. Calculation of $\Delta = [B + D - A - C] / N_{cfarWin}$ for the sliding window technique on the range-Doppler map (range bins vs. Doppler bins), showing the old and new CFAR window positions around the CUT and the entering and leaving cell groups C1, D1, C2 and D2.]

The time complexity for the sliding window technique with a range-Doppler map of $N_r$ range bins by $N_d$ Doppler bins and with leading and lagging CFAR windows of $N_{rwin}$ by $N_{dwin}$ cells each, is given by

$O(N_{rd} \times \sqrt{N_{cfarWin}}),$   (7)

where $N_{rd} = N_r \times N_d$ and $N_{cfarWin} = 2 \times N_{rwin} \times N_{dwin}$.
C. Summed-Area Table Technique

A Summed-Area Table (SAT) [5] or Integral Image [6], once generated, can be used to obtain the sum of values inside any sub-grid of the original input data matrix by doing lookup and summation of 4 data elements. This technique can be applied to CA-CFAR by generating a SAT from the input range-Doppler map sample values. Once generated, the SAT can be used to compute the local interference estimate efficiently, by optimizing the summation calculation as shown in figure 3. The summation is performed in constant time, regardless of the size of the CFAR window. No evidence was found of this technique being used for CFAR in radar applications.

[Figure 3. Calculation of $\widehat{\beta^2}_{CUT}$ using a Summed-Area Table generated from the range-Doppler map.]

The time complexity for the SAT technique with a range-Doppler map of $N_r$ range bins by $N_d$ Doppler bins and an arbitrary CFAR window size, is given by

$O(N_{rd}),$   (8)

where $N_{rd} = N_r \times N_d$.
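As an illustration of the constant-time lookup (our own sketch, not the paper's implementation; sat_rect_sum is a hypothetical name), the sum over any rectangle of the range-Doppler map follows from four SAT entries, independent of the rectangle size:

    /* Inclusive rectangle sum over [r0..r1] x [d0..d1] from a summed-area
       table where sat[d * n_range + r] holds the sum of all cells with
       range index <= r and Doppler index <= d. Four lookups suffice,
       regardless of the CFAR window size. */
    float sat_rect_sum(const float *sat, int n_range,
                       int r0, int r1, int d0, int d1)
    {
        float sum = sat[d1 * n_range + r1];
        if (r0 > 0)            sum -= sat[d1 * n_range + (r0 - 1)];
        if (d0 > 0)            sum -= sat[(d0 - 1) * n_range + r1];
        if (r0 > 0 && d0 > 0)  sum += sat[(d0 - 1) * n_range + (r0 - 1)];
        return sum;
    }

Composing the CFAR window (excluding the CUT and any guard cells) from a small, fixed number of such rectangle sums is what makes the per-cell cost independent of the window size; only the one-off cost of generating the SAT depends on the input size.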
IV. LOW-LEVEL OPTIMIZATION FOR GPU ARCHITECTURE

Following the high-level algorithmic optimization of the CA-CFAR algorithm, additional lower-level optimizations were performed for the target GPU architecture. The optimizations that were performed are discussed in this section.

When Doppler ambiguities are present, the cells around the edges of the Doppler dimension of a range-Doppler map are adjacent in terms of velocity. As a result, CFAR algorithms often implement wrapping instead of clipping of the CFAR window in the Doppler dimension around the edges of the range-Doppler map. Given that GPU performance can degrade when conditional branches diverge, the need to check for all cases where wrapping can occur is a potential problem. A standard technique used for GPUs to avoid branching in similar situations is to pad an image with an apron of pixels. For all GPU implementations an apron of redundant cells, which mirrors the opposite edge, is replicated around the edge of the range-Doppler map to allow for wrapping without divergent branching.
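A sketch of the apron idea follows. This is our own illustration of the described padding; the function name, host-side construction and pad size (which must not exceed the number of Doppler bins) are assumptions:

    /* Build a padded copy of the range-Doppler map with an apron of
       `pad` Doppler rows on each side, replicated from the opposite
       edge (Doppler wrap-around). Kernels can then read the CFAR
       window with a plain offset and no divergent wrap branches. */
    void pad_doppler_apron(const float *rd, float *padded,
                           int n_range, int n_doppler, int pad)
    {
        for (int d = -pad; d < n_doppler + pad; d++) {
            int src = (d + n_doppler) % n_doppler;   /* wrapped source row */
            for (int r = 0; r < n_range; r++)
                padded[(d + pad) * n_range + r] = rd[src * n_range + r];
        }
    }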
C. Shared Memory Implementation

The target GPU architecture provides a large off-chip memory which is referred to as device memory or global memory. A small amount of on-chip shared memory is also available, which needs to be explicitly loaded with data by the programmer. The shared memory is shared by, and limited to the scope of, the processor cores in a particular streaming multiprocessor unit. Once data has been loaded into shared memory, the read latency is typically in the order of 100 times less than reading from global memory.

The basic naïve implementation reads all cells in the CFAR window of a particular CUT from global memory. A variation of the naïve technique was also implemented which uses shared memory to reduce the read latency for redundant data reads. The shared memory implementation also segments the summation into two stages, where rows and columns are summed separately, allowing for more optimal use of the limited shared memory that is available on the target GPU. With this two-stage process the overlap only increases linearly with increases in the range and Doppler dimensions of the CFAR window, instead of in two dimensions simultaneously as with the global memory implementation.
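The two-stage structure can be sketched as a separable summation. The pair of kernels below is our own simplified illustration using global memory and clipped edges (the paper's version additionally stages data through shared memory, and the full window here includes the CUT and guard cells for brevity):

    // Stage 1: for every cell, sum the window along the range dimension.
    __global__ void row_sum_kernel(const float *rd, float *rowsum,
                                   int n_range, int n_doppler, int win_r)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        int d = blockIdx.y * blockDim.y + threadIdx.y;
        if (r >= n_range || d >= n_doppler) return;
        float s = 0.0f;
        for (int dr = -win_r; dr <= win_r; dr++) {
            int rr = r + dr;
            if (rr >= 0 && rr < n_range)
                s += rd[d * n_range + rr];
        }
        rowsum[d * n_range + r] = s;
    }

    // Stage 2: sum the partial row sums along the Doppler dimension.
    // The overlap between neighbouring windows now grows linearly in
    // each dimension instead of in both dimensions at once.
    __global__ void col_sum_kernel(const float *rowsum, float *winsum,
                                   int n_range, int n_doppler, int win_d)
    {
        int r = blockIdx.x * blockDim.x + threadIdx.x;
        int d = blockIdx.y * blockDim.y + threadIdx.y;
        if (r >= n_range || d >= n_doppler) return;
        float s = 0.0f;
        for (int dd = -win_d; dd <= win_d; dd++) {
            int dx = d + dd;
            if (dx >= 0 && dx < n_doppler)
                s += rowsum[dx * n_range + r];
        }
        winsum[d * n_range + r] = s;
    }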
The shared memory implementation has a limitation in terms of the maximum CFAR window size as a result of the limited amount of shared memory available on the target GPU. The CFAR window size in both the range and Doppler dimensions is limited to a maximum of 33 cells.

V. IMPLEMENTATION

The optimization of the CA-CFAR algorithm was an iterative process which included implementation and evaluation cycles. The process, environment and methods that were used to develop, verify and benchmark the implementations are discussed in this section.

A. Development Process

A systematic approach was followed by moving gradually from initial implementations in high-level computing environments to native implementations on the target GPU architecture.

Implementations were performed and verified in MATLAB [7], which was used as a rapid prototyping platform. Native C implementations for the CPU were then performed as a stepping stone to the final GPU target architecture. The initial native C implementations were single-core CPU implementations. Multi-core optimizations were then made for the native C implementations on the CPU architecture. The techniques were then implemented and optimized on the target NVIDIA GPU architecture.

A comparable level of optimization was performed for the native C and GPU implementations on the respective architectures for each technique.

B. Environment

A GPGPU system with an NVIDIA Tesla [8] C1060 GPU was used as the development, test and benchmarking platform for all the GPU implementations. The CPU implementations were benchmarked on a system with dual Intel Xeon X5355 quad-core CPUs, which yields a total of 8 processor cores.

OpenMP [9] was utilized to optimize the native C code in order to use multiple cores of the processor. Restructuring of the code was necessary in some cases in order to obtain maximum benefit from using OpenMP. Opportunities for exploiting parallelism in the algorithms were exposed using this approach, and it also served as a stepping stone towards an efficient parallel implementation for the GPU.
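For the naïve technique the outer loops over cells under test can be parallelized directly. A minimal sketch of this kind of OpenMP usage (our illustration, not the paper's code; compiled with -fopenmp):

    /* Evaluate all CUTs in parallel across the available CPU cores;
       each (r, d) cell is independent in the naive technique, so the
       loops need no synchronization. Edges are clipped for brevity. */
    void cacfar_naive_omp(const float *rd, float *thr,
                          int n_range, int n_doppler,
                          int win_r, int win_d, float alpha)
    {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int d = 0; d < n_doppler; d++) {
            for (int r = 0; r < n_range; r++) {
                float sum = 0.0f;
                int count = 0;
                for (int dd = -win_d; dd <= win_d; dd++) {
                    for (int dr = -win_r; dr <= win_r; dr++) {
                        int rr = r + dr, dx = d + dd;
                        if ((dr == 0 && dd == 0) ||        /* skip the CUT */
                            rr < 0 || rr >= n_range ||
                            dx < 0 || dx >= n_doppler)
                            continue;
                        sum += rd[dx * n_range + rr];
                        count++;
                    }
                }
                thr[d * n_range + r] = alpha * (sum / count);
            }
        }
    }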
NVIDIA Compute Unified Device Architecture (CUDA) [10] was utilized for all the GPU implementations. The CUDA Data Parallel Primitives (CUDPP) [11] library was used to generate the SAT for the GPU implementation that uses the SAT technique.

C. Verification

The output produced by the MATLAB implementation for a particular test data set or test burst was used as a baseline to verify the other implementations against.

NVIDIA provides a CUDA Visual Profiler [10] (computeprof) as part of the standard CUDA Toolkit release. The profiler utilizes hardware counters that are built into the NVIDIA chip sets for debugging and profiling purposes. The GPU kernels were profiled and analyzed during development using this profiler.

CUDA also provides access to hardware timers via GPU events in the API. These events can be used to do accurate timing of GPU code relative to the internal GPU clock. These GPU events were utilized to obtain all the timing information for the results.
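The event-based timing follows this pattern (a minimal sketch of the standard CUDA event API; error checking omitted, and the launch callback is a placeholder for the actual CA-CFAR kernel launches):

    #include <cuda_runtime.h>

    /* Measure the GPU-side latency of the enqueued work in milliseconds. */
    float time_kernel_ms(void (*launch)(void))
    {
        cudaEvent_t start, stop;
        float ms = 0.0f;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start, 0);     /* timestamp on the default stream */
        launch();                      /* enqueue the CA-CFAR kernel(s)   */
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);    /* wait for the GPU to pass 'stop' */
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

Bracketing the cudaMemcpy transfers with the same events gives the total latency, from which the kernel and total throughput figures are derived.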
All implementations process a single burst of synthetic radar data which is placed in host (CPU) memory before timing begins. During execution this burst of data is transferred to the GPU, processed and transferred back to the host.

VI. RESULTS

The results show the throughput that was achieved, which is derived from the measured latency of processing a single burst and the data size of the burst. For the GPU implementations, both the kernel and total latency were measured and used to derive the kernel and total throughput statistics. Total latency is the time required to transfer the input data from the host memory to the GPU memory, perform the CA-CFAR processing on the GPU and transfer the output data back from the GPU memory to the host memory. Kernel latency excludes the data transfer to and from the GPU memory in order to isolate processing and I/O performance results to some extent. Speedup graphs are not shown since they only show relative performance, which is of limited use for the radar application, where actual throughput and latency are of primary concern.

A. GPU Implementations

A CFAR window size of N = 40 cells was used as a typical value for the initial GPU implementation benchmarks. A CFAR window size of around N = 32 cells is suggested as being typical by [12] and [13]. The value of N = 40 was used instead to allow for a symmetrical CFAR window around the CUT, which is required for our implementations. The results for all three techniques on the GPU architecture are shown in figure 4.

1) Effect of input size: The sliding window technique performs the worst over the entire range of parameters, which is expected since this technique reduces parallelism in exchange for reduced computational complexity. The SAT technique performs better than the sliding window technique, but is still outperformed by the naïve technique in general for typical CFAR window sizes. The SAT technique has increased complexity and more overhead since it first needs to generate the SAT before it can subsequently benefit from the constant-time lookup.

2) Effect of CFAR window size: The basic naïve implementation, which uses global memory (GMEM), degrades markedly as the CFAR window size increases. The degradation can be attributed to the growth of the CFAR window in two dimensions as N increases and the fact that both dimensions are summed simultaneously in a single phase. The overlap of CFAR windows for neighboring cells increases in both dimensions, which introduces significant read contention.
[Figure 4. Throughput (MiB/s) of the GPU implementations versus the number of range bins ($10^2$ to $10^4$) and versus the CFAR window size N ($10^1$ to $10^2$).]
This effect is still present in the shared memory (SMEM) implementation, even though it segments the processing into rows and columns. However, the effect only manifests itself in a single dimension at a time. For larger N the performance trend for this implementation is better than for the naïve implementation which uses global memory. Again, note that this version is limited to a maximum CFAR window size of 33 cells in both the range and Doppler dimensions.

The result for the Summed-Area Table implementation shows that it is essentially unaffected by the CFAR window size. Generating the SAT requires a constant time that depends only on the input size of the range-Doppler map. The SAT technique can therefore be particularly useful where window sizes become very large.

B. Performance Comparison with C Implementations

The performance of our GPU and CPU implementations is compared with the HPEC CFAR Benchmark [2] in figure 5. The HPEC CFAR Benchmark only uses a 1D CFAR window, which acts only in the range dimension. Our implementations are optimized for a 2D CFAR window, but can be run in a 1D CFAR window mode by setting the Doppler dimension of the CFAR window to 1. The results were obtained using this configuration. The standard datasets for the HPEC CFAR Benchmark also use a data cube, which has range, Doppler and beam dimensions. Multiple beams can be obtained by using multiple receiver elements in a radar system. Our implementations are designed for a data matrix which has range and Doppler dimensions only. Datasets were therefore generated for the HPEC CFAR Benchmark where the number of beams is set to 1. The HPEC CFAR Benchmark CPU code was also compiled and run on the target CPU platform.

C. Data Throughput

For the GPU implementations the throughput generally decreased as the input burst size decreased, which can be attributed to some extent to the fixed overheads associated with transferring data between the host and the device. The suboptimal occupancy, in terms of the parallel units on the GPU that are utilized, for input burst sizes below a certain threshold also contributes to the trend that is observed for the throughput.

VII. CONCLUSION

The results showed that optimized GPU implementations that favour reduced algorithmic complexity and increased parallelism over reduced computational complexity performed the best. The GPU implementation of the naïve technique which uses shared memory performed better, in terms of kernel throughput and latency, than comparable CPU reference implementations for larger input burst sizes (greater than 8K cells) and typical CFAR window sizes. Even though the SAT technique is outperformed by the optimized naïve techniques for typical CFAR window sizes, there is a crossover point where the SAT technique will perform better with increasing CFAR window size.
[Figure 5. Performance comparison for N = 20 and Doppler bins = 32 using a 1D CFAR window: throughput (MiB/s) versus the number of range bins ($10^2$ to $10^4$), shown in two panels.]