Vlsi Project
Vlsi Project
Abstract
1. Introduction
1
was much more increased thereafter primarily due to its potential for efficient and low-cost
implementation. Some of the popular applications are: (1) direct digital frequency syn-
thesis [6], digital modulation and coding for speech/music synthesis and communication;
(2) direct and inverse kinematics computation for robot manipulation; and (3) planar and
three-dimensional (3-D) vector rotation for graphics and animation [7].
The development of the CORDIC algorithm and architectures [8] has taken place for
achieving the highest throughput rate and reduction of hardware-complexity as well as
the computational latency of implementation. Some of the typical approaches for reducing-
complexity implementation are targeted on minimization of using the scaling-operation and
complexity of barrel-shifters and adders in the CORDIC engine. The inherent drawback of
the conventional CORDIC algorithm is computational latency. Angle recording schemes,
mixed-grain rotation and higher radix CORDIC have been developed to reduce the latency.
Parallel and pipelined techniques have been suggested for high-throughput computation.
In this paper, we focus on two subjects, i.e. (1) reduction of computational latency and
improving the computational accuracy of the CORDIC algorithm based-on micro-rotation
duplication and triplication methods, and (2) design and architecture of a high precision
CORDIC architecture.
The rest of the paper is organized as follows. The related works are discussed in Section 2.
Section 3 explains the contribution of this paper. The rotation-extension CORDIC method,
i.e. the double-rotation and triple-rotation methods, are proposed in Section 4. The unified
CORDIC algorithm and a high precision CORDIC algorithm are introduced and discussed
in Section 5. Section 6 considers the algorithm and time investigation of the high precision
CORDIC core. Finally, concluding remarks are presented in Section 7.
2. Related Works
There are various proposed methods for the high performance CORDIC algorithm, re-
ducing the computational latency and increasing the computational accuracy, i.e. (1) high
radix method, (2) parallel rotation method, (3) redundant number representation, and (4)
rotation extension method.
In 1996, E. Antelo et al. [9] proposed the mixed radix-2 and radix-4 method for a re-
dundant CORDIC processor implemented in pipelining architecture. The combination of
the two radix can reduce latency and area of the pipelining stats. The full radix-4 and
very-high CORDIC method with on-line scaling factor computation for high performance
rotation architectures [10] [11] were introduced. Meanwhile, C.C. Li [12] suggested the
fast on-line scaling factor compensation for redundant CORDIC algorithm in radix-4 and
simplified the on-line decomposition algorithm in hardware. [Link] [13] introduced an
extension version of the radix-4 CORDIC method in vectoring mode, where the selection
method is applied for non-redundant and redundant computation. The constant scaling
factor for radix 2-4-8 CORDIC algorithm was investigated by T. Aoki et al. [14]. The algo-
rithm dynamically changed from radix-2 to radix-4 and from radix-4 to radix-8 depending
on the specified sequence of CORDIC algorithm. The constant scaling factor of each radix
QN q 2
was expressed as Kc = i=1 1 + δi · r−2·i , where Kc is a constant scaling factor, N is
the maximum number of iterations, δi is rotation direction of micro-rotation of CORDIC,
ri is the radix-based determined by ith iteration. The articles dealing with the high radix
CORDIC method shows that the scaling factor used for computation and compensation be-
comes a main problem. They attempt to solve this problem by proposing the on-line scaling
factor computation method and by finding a range of the computation in order to constant
the scaling factor on radix-2-4-8 CORDIC algorithm. However, the CORDIC method used
for the constant scaling factor is too complex and hard for hardware implementation.
The parallel CORDIC method was proposed by T.B. Juang et al. [15]. The algorithm
predicts the rotation direction δi from binary value of initial angle. The original sequential
CORDIC rotations is divided into two phase, where the rotation in each phase is executed
in parallel. S.F. Hsiao [16] applied the method to implement a Sine/Cosine generator with
memory-efficient concept; meanwhile [Link] et al. [17] implemented a digital down con-
verter based on the parallel CORDIC algorithm. Thereby, the method can provide a con-
stant scaling factor because the two parallel processing cores are based on the conventional
CORDIC. Although, the scaling factor has been solved by using the parallel technique, the
overhead of summation and prediction of the two parallel CORDIC engines and rotation
direction are aggravated.
The third efficient way to accelerate the CORDIC algorithm is to use redundant number
representation, such as singed-digit (SD) representation and carry-save representation. In
1990, M.D. Ercegovac [18] presented redundant (carry-free) addition instead of a conven-
tional (carry-propagate) one and on-line scaling factor computation method in order to
minimize area and to increase bandwidth. But, the on-line computation method is too
complex which is very difficult in practice. To ignore the on-line scaling factor computation
method, N. Takagi et al. [19] introduced the redundant CORDIC method with a constant
scaling factor. Thereby, the scaling factor is investigated by minimizing the error from the
CORDIC computation.
From the reviewed articles, the reduction of the CORDIC computational latency by the
high radix method makes the scaling factor in instability situation. The unstable scaling
factor problem can be solved by the on-line computation method which is too complex
for hardware practice. The parallel technique is used to accelerate the computation. The
method is based on the two conventional CORDIC cores processing in parallel. Thereby,
the scaling factor problem is solved, but the prediction overhead of the rotation direction
and the combination of the two conventional CORDIC results are added. The extension-
rotation method based on radix-2 accelerates the micro-rotation angle φ with n · φ, where
n is an integer value. The double-rotation method performs the computational results with
micro-rotation angle 2φ. Its scaling factor is first calculated by on-line computation method
and then by optimizing the errors of the computational result in order to constant a scaling
factor.
This paper is focused on the reducing computational latency and the improving com-
putational accuracy of CORDIC with the constant scaling factor based on radix-2. The
radix-2 is used because the scaling factor can be found mathematically. The design and
architecture of the CORDIC algorithm on the radix-2 is straightforward for hardware imple-
mentation. We propose both the rotation-extension CORDIC methods, i.e. non-redundant
double-rotation and non-redundant triple-rotation. They are analyzed, investigated, and
simulated to evaluate the constant scaling factors and the input domain capability for el-
ementary functions. The initial parameter values and the compensation parameter values
for each elementary function, which can be performed in circular, hyperbolic, and linear
coordinate systems are also proposed. We compare the computational accuracy of the el-
ementary functions with Matlab/Simulink in order to show that the proposed CORDIC
methods provide high computational accuracy. Finally, a high accuracy CORDIC algo-
rithm and its architecture is introduced for VLSI hardware implementation, and also the
time and area efficiency based on FPGA and Silicon technology are compared with the
existing CORDICs.
3. Contribution
The reducing computational latency and the increasing computational accuracy of CORDIC
become a challenge for engineers and scientists. Recently, they extremely attempt to im-
prove the computational performance and accuracy by proposing several methods such as
the parallel and prediction CORDIC (PCORDIC) [16] [15], the double-rotation CORDIC
(DRCORDIC), etc. The double-rotation method with redundant method was proposed by
Takagi et al. [19] in rotation mode only; thereby, the convergence of CORDIC parameters
is accelerated by duplicate rotation angle θ to be 2θ. They used few MSB for prediction of
rotation angle and attempted to constant the scaling factor. Similarly, we apply Takagi’s
idea for our double-rotation CORDIC method, but the difference is that the non-redundant
method is utilized in order to constant the scaling factor, and the operation on vectoring
mode is extended. In addition, we propose the triple-rotation CORDIC method where the
micro-rotation angle is triplicated from the conventional CORDIC angle θ to be 3θ.
This paper provides contributions as follow:
• Improvement of performance and accuracy of CORDIC computation: the
convergence of CORDIC parameters is accelerated by increasing the micro-rotation
angle to be 2θ and 3θ for the double-rotation and triple-rotation CORDIC meth-
ods. The non-redundant method is applied in order to constant the scaling factor
mathematically.
• Analysis of the double-rotation and triple-rotation CORDIC methods: the
unified micro-rotation equations for the double-rotation and triple-rotation CORDIC
methods based on radix-2 are proposed, where they are targeted for hardware sim-
plification. The convergence ranges and the initial parameters which will be used to
perform elementary functions on rotation and vectoring modes in the circular mode
such as sine and cosine functions, in hyperbolic mode such as sine and cosine hy-
perbolic functions, and linear mode such as multiplication and division functions are
examined and comprehensively summarized for VLSI implementation.
• Design and time investigation of high accuracy CORDIC arithmetic units:
an algorithm and a computational latency model of a high precision CORDIC arith-
metic unit is introduced for simple VLSI implementation. The model can be applied
to estimate the computational delay and accuracy of the high precision CORDIC core
conforming to the expected absolute error.
The algorithm is based on the rotation of a vector on the xy plane. As shown in Figure 1,
a vector xin , yin having angle A is rotated by angle B, resulting in a new vector xout , yout .
This rotation is described by the expressions shown in Equ. 1 [20]
y (xout,yout)
R
(xin,yin)
B
A
x
Figure 1: Vector rotation
where R is the modulus of the vector and A is the initial angle. The rotation can be
described in matrix form as
Instead of B as presented in Equ. (2), we use notation φ. The basic idea dealing with
the rotation-extension CORDIC computation method is described in Equ. (3), where xin ,
yin , and φ are input parameters and r is a rotation degree. For the sake of simplicity, we
derive and consider all equations based on the circular coordinate system.
The algorithm performs a sequence of rotations by elementary angles. For the conven-
tional CORDIC method, the rotation φ on the xy plan can be decomposed into a product
of n elementary rotations when r = 1 as:
where cos φ is a constant scaling factor. The elementary rotation matrix R(δi ) describes
the elementary rotation with angle θi = tan−1 (2−i ) while i is iteration step. δi ∈ {1, −1}
determines the rotation direction.
n−1 n−1
−δi 2−i
" #
Y Y 1 1
R(δi ) = √ (5)
i=0 i=0 1 + 2−2i δi 2−i 1
Since the rotation angle θi is always a constant value (based on a non-redundant method [21]),
the constant scaling factor cos φ will be n−1 √ 1
Q
i=0 1+2−2i
, which approximately equals to
0.60725. The maximum and minimum rotation angle are defined as ± n−1 −1 −i
P
i=0 tan (2 ),
which is in range of [-1.74329,+1.74329]. The micro-rotation equation of the conventional
CORDIC method is presented in Equ. (6).
xi+1 = xi − δi · 2−i · yi
yi+1 = yi + δi · 2−i · xi (6)
zi+1 = zi − δi · θi
The aim of the double-rotation method is to accelerate the rotation computation of the
CORDIC algorithm by duplicating the elementary angle to be 2θi . Thus, the degree of
parameter r in Equ. (3) is equal to 2.
The decomposition of the production of n elementary rotation is shown in Equ. (8) with
θi = tan−1 (2−i−1 ).
n n
#2
−δi · 2−i−1
"
Y Y 1 1
R(δi ) = (8)
i=1 i=1
(1 + 2−2i−2 ) δi · 2−i−1 1
To stabilize the scaling factor, the non-redundant number, where δi ∈ {−1, 1}, is applied
for the double-rotation CORDIC method. Thereby, the optimal constant scaling factor
is formulated by ni=1 (1+2−2i−2
1
Q
which approximately equals to 0.9219. The maximum
P)n
and minimum rotation angle i=1 2 · tan−1 (2−i−1 ) is in range of [-0.9885,+0.9885]. The
micro-rotation equation of the double-rotation CORDIC method is shown in Equ. (9).
Like the conventional and double-rotation methods, the triple-rotation method can ac-
celerate a micro rotation θi by triplicating the elementary angle θi to be 3θi .
The micro-rotation angle is θi = tani (2−i−2 ). The non-redundancy for rotation direction
is also applied. Thereby, the scaling factor is approximately constant at 0.9922, which
is calculated from n+1 1
Q
i=2 3 . The maximum and minimum rotation angle can be
(1+2−2i−4 ) 2
found by n+1 −1 −i−2 ), which is in range of [-0.3747, +0.3747]. The micro-rotation
i=2 3 · tan (2
P
equation of the triple-rotation CORDIC method being simplified for VLSI hardware imple-
mentation is shown in Equ. (12).
zin − n−1 −1 −i
i=0 δi · tan (2 )
P
zout−CV =
Pn
zout−DR = zin − i=1 δi · 2 · tan−1 (2−i−1 ) (13)
Pn+1
zout−T R = zin − i=2 δi · 3 · tan−1 (2−i−2 )
The condition that the triple-rotation and double-rotation CORDIC methods have more
precision than the conventional one can be expressed as:
Pn+1 Pn
zin − i=2 δi · 3 · tan−1 (2−i−2 ) > zin − −1 −i−1 )
i=1 δi · 2 · tan (2
Pn−1 (14)
> zin − i=0 δi · tan−1 (2−i )
From Equ. 14 the rotation direction δ on each iteration depends on an intermediate pa-
rameter zi of each method. Analysis based on mathematic assumption could not guarantee
the computational accuracy due to nonlinear equation. Therefore, the accuracy evaluation
by using simulation based on statistical analysis in mathematic which is Mean Absolute
Percent Error and standard statistic can be applied in practice.
δ
CORDIC method
-1 0 1
Conventional 0.5020 0 0.4980
Proposed Double-rotation 0.5051 0.1434 0.4734
Proposed Triple-rotation 0.5051 0 0.4949
The Mean Absolute Percent Error (MAPE) mathematical factor is employed for accuracy
evaluation in available range [-1.74329, +1.74329], [-0.9885,+0.9885], and [-0.3747, 0.3747]
of the conventional, double-rotation, and triple-rotation CORDIC methods. The simulation
results show that the proposed CORDIC methods provide the smallest MAPE at the same
number of iterations, which implies better accuracy as illustrated in Table 2. The MAPE
is defined by the formula:
n
1X Ai − Fi
M = , (15)
n i=1 Ā
where Ai is the actual value and Fi is the forecast value. The difference between Ai and
Fi is divided by the average actual value Ā again. The absolute value of this calculation is
summed for every forecast point in time and divided again by the number of fitted points
n.
Table 2: The MAPE comparison of xI and yI of the conventional, double-rotation and
triple-rotation CORDIC methods, where the iteration step I equals to 8, 10, and 16.
Four standard statistical indicators are employed for accuracy evaluation of the proposed
CORDIC computational methods, i.e. maximum absolute error (M ax. |error|), minimum
absolute error (M in. | error|), average absolute error (Ave. |error|), and standard deviation
absolute error (Ave. Dev. |error|). All formulas are expressed as follow:
Maximum absolute error (M ax. |error|)
M ax. |error| = M ax. {|A0 − F0 | , · · · , |AN − FN |} (16)
Minimum absolute error (M in. |error|)
M in. |error| = M in. {|A0 − F0 | , · · · , |AN − FN |} (17)
Average absolute error (Ave. |error|)
N
1 X
Ave. |error| = |Ai − Fi | (18)
N i=0
Standard deviation absolute error (Ave. Dev. |error|)
Xi = |Ai − Fi |
N
1 X
X̄ = |Ai − Fi |
N i=0
v
u N
u1 X
Ave. Dev. |error| = t (Xi − X̄)2 (19)
N i=0
The analysis results with the four statistical indicators are shown in Table 3 and 4. Both
tables show the comparison of the computational error of the conventional, double-rotation
and triple-rotation CORDIC methods based on Matlab simulation in the various number of
iterations(N) at 8, 10, 16, 32, and 64. For measuring environment, since we want to measure
the computational accuracy on the three parameters, i.e. xout , yout , and zout , the CORDIC
methods have to be performed on both rotation mode and vectoring mode. Then, the
circular coordinate system is selected for our test case. Table 3 reports the analysis result
of the CORDIC methods in rotating mode performed by function number 1 in Table 5.
Similarly, the analysis result in vectoring mode of the CORDIC method performed by
function number 3 is illustrated in Table 4. From the tables, the double-rotation and
triple-rotation CORDIC methods still provide higher accuracy than the conventional one
when they are analyzed by the statistical measurements.
Statistical Analysis
Iteration CORDIC
(N ) M ax. |error| M in. |error| Ave. |error| Std. Dev. |error|
Conventional 9.4966E-3 3.2120E-5 4.5125E-3 2.6722E-3
8 Double-rotation 4.8971E-3 1.5988E-5 2.3296E-3 1.3431E-3
Triple-rotation 3.6940E-3 2.2748E-5 1.7411E-3 1.0119E-3
Conventional 2.3952E-3 4.7574E-6 1.1589E-3 6.6409E-4
10 Double-rotation 1.2320E-3 6.9294E-6 5.7707E-4 3.3635E-4
Triple-rotation 9.1174E-4 5.6572E-7 4.3358E-4 2.5278E-4
Conventional 3.7365E-5 1.2253E-7 1.8090E-5 1.0543E-5
xout 16 Double-rotation 1.8925E-5 8.1072E-9 9.1003E-6 5.3043E-6
Triple-rotation 1.4181E-5 2.6530E-8 6.7987E-6 3.9405E-6
Conventional 5.8888E-10 4.6108E-13 2.7506E-10 1.6137E-10
32 Double-rotation 2.9457E-10 2.9358E-12 1.3889E-10 8.0351E-11
Triple-rotation 2.1399E-10 1.0505E-12 1.0593E-10 5.9295E-11
Conventional 9.9920E-16 3.7453E-15 5.8693E-15 3.6346E-16
64 Double-rotation 3.4417E-15 1.5543E-15 2.4920E-15 2.1890E-16
Triple-rotation 1.5543E-15 1.1102E-16 7.1996e-16 2.8387E-16
Conventional 6.748E-3 1.7481E-5 2.9117E-3 1.7723E-3
8 Double-rotation 3.3532E-3 8.9682E-6 1.4952E-3 8.7957E-4
Triple-rotation 2.5376E-3 1.5211E-5 1.1163E-3 6.5960E-4
Conventional 1.6862E-3 3.3777E-6 7.4364E-4 4.3493E-4
10 Double-rotation 8.2529E-4 3.9484E-6 3.7026E-4 2.1735E-4
Triple-rotation 6.1626E-4 4.1980E-7 2.7813E-4 1.6486E-4
Conventional 2.6937E-5 9.8153E-8 1.1614E-5 6.8945E-6
yout 16 Double-rotation 1.3301E-5 1.6685E-8 5.8318E-6 3.4532E-6
Triple-rotation 9.6708E-6 4.4125E-9 4.3576E-6 2.5698E-6
Conventional 4.0018E-10 1.9198E-12 1.7623E-10 1.0478E-10
32 Double-rotation 2.0294E-10 5.1292E-13 8.9197E-11 5.2357E-11
Triple-rotation 1.5010E-10 2.9110E-13 6.8012E-11 3.8826E-11
Conventional 5.7732E-15 4.4580E-15 4.0651E-15 5.3280E-16
64 Double-rotation 2.4425E-15 2.6645E-15 9.8142E-16 5.0489E-16
Triple-rotation 1.7764E-15 1.0012E-15 4.2967E-16 3.1539E-16
The performance and time efficiency of the double-rotation and triple-rotation CORDIC
methods can be investigated and evaluated by two methods based on the conventional
CORDIC and Matlab simulation. First, determination of absolute error constraint method:
thereby, the absolute error is given as a different expected error ∆E of a CORDIC computa-
tional result and a Matlab simulation result. If the actual different error ∆e of a CORDIC
Table 4: The analysis of computational accuracy of CORDIC methods in vectoring mode
on the circular coordinate system.
Statistical Analysis
Iteration CORDIC
(N ) M ax. |error| M in. |error| Ave. |error| Std. Dev. |error|
Conventional 2.1886E-5 5.8874E-8 9.577E-6 9.2267E-6
8 Double-rotation 6.0890E-6 3.2259E-6 3.5329E-6 2.0376E-6
Triple-rotation 2.6975E-6 1.3162E-7 1.3317E-6 9.2651E-7
Conventional 1.296E-6 8.4752E-9 5.9492E-7 5.9646E-7
10 Double-rotation 3.0917E-7 6.8938E-8 1.9876E-7 1.0249E-7
Triple-rotation 2.5691E-7 2.5071E-9 9.0108E-8 1.0678E-7
Conventional 3.3458E-10 4.3225E-12 1.2284E-10 1.4963E-10
xout 16 Double-rotation 6.9108E-11 2.7643E-12 4.7638E-11 1.6298E-11
Triple-rotation 4.5727E-11 1.3323E-14 1.9549E-11 2.3043E-11
Conventional 5.5511E-15 2.2204E-15 3.3307E-16 1.3597E-16
32 Double-rotation 4.1078E-15 3.2196E-16 3.7970E-15 3.6316E-16
Triple-rotation 1.4433E-15 3.1307E-16 9.5479E-16 4.4823E-16
Conventional 5.5511E-15 2.2204E-15 3.3307E-16 1.3597E-16
64 Double-rotation 4.1078E-15 3.2196E-16 3.7970E-15 3.6316E-16
Triple-rotation 1.4433E-15 3.1307E-16 9.5479E-16 4.4823E-16
Conventional 6.6160E-3 3.4315E-4 3.7572E-3 2.5093E-3
8 Double-rotation 3.1040E-3 2.9878E-4 1.8719E-3 1.1285E-3
Triple-rotation 2.3227E-3 8.0323E-4 1.5465E-3 5.8293E-4
Conventional 1.6100E-3 1.1742E-4 9.3258E-4 6.3258E-4
10 Double-rotation 6.7778E-4 1.0275E-4 4.2626E-4 2.6665E-4
Triple-rotation 7.1681E-4 7.0811E-5 3.4929E-4 2.6975E-4
Conventional 2.5868E-5 2.9403E-6 1.2778E-5 1.0149E-5
zout 16 Double-rotation 9.9709E-6 4.0604E-6 7.2306E-6 2.2905E-6
Triple-rotation 9.5632E-6 1.6408E-7 4.7247E-6 4.5793E-6
Conventional 3.2805E-10 1.0026E-10 2.3369E-10 9.6639E-11
32 Double-rotation 1.7855E-10 7.2290E-11 9.8846E-11 4.5243E-11
Triple-rotation 1.5460E-10 2.4420E-11 8.5503E-11 5.5706E-11
Conventional 1.1102E-16 2.7756E-17 6.6613E-17 3.1646E-17
64 Double-rotation 1.1102E-16 2.7756E-17 6.6613E-17 3.1646E-17
Triple-rotation 1.1102E-16 2.7756E-17 4.1633E-17 4.3885E-17
computational result and a Matlab simulation result is equal or less than ∆E, then the
number of iterations will be kept in record.
Figure 2 illustrates the relationship of the given absolute error with the number of itera-
tions of the conventional, double-rotation, and triple-rotation CORDIC methods based on
Matlab/Simulink. As the figure, at the same number of iterations, the triple-rotation and
double-rotation CORDIC methods provide higher accuracy than the conventional one. In
turn, at the same absolute error, the triple-rotation and double-rotation CORDIC methods
require the smaller number of iterations.
Second, observation of the converging parameters x, y, z of the micro-rotation: a Sine-
Cosine function is performed by setting the initial parameters x, y, and z in rotating mode,
where the parameter z is driven to be zero z −→ 0 by the CORDIC algorithm. Based
on our experiment with several values of input z, the triple-rotation CORDIC method
converges faster than the double-rotation and conventional ones for all three parameters.
The converging of the three parameters is captured and shown in Figure 3 as an example,
where z is initialized with φ = −0.1 radian, y is set to 0, and x is defaulted by the constant
scaling factor of each CORDIC method. The proposed CORDIC method provides better
performance with the fewer number of iterations compared to the conventional and the
double-ration method.
25 10
Conventional Conventional
Double−rotation Double−rotation
Triple−rotation 9 Triple−rotation
20
Iteration steps
Iteration steps
8
15
7
10
6
5 5
0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1
|Error constraint| x 10
−3 |Error constraint| x 10
−3
(a) 5 to 25 (b) 5 to 10
12
Conventional 24 Conventional
Double−rotation Double−rotation
Triple−rotation Triple−rotation
11.5 22
Iteration steps
Iteration steps
20
11
18
16
10.5
14
10 12
0 1 2 3 4 0 0.2 0.4 0.6 0.8 1 1.2
|Error constraint| x 10
−4 |Error constraint| x 10
−4
(c) 10 to 13 (d) 13 to 25
Figure 2: The expected number of iterations with the absolute error varied from 1.0E-8 to
1.0E-3.
1 0.7 0.7
Conventional Conventional
0.95 0.6 Double−rotation 0.6 Double−rotation
Conventional Tripple Triple−rotation
0.9 Double−rotation 0.5
0.5
Tripple
0.85 0.4
0.4
0.8 0.3
xi
yi
zi
0.3
0.75 0.2
0.2
0.7 0.1
0.65 0.1 0
0 −0.1
2 4 6 8 10 2 4 6 8 10 2 4 6 8 10
Interation (n) Interation (n) Iterations (n)
(a) The convergence of parameter x (b) The convergence of parameter y (c) The convergence of parameter z
Figure 3: The convergence of CORDIC parameters where x is initialized with the constant
scaling factors of each CORDIC method, y = 0, and z = φ = −0.1 radian.
5. Unified CORDIC Algorithm
6.1. Algorithm
In the pre-processing step, the computational mode (hs) is determined in either normal-
accuracy mode or high-accuracy mode, where the start (start) and end (end) indexes will
be set up. Afterwards, the input parameters xstart , ystart , and zstart will be initialized
corresponding to (start) and f unc. In the CORDIC processing step, the micro-rotation
of either double-rotation or triple-rotation methods will be executed iteratively, where the
execution in either rotation mode or vectoring mode and the coordinate systems depend
on the rotating parameter rmode and the coordinate parameter m. Finally, the post-
processing step will compensate the computed results with inversion of the constant scaling
factor corresponding to f unc in Table 5.
The block diagram of the high precision CORDIC core according to Algorithm 5 is shown
in Figure 4a. Due to the small convergence range of the double-rotation and triple-rotation
methods, the convergence extension module which could be realized by convergence ex-
tension methods is included. Generally, there are two types of the convergence extension
methods introduced to solve the convergence range problem, i.e. mathematic identity
method [19], [22] and expansion of the set of iterative method [23]. The mathematic iden-
tity method applies the mathematic properties such as trigonometric identities in order
to compress inputs xi , yi , and zi and to decompress outputs xout , yout , and zout gener-
ated by the high precision CORDIC core. The expansion of the set of iterative method
expands the linear CORDIC convergence range by choosing the set of iterative indexes to
i = −M, −M + 1, · · · , N , where M and N are two integers applied to determine the con-
vergence. Then, the convergence ranges are |zi | ≤ 2M +1 and xyii ≤ 2M +1 for driving z or
xin,yin,zin, configuration 14
Normal−accuracy
xi ,yi ,zi High−accuracy
Convergence extension
Pre-processing 13
CORDIC processing
Clock cycle
12
Runtime Configurable CORDIC core
Double-rotation or Triple-rotation
11
10
Post-processing
xo,yo, zo 9
0 0.2 0.4 0.6 0.8 1
xout,yout,zout |Error constraint| −3
x 10
(a) (b)
Figure 4: The block diagram of the high precision CORDIC core with the convergence
extension module and its computational latency.
y toward zero. For the sake of simplicity the detail of the convergence extension methods
will be ignored in this paper and its computational latency will be specified as a constant
value.
From the block diagram in Figure 4a, the computational time investigation of the high
precision CORDIC core with the convergence extension module can be examined as follow-
ing: 1) suppose that Text , Tpre , Tdr , Ttr , and Tpost are internal delay of the convergence ex-
tension, the pre-processing, the double-rotation, the triple-rotation and the post-processing,
respectively. 2) Tdr and Ttr depending on the number of CORDIC iterations (Niter−dr ,
Niter−tr ) corresponding to the expected accuracy as shown in Figure 2 are expressed as:
Suppose that Text , Tpre , Tpost , Tmicro−dr , and Tmicro−tr equal to 1 clock cycle, then the
delay of the high precision CORDIC core based on the double-rotation and triple-rotation
CORDIC methods when the expected absolute error 3.0E-4 are at 11 clock cycle and
13 clock cycle. The graph in Figure 4b shows the relationship of the expected absolute
computational error and the delay of the high precision CORDIC Core in normal-accuracy
mode and high-accuracy mode with the absolute errors range from 0 to 1.0E-3.
This section compares the proposed CORDIC methods with the existing ones. The
micro-rotation in pipelined (unfolded) digit-parallel architecture has been brought up for
Xi ωi Zi
Xi Yi Zi
Shifter Shifter
Shifter Shifter 2-2i 1 21 2tan-1(2-i-1)
2-i 2-i sign selection 2tan-1(2-i-1)
redundance Shifter Shifter sign selection
Shifter Shifter 2-2i-2 2-2i-1 redundance
2-2i-2 2-2i-2
αi βi
αi βi CSA CSA
CSA CSA
αi αi
SDA SDA SDA SDA SDA SDA
Xo Yo Zo Xo ωo Zo
(a) Double-rotation [19] (b) 2D-Householder [24]
Figure 5: The existing constant scaling factor CORDIC based on redundant method.
Xi Yi Zi
Shifter Shifter
2-i 2-i sign selection 2tan-1(2-i-1)
non- redundance
Shifter Shifter
2-2i-2 2-2i-2
CSA CSA δi
Xo Yo Zo
(a) Proposed double-rotation
Xi Yi Zi
Zo
SDA SDA
Xo Yo
Figure 6: The proposed constant scaling factor CORDIC based on non-redundant method.
consideration in speed and area performance, where pre-processing and post-processing are
ignored. Basic components normally used to implement the CORDIC consist of 3-to-2
Carry-Save-Adder (CSA), Sign-Digit-Adder (SDA), redundant sign selection (SIGN-SEL),
non-redundant sign selection (SIGN-SEL-NON), and right shifter (SHR). To obtain com-
parison results close to reality as much as possible, the basic components are implemented
and analyzed in various data width at 16-bit, 32-bit, and 64-bit on 90-nm Faraday silicon
technology. The synthesis results are individually normalized based on delay and consumed
area of the targeted technology; afterward they are normalized again in the various data
width as presented in Table 6.
The two existing constant scaling factor CORDIC methods, i.e. the redundant double-
rotation [19] CORDIC and the 2D-Householder [24] CORDIC, have been put forward for
comparison with the proposed CORDIC methods, whose architectures on rotation mode
in the circular coordinate system are illustrated in Figure 5 and 6. Table 7 compares the
speed and area performance of these CORDIC methods. Based on pipeline architecture,
the computational operators corresponding to each CORDIC algorithm are performed as
the delay models. Also the number of utilized basic computational operators, conforming
Table 6: Basic components synthesis results on 90-nm Faraday silicon technology.
Table 7: The time and area performance of the CORDIC methods in pipeline (unfolded)
digit-parallel architecture.
to Figure 5 and 6, is modeled as the area models. In [19], the redundant double-rotation
method with a constant scaling factor is applied to only the CORDIC rotation mode.
The proposed double-rotation method extends to vectoring mode. The redundant rotation
direction (αi , βi ∈ {−1, 0, 1} ) are employed in [19], but we apply the non-redundant rotation
direction in our double-rotation CORDIC. From the delay and area models in Table 7, the
delay and consumed area on 16-bit data width of the double-rotation CORDIC method
of [19] can be evaluated as following: Delay : 0.4348 + 0.0932 + 0.0807 + 1 = 1.6087, Area
: 4 × 0.8533 + 2 × 0.5708 + 3 + 0.0301 = 7.5849.
In [24], the redundant 2D-Householder CORDIC method is applied on both rotation
mode and vectoring mode. By the way, its scaling factor is performed by the on-line
computation which increases the complexity for VLSI implementation. However, for the
sake of simplification, the on-line computation is neglect for this comparison. The delay
and area on 16-bit data width of the 2D-Householder CORDIC method can be estimated
as following: Delay : 0.4348 + 0.0932 + 0.0807 + 1 = 1.6087, Area : 4 × 0.8533 + 2 × 0.5708 +
3 + 0.0301 = 7.5849. By the same method the delay and area on 16-bit of the proposed
double-rotation CORDIC method can be evaluated as follow: Delay : 0.4348 + 0.0063 +
0.0807 + 1 = 1.5218, Area : 4 × 0.8533 + 2 × 0.5708 + 3 + 0.0053 = 7.5601, and Delay :
0.4348 + 0.0063 + 0.0807 + 2 = 2.5218, Area : 10 × 0.8533 + 4 × 0.5708 + 7 + 0.0053 = 17.822
for the proposed triple-rotation CORDIC method. The time and area efficiency of these
CORDIC methods in different data width can be illustrated in Table 8.
From the comparison, the proposed redundant double-rotation CORDIC method pro-
vides better time efficiency than the non-redundant double-rotation and 2D-Householder
CORDIC methods. However, the proposed double-rotation one is extended to vectoring
mode and also unemploy to use the on-line constant scaling factor computation. Area
efficiency of the three CORDIC methods have similar values because they use the same
number of basic components. The proposed triple-rotation CORDIC methods is also eval-
uated to demonstrate its time and area efficiencies. Although the time and area efficiency
of the triple-rotation CORDIC method are lower than the double-rotation CORDIC meth-
Table 8: Normalized speed and area performance comparison of the proposed CORDIC
methods and the existing CORDIC methods in different data width.
ods, but the method provides higher computational accuracy. Its time efficiency can be
improved by increasing a pipeline stage or by enhancing the performance of Adder.
7. Conclusion
Design and analysis of the extension-rotation CORDIC methods, the double-rotation and
the triple-rotation, to improve the performance and accuracy of the CORDIC computation
have been presented and discussed. The paper can be summarized as follow:
1. The methods use non-redundant values to stabilize the constant scaling factor and
to avoid the on-line scaling factor problem. The computational accuracy of the two
CORDIC methods is measured by statistical measurements, i.e. MAPE, M ax. |error|,
M in. |error|, Ave. |error|, Std. Dev. |error|, and compared to the conventional CORDIC
method and the Matlab Simulation results. The analysis results show the double-
rotation and triple-rotation CORDIC methods provide higher accuracy than the con-
ventional one with the same number of iterations. On the other hand, with the same
computational accuracy, the double-rotation and triple-rotation can be achieved with
smaller number of iteration.
2. The unified CORDIC algorithms of the double-rotation and triple-rotation CORDIC
methods applied to perform elementary function in rotation mode and vectoring mode
on the circular, hyperbolic, linear coordinate systems are come out. Afterward, they
are utilized for algorithm of the high precision CORDIC core.
3. The high accuracy CORDIC core is introduced and investigated in order to show
the performance and time efficiency in normal-accuracy and high-accuracy modes
in various expected absolute error. Based on the pipeline (unfolded) digit-parallel
architecture, the speed and area performance of the proposed CORDIC methods are
compared with the existing CORDIC methods, where the proposed double-rotation
CORDIC method provide better time efficiency with similar area efficiency to the
existing constant scaling factor CORDIC methods.
References
[1] J. Volder, “The CORDIC trigonometric computing technique,” IRE Trans. Electron. Compute., vol.
EC-8, pp. 330–334, 1959.
[2] ——, “The birth of CORDIC,” The Journal of VLSI Signal Processing, vol. 25(2), pp. 101–105, 2000.
[3] J. Walther, “A unified algorithm for elementary functions,” Spring Joint Computer Conf., pp. 379–385,
1971.
[4] P. Meher, J. Valls, J. Tso-Bing, K. Sridharan, and K. Maharatna, “50 Years of CORDIC: Algorithms,
Architectures, and Applications,” IEEE Transactions, Circuits and Systems, vol. 56, pp. 1893–1907,
2009.
[5] D. Cochrab, “Algorithms and accuracy in the HP-35,” HewlettPackard Journal, pp. 1–11, June 1972.
[6] S. [Link] and J. M. Delosme, “Parallel singular value decomposition of complex matrices using multi-
dimensional CORDIC algorithms,” IEEE Trans. Signal Processing, vol. 44, no. 3, pp. 685–697, 1996.
[7] ——, “Householder CORDIC algorithm,” IEEE Trans. Computers, vol. 44, no. 8, pp. 990–1001, 1995.
[8] B. Lakshmi and A. S. Dhar, “CORDIC Architectures: A Survey,” Hindawi Publishing Corporation
VLSI Design, vol. 2010, pp. 1–19, 2010.
[9] E. Antelo, J. Bruguera, and E. Zapata, “Unified mixed radix 2-4 redundant CORDIC processor,” IEEE
Transactions on Computers, vol. 45, pp. 1068–1073, 1996.
[10] E. Antelo, J. Villalaba, D. Bruguera, and E. Zapata, “High performance rotaion architecture based on
the radix CORDIC algorithm,” IEEE Trans. Comput., vol. 46, no. 46, pp. 855–870, August 1997.
[11] E. Antelo, T. Lang, and J. Bruguera, “Very-high radix cicular CORDIC: Vecotring and unified rota-
tion/vectoring,” IEEE Trans. Comput., vol. 49, no. 7, pp. 727–739, July 2000.
[12] C.-C. Li and S.-G. Chen, “A radix-4 redundant CORDIC algorithm with fast on-line variable scale
factor compensation,” in IEEE International Conference on Acoustics, Speech, and Signal Processing,
1997, pp. 639–642.
[13] J. Villalba, E. Zapata, E. Antelo, and J. Bruguera, “Radix-4 Vectoring CORDIC Algorithm and
Architectures,” The Journal of VLSI Signal Processing, vol. 19, no. 2, pp. 127–147, 1998.
[14] T. Aoki, I. Kitaori, and T. Higuchi, “Radix-2-4-8 CORDIC for Fast Vector Rotation,” IEICE Trans-
action Fundamentals, vol. E83-A, no. 6, pp. 1106–1114, 2000.
[15] J. Tso-Bing, H. Shen-Fu, and T. Ming-Yu, “Para-CORDIC: parallel CORDIC rotation algorithm,”
IEEE Transactions on Circuits and Systems I, Regular Papers, vol. 51, pp. 1515–1524, 2004.
[16] H. Shen-Fu, H. Yu-Hen, and J. Tso-Bing, “A memory-efficient and high-speed sine/cosine generator
based on parallel CORDIC rotations,” IEEE Signal Processing Letters, vol. 11, pp. 152–155, 2004.
[17] W. Han, Z. Yousi, and L. Xiaokang, “A Parallel Double-Step CORDIC Algorithm for Digital Down
Converter,” in Communication Networks and Services Research Conference, 2009., 2009.
[18] M. D. Ercegovac and T. Lang, “Redundant and on-line CORDIC: Application to matrix triangulariza-
tion and SVD,” IEEE Trans. Comput., vol. 39, no. 6, pp. 725–740, June 1990.
[19] N. Takagi, T. Asada, and S. Yajima, “Redundant CORDIC methods with a constant scale factor for
sine and cosine computation,” IEEE Transactions on Computers, vol. 40, pp. 989–995, September 1991.
[20] M. D. Ercegovac and T. Lang, Digital Arithmetic. Morgan Kaufmann, 2003.
[21] J. Muller, Elementary Functions: Algorithms and Implementation. Cambridge, MA: Birkhauser, 1997.
[22] P. Surapong and M. Glesner, ““Pipelined Floating-Point Architecture for a Phase and Magnitude De-
tector based on CORDIC”,” in International Conference on Field Programmable Logic and Application,
2011, pp. 382–384.
[23] X. Hu, R. Harber, and S. Bass, “Expanding the range of convergence of the CORDIC algorithm,”
IEEE Transactions on Computers, vol. 40, pp. 13–21, 1991.
[24] S.-F. Hsiao and J.-Y. Chen, “Design, Implementation and Analysis of a New Redundant CORDIC
Processor with Constant Scaling Factor and Regular Structure,” J. VLSI Signal Process. Syst., vol. 20,
pp. 267–278, December 1998.
Authors
Pongyupinpanich Surapong was born in Prachinburi, Thailand. He
received his Bachelor and Master of Engineering degree in Electrical
Engineering from King Mongkut’s Institute of Technology Ladkrabang
(KMITL), Thailand in 1998 and 2002. Currently, he is working toward
the PhD degree in Microelectronic Systems Research Group, Technische
Universität Darmstadt, Darmstadt, Germany. His research interests in-
clude computer-aided VLSI design, design optimization algorithm, cir-
cuit simulation, digital signal processing, system-on-chip, all in the con-
text of field-programmable gate-array devices and VLSI technology.
Faizal Arya Samman was born in Makassar, Indonesia. In 1999,
he received his Bachelor of Engineering degree from Universitas Gad-
jah Mada, in Yogyakarta, Indonesia. In 2002, he received his Master
of Engineering degree from Institut Teknologi Bandung, in Indonesia
with Scholarship Award from Indonesian Ministry of National Educa-
tion. In 2002, he was appointed to be a research and teaching staff
at Universitas Hasanuddin, in Makassar, Indonesia. He received his
PhD degree in 2010 at Technische Universität Darmstadt, in Germany
with scholarship award from Deutscher Akademischer Austausch-Dienst
(DAAD, German Academic Exchange Service). He is now working as a postdoctoral fellow
in LOEWE-Zentrum AdRIA (Adaptronik-Research, Innovation, Application) within the re-
search cooperation framework between Technische Universität Darmstadt and Fraunhofer
Institut LBF in Darmstadt. His research interests include network-on-chip (NoC) microar-
chitecture, NoC-based multiprocessor system-on-chip, design and implementation of analog
and digital electronic circuits for control system applications on FPGA/ASIC as well as
energy harvesting systems and wireless sensor networks.
Manfred Glesner received the diploma degree and the Ph.D. degree
from Universität des Saarlandes, Saarbrücken, Germany, in 1969 and
1975, respectively. His doctoral research was based on the application of
nonlinear optimization techniques in computer-aided design of electronic
circuits. He received three Doctor Honoris Causa degrees from Tallinn
Technical University, Tallinn, Estonia, in 1996, Poly-technical University
of Bucharest, Bucharest, Romania, in 1997, and Mongolian Technical
University, Ulan Bator, Mongolia, in 2006. Between 1969 and 1971,
he has researched work in radar signal development for the Fraunhofer
Institute in Werthoven/Bonn, Germany. From 1975 to 1981, he was a Lecturer in the areas
of electronics and CAD with Saarland University. In 1981, he was appointed as an Associate
Professor in electrical engineering with the Darmstadt University of Technology, Darmstadt,
Germany, where, in 1989, he was appointed as a Full Professor for microelectronic system
design. His current research interests include advanced design and CAD for micro- and
nanoelectronic circuits, reconfigurable computing systems and architectures, organic circuit
design, RFID design, mixed-signal circuit design, and process variations robust circuit
design. With the EU-based TEMPUS initiative, he built up several microelectronic design
centers in Eastern Europe. Between 1990 and 2006, he acted as a speaker of two DFG-
funded graduate schools. Dr. Glesner is a member of several technical societies and he is
active in organizing international conferences. Since 2003, he has been the vice-president of
the German Information Technology Society (ITS) in VDE and also a member of the DFG
decision board for electronic semiconductors, components, and integrated systems. He was
a recipient of the honor/decoration of “Palmes Academiques” in the order of Chevalier by
the French Minister of National Education (Paris) for distinguished work in the field of
education in 2007/2008.