ICUWB07
ICUWB07
net/publication/4290498
CITATIONS READS
6 160
6 authors, including:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Torben Brack on 20 May 2014.
C. Suboptimal Decoding
The simplest suboptimal check node algorithm is the well-
known Min-Sum algorithm [14], where the incident message
with the smallest magnitude determines the output of all other
Fig. 2. Tanner graph for an irregular LDPC code messages:
of the VNs f[dmax ,...,3,2] gives the fraction of VNs with a c −1
d
v
certain degree, with dmaxv the maximum variable node degree. λk = sign(λl ) · min (|λl |). (3)
l=0,l=k
The degree distribution of the CNs can be expressed as l=0,l=k
max
g[dmax
c ,dmax
c −1] with dc the maximum CN degree, meaning The resulting performance comes close to the optimal Sum-
that only CNs with two different degrees occur [13]. Product algorithm only for high rate LDPC codes (R ≥ 3/4)
To obtain a good communications performance of an LDPC with relatively large CN degree dc . It can be further optimized
code the degree distribution should be optimized with respect by multiplying each outgoing message with a message scaling
to the codeword size N . The degree distribution can be factor (MSF) of 0.75. For lower code rates the communications
optimized by density evolution as shown in [13]. Furthermore, performance strongly degrades.
the resulting Tanner graph should have cycles as long as
possible to ensure that the iterative decoding algorithm works D. Layered Decoding
properly. A cycle in the Tanner graph is defined as the shortest Layered decoding applies a different message schedule than
path from a VN back to its origin without traveling an edge the classical two-phase decoding. It was originally proposed
twice. Especially cycles of length four have to be avoided [8]. by Mansour [15] and denoted as turbo decoding message
IV. UWB LDPC C ODES passing (TDMP), then it was referred to as layered decoding
A. UWB LDPC Code Design by Hocevar [16]. The basic idea is to process a subset of CNs
and to pass the newly calculated messages immediately to the
Fulfilling the latency constraints of the UWB system (less
corresponding VNs. The VNs update their outgoing messages
than 2 µs) in conjunction with a small resulting decoder silicon
in the same iteration. The next CN subset will thus receive
area was one of the major prerequisites. To guarantee these
newly updated messages which improves the convergence
we had to design an ultra sparse LDPC code [2][6]. The
speed and therefore increases communications performance
chosen degree distribution is f[3,2] = { 3/4, 1/4} for the variable
for a given number of iterations [17]. In Section VI we present
nodes and g[11] = {1} for the check nodes. It is important to
a partly-parallel architecture for layered decoding, processing
note that an LDPC code with a denser degree distribution,
each of these subsets in parallel.
i.e. f[6,3,2] = { 1/4, 1/2, 1/4} and g[14] = {1} can gain up to
0.5 dB in terms of communications performance compared to V. I TERATION C ONTROL
our chosen one. However, the resulting decoder would result
Iterative decoding algorithms show an inherent dynamic
in a twice as large silicon area compared to our approach.
behavior, i.e. the number of iterations strongly depends on
Furthermore, the layered constraint could not be preserved
the signal-to-noise ratio which changes over time. Instead of
which would involve a high throughput penalty.
using a fixed number of iterations, the number of iterations can
B. Decoding Algorithm be controlled by an intelligent iteration control mechanism. In
LDPC codes can be decoded using the message passing al- [18] it was shown that iteration control is the most efficient
gorithm [7]. It exchanges soft-information iteratively between technique for energy saving in a turbo decoder system without
variable and check nodes. Updating the nodes can be done sacrificing communications performance. The arguments are
with a canonical, two-phased scheduling: In the first phase also valid for LDPC decoding.
all variable nodes are updated, in the second phase all check An efficient iteration control has to distinguish between
nodes respectively. The processing of individual nodes within decodable and undecodable blocks. LDPC codes already pro-
one phase is independent and can thus be parallelized. The vide an implicit stopping criterion for decodable blocks by
exchanged messages are assumed to be log-likelihood ratios simply checking for an all-zero syndrome if a codeword has
(LLR). Each variable node of degree dv calculates an update been successfully decoded, see Equation 1. Normally only this
of message k according to: stopping criterion is used in state-of-the-art implementations.
v −1
d More advanced stopping criteria tackle especially the unde-
λk = λch + λl , (2) codable codewords. As soon as a codeword is most certainly
l=0,l=k too corrupted to be successfully decoded, the decoder stops
its decoding process. This approach puts the LDPC decoder The input channel values as well as the extrinsic information
in idle mode (i.e. clock gating is activated on gate level) in in the message RAM are represented by 6 bit values, an
low SNR regions instead of wasting the maximum number of a posteriori message in the channel RAM is represented
iterations all the time and thus wasting energy. We apply the by 9 bit. This quantization parameters could be changed to
stopping criterion presented in [19] to evaluate the achievable further reduce the resulting VLSI area at the possible cost of
energy savings for the WIMEDIA UWB environment. This communications performance.
stopping criterion is based on a monitoring of the variable
nodes reliability (V N R). VII. R ESULTS
dm In this section simulation results of iteration control in the
v −1
N −1
Λ m
= λm + λm V NR = |Λ |
m WIMEDIA UWB simulation chain are given. Furthermore
ch l , (4)
l=0 m=0
implementation results on the rapid prototyping platform and
energy measurement results are presented.
The V N R is the sum of the absolutes of all intermediate
VN results. The calculation of this value has a very low A. Simulation Results
implementation complexity. The decoding process is stopped if We selected the 16-QAM case with an air data rate of 960
the V N R does not change or decreases within two successive Mbit/s and the 128 point FFT for our simulations. The input
decoding iterations. The stopping criterion has to be switched quantization is 6 Bit for the LDPC and 4 Bit for the Viterbi
off when the V N R passes a threshold (V N Rof f ) which is decoder, which leads to a communications performance loss
SNR dependent. This threshold has to be determined only less than 0.2 dB in comparison to a floating point implemen-
once for each code rate [20]. With an appropriate V N Rof f tation. This is a good compromise between implementation
the loss in communications performance can be decreased to complexity and communications performance degradation.
zero. For more detailed information on this criterion the reader Figure 3 shows the average number of iterations for various
is referred to [20]. SNRs with the inherent LDPC stopping criterion only and
VI. LDPC D ECODER A RCHITECTURES the combined criteria for decodable and undecodable blocks
This chapter gives a summary of the decoder architecture, presented in the previous section. We can make the following
more detail can be seen in [2]. A partly parallel architecture observations:
template is sufficient to ensure the throughput for the UWB • In the error floor SNR region (above 23 dB) the inherent
LDPC decoder. A parallelization of P is established, which stopping criterion reduces the number of iterations for
means that P edges are processed per clock cycle. To ensure decodable blocks to 3.
code rate and codeword size flexibility the functional nodes • In the waterfall SNR region (15 to 23 dB) the combined
are realized in a serial manner. Thus each functional unit can criteria yields a maximum average number of iterations of
accept one message per clock cycle as already mentioned. only 5.5. That means that the iteration control mechanism
The mapping of the VNs and CNs to the functional units is can save up to 32% of iterations which yields a substantial
determined by the given code structure. Efficient mapping of energy saving.
the nodes to the functional units can be always guaranteed • The detection mechanism for undecodable blocks gives
by designing LDPC codes using permuted identity matrices. the highest gain in the non-convergence SNR region
Always I VNs/CNs with the connection described by an I-by- (below 15 dB). It saves up to 4.5 iterations, i.e. 45%,
I identity matrix are allocated to I distinct VFU and I distinct assuming a maximum number of 8 iterations.
CFU respectively. This was shown by many LDPC decoder The threshold V N Rof f for the iteration control for unde-
realizations ([21][22][23][24]) as well as in the case of DVB- codable blocks is selected in such a way that the commu-
S2 LDPC decoding [25]. The parallelization P is determined nications performance loss is lower than 0.01 dB, which is
by the identity matrix size I [2]. depicted in Figure 4. This figure shows that the coding gain
A permutation network has to realize the permutation of of the LDPC code compared to the convolution code is more
the identity matrix what can be done by a barrel shifter. To than 4 dB at a PER of 10−3 for this configuration setup.
reach reasonable communications performance and minimize
the area consumption, the check nodes are implemented with B. Implementation Results
an optimized version of the Min-Sum algorithm (Section IV- As already addressed in the introduction, we used a rapid
C). prototyping environment for energy estimation. This rapid
It applies layered decoding as described in Section IV-D at prototype environment is based on a Virtex-4 FPGA. To allow
which the VN processing can be done within the check node a fair comparison with our LDPC decoder, we used the Xilinx
block (CNB). This is achieved by storing the input information Viterbi IP core [26]. The Xilinx Viterbi decoder is a highly
of the CFU in a FIFO and adding the new extrinsic information optimized IP block. Both decoders were configured such that
to the corresponding input message in the FIFO. Thus the they show the same throughput.
correct a posteriori information is passed back to the channel The Viterbi and LDPC decoder were both implemented
memory which holds always the updated VN information. This using the Xilinx ISE 9.1 tool chain. High effort and area
basic concept was first presented in [15]. optimization was selected in conjunction with a 100 MHz
8
Number of Iterations with enhanced stopping criterion
LDPC Code f[3,2] = { 3/4, 1/4}
Number of Iterations with inherend stopping criterion only g[11] = {1}
7
Codeword Size 1200 bit
Code Rate 3/4
6
Parallelism 30
Quantization 6 bit
Number of average Iterations
5
Algorithm MinSum+MSF,
Layered Decoding
4
Iterations 8
Comm. Perform. see Figure 4, [2]
3
Xilinx FPGA Virtex-4 XC4VLX60-10 @100 MHz
LUTs 9,497
2
FlipFlops 4,310
Slices 5,996
1
non−convergence
BRAMs 13
waterfall error−floor
Region Region region Throughput 100 Mbit/s
0
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
Eb/N0 / dB
TABLE II
Fig. 3. Average number of iteration of the LDPC decoder vs. SNR I MPLEMENTATION RESULTS FOR UWB-LDPC DECODER
0
10
−3
TABLE III
10
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Eb/N0 / dB I MPLEMENTATION RESULTS FOR X ILINX CC V ITERBI DECODER
Fig. 4. Communications performance of LDPC code with and without
advanced stopping criterion vs. CC code decoder consumes the same energy as the CC running with
7.5 iterations. In Figure 6 the mean energy consumption for
clock frequency constraint. Table II shows the results for the decoding per block vs. the singal to noise ratio is figured out.
LDPC decoder and the design space parameters. Table III In case of using only the inherent stopping criterion the LDPC
shows the corresponding parameters for the CC decoder. Input code only in the non-convergence region needs more energy
and output RAMs which have the same size for both decoders for decoding in mean. Combining the energy/iteration with
are included in these numbers. the iteration numbers of Figure 3, the LDPC decoder with
Both decoders achieved an equal net throughput of iteration control consumes in all cases less energy in mean than
100 Mbit/s, or 1 bit/clock cycle. Since the LDPC decoder is the Viterbi decoder for decoding one codeword. Moreover the
highly scalable, its throughput can be easily increased [2]. convolutional decoder requires a channel interleaver to reach
a reasonable communications performance. This interleaver
C. Energy Measurements Results
consumes additional energy which has to be added to the
We measured the core power of the decoders including the Viterbi energy consumption of the CC itself.
input and output RAMs for each decoder respectively. Before
starting the measurements the input RAM was loaded with
representative data extracted from the UWB simulation chain. Summarizing the energy consumption of the LDPC decoder
After loading we started the decoding process and measured is smaller than the energy consumption of the Viterbi decoder
the power and the decoding time to calculate the energy for the same throughput assuming realistic data sets. Since we
consumption, i.e. reading the input RAMs and writing the measured the energy on an FPGA rapid prototyping board,
results into the output RAM are included in the corresponding the absolute values can not be directly transferred to an
energy budget. ASIC implementation. However it is feasible to conclude that
Figure 5 shows the measured energy consumption of the two the relative energy behavior of both decoders in an ASIC
decoders. Since the LDPC decoding is an iterative process, implementation is the same, i.e. the LDPC decoder consumes
we measured the energy for varying numbers of decoding less energy than a Viterbi decoder in an ASIC implementation,
iterations, ranging from 2 to 8. It can be seen that the LDPC too.
−6
x 10
3 [2] T. Brack, F. Kienle, T. Lehnigk-Emden, M. Alles, N. Wehn, and
F. Berens, “Enhanced Channel Coding for OFDM-based UWB Sys-
2.5
tems,” in Proc. International Conference on Ultra-Wideband (ICUWB
2006), Waltham, Massachusetts, Sept. 2006, pp. 255–260.
[3] S.-M. Kim, J. Tang, and K. K. Parhi, “Quasi-Cyclic Low-Density Parity-
Energy per Block[J]
2
Check Coded Multiband-OFDM UWB Systems,” in IEEE International
Symposium on Circuits and Systems, May 2005, pp. 65–68.
1.5 [4] K.-B. Png, X. Peng, and F. Chin, “Performance Studies of a Multi-
band OFDM System Using a Simplified LDPC Code,” in IEEE Ultra-
1 Wideband Systematic and Technologies 2004, May 2004.
Energy consumption [5] J. Tang, T. Bhatt, and V. Stolpman, “Modified Min-Sum Algorithm for
UWB−LDPC Decoder LDPC Decoders in UWB Communications,” in Proc. IEEE International
0.5 Energy consumption
Viterbi Decoder Conference on Ultra-Wideband 2006, Boston, Massachusetts, USA,
Sept. 2006.
0
2 3 4 5 6 7 8 [6] M. Alles, T. Brack, T. Lehnigk-Emden, F. Kienle, N. Wehn, F. Berens,
Number of Iterations and A. Ruegg, “A Survey on LDPC Codes and Decoders for OFDM-
based UWB Systems,” in Proc. 56th Vehicular Technology Conference
Fig. 5. Energy for UWB-LDPC and Viterbi Decoder, Energy per Block
(VTC Spring ’07), Dublin, Ireland, Apr. 2007, to appear.
−6
x 10
[7] R. G. Gallager, Low-Density Parity-Check Codes. Cambridge, Mas-
sachusetts: M.I.T. Press, 1963.
2.6
[8] T. Richardson and R. Urbanke, “The Renaissance of Gallager’s Low-
Density Pariy-Check Codes,” IEEE Communications Magazine, vol. 41,
2.4
LDPC w advanced stopping criterion pp. 126–131, Aug. 2003.
LDPC w inherend stopping criterion only
2.2 CC code
[9] European Telecommunications Standards Institude (ETSI), “Digital
Video Broadcasting (DVB) Second generation framing structure for
broadband satellite applications; EN 302 307 V1.1.1,” www.dvb.org.
Mean Energy per Block / J
2
[10] IEEE 802.16e, “Air Interface for Fixed and Mobile Broadband Wireless
1.8 Access Systems,” IEEE P802.16e/D12 Draft, oct 2005.
[11] IEEE 802.11n, “Wireless LAN Medium Access Control and Physical
1.6 Layer specifications: Enhancements for Higher Throughput,” IEEE
P802.16n/D1.0, Mar 2006.
1.4 [12] J. K. Omura, “On the Viterbi decoding algorithm,” IEEE Transactions
on Information Theory, vol. 15, pp. 177–179, Jan. 1969.
1.2 [13] T. Richardson, M. Shokrollahi, and R. Urbanke, “Design of Capacity-
Approaching irregular Low-Density Parity-Check Codes,” IEEE Trans-
1
non−convergence waterfall error−floor
action on Information Theory, vol. 47, no. 2, pp. 619–637, Feb. 2001.
Region Region region [14] F. Guilloud, E. Boutillon, and J. Danger, “λ-Min Decoding Algorithm
0.8
10 11 12 13 14 15 16 17 18
Eb/N0 / dB
19 20 21 22 23 24 25 of Regular and Irregular LDPC Codes,” in Proc. 3nd International
Symposium on Turbo Codes & Related Topics, Brest, France, Sept. 2003,
Fig. 6. Energy consumption vs. SNR for UWB-LDPC and Viterbi Decoder pp. 451–454.
[15] M. Mansour and N. Shanbhag, “High-Throughput LDPC Decoders,”
VIII. C ONCLUSIONS IEEE Transactions on Very Large Scale Integration Systems, vol. 11,
no. 6, pp. 976–996, Dec. 2003.
We presented an implementation complexity comparison [16] D. Hocevar, “A reduced complexity decoder architecture via layered de-
with special emphasis on energy consumption between an coding of LDPC codes,” in Proc. IEEE Workshop on Signal Processing
Systems (SiPS ’04), Austin,USA, Oct. 2004, pp. 107–112.
LDPC and a Viterbi decoder for the WIMEDIA UWB system. [17] J. Zhang and M. Fossorier, “Shuffled Iterative Decoding,” IEEE Trans-
A sophisticated iteration control for the LDPC decoding pro- actions on Communications, vol. 53, pp. 209–213, Feb. 2005.
cess can drastically reduce the number of decoding iterations [18] M. J. Thul, T. Vogt, F. Gilbert, and N. Wehn, “Evaluation of Algo-
rithm Optimizations for Low-Power Turbo-Decoder Implementations,”
and consequently the energy. The LDPC decoder with iteration in Proc. 2002 IEEE International Conference on Acoustics, Speech, and
control requires in average less energy per codeword than the Signal Processing (ICASSP ’02), Orlando, Florida, USA, May 2002, pp.
Viterbi decoder and gains 4 dB in sense of communications 3101–3104.
performance at a PER of 10−3 . [19] N. Wehn, F. Kienle, and T. Brack, “Method and device for controlling
the decoding of a LDPC encoded codeword, in particular for DVB-S2
It is a feasible assumption that the energy ratio measured LDPC encoded codewords,” European Patent Application No.05 009
on a prototyping platform can be transferred to an ASIC 477.0, Apr. 2005.
[20] F. Kienle and N. Wehn, “Low Complexity Stopping Criterion for LDPC
implementation. Thus the ultra sparse LDPC codes [2][6] are Code Decoders,” in Proc. 2005-Spring Vehicular Technology Conference
good candidates for channel codes of the future WIMEDIA (VTC ’05 Spring), Stockholm, Schweden, May 2005, pp. 606–609.
UWB standard, not only from an communications point of [21] E. Boutillon, J. Castura, and F. Kschischang, “Decoder-first code de-
sign,” in Proc. 2nd International Symposium on Turbo Codes & Related
view but also from an energy point of view. Topics, Brest, France, Sept. 2000, pp. 459–462.
[22] D. Hocevar, “LDPC Code Construction with Flexible Hardware Imple-
IX. ACKNOWLEDGEMENT mentation,” in Proc. 2003 International Conference on Communications
This work has been partly carried out in the framework of (ICC ’03), May 2003, pp. 2708–2712.
[23] M. Mansour and N. Shanbhag, “Architecture-Aware Low-Density
the IST project PULSERS Phase II (FP6-027142), which is Parity-Check Codes,” in Proc. 2003 IEEE International Symposium on
partly funded by the European Union. Circuits and Systems (ISCAS ’03), Bangkok, Thailand, May 2003.
[24] F. Kienle and N. Wehn, “Design Methodology for IRA Codes,” in Proc.
R EFERENCES 2004 Asia South Pacific Design Automation Conference (ASP-DAC ’04),
Yokohama, Japan, January 2004, pp. 459–462.
[1] Standard ECMA-368, “High Rate Ultra Wideband PHY and MAC [25] F. Kienle, T. Brack, and N. Wehn, “A Synthesizable IP Core for DVB-
Standard.” [Online]. Available: http://www.ecma-international.org S2 LDPC Code Decoding,” in Proc. 2005 Design, Automation and Test
in Europe (DATE ’05), Munich, Germany, Mar. 2005, pp. 1530–1535.
[26] Xilinx Inc., “Viterbi Decoder V6.0, Data Sheet.”