2003, IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo decoding algorithm. The architecture benefits from various optimizations performed at three levels of abstraction in system design, namely LDPC code design, decoding algorithm, and decoder architecture. First, the interconnect complexity problem of current decoder implementations is mitigated by designing architecture-aware LDPC codes having embedded structural regularity features that result in a regular and scalable message-transport network with reduced control overhead. Second, the memory overhead problem in current-day decoders is reduced by more than 75% by employing a new turbo decoding algorithm for LDPC codes that removes the multiple check-to-bit message update bottleneck of the current algorithm. A new merged-schedule message-passing algorithm is also proposed that reduces the memory overhead of the current algorithm for low- to moderate-throughput decoders. Moreover, a parallel soft-input soft-output (SISO) message update mechanism is proposed that implements the recursions of the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm in terms of simple "max-quartet" operations that do not require lookup tables and incur negligible loss in performance compared to the ideal case. Finally, an efficient programmable architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed, and a full-decoder architecture is presented. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gb/s for a frame length of 2304 bits, and achieves savings of 89.13% and 69.83% in power consumption and silicon area over the state of the art, with a reduction of 60.5% in interconnect length.
Index Terms: Low-density parity-check (LDPC) codes, Ramanujan graphs, soft-input soft-output (SISO) decoder, turbo decoding algorithm, VLSI decoder architectures.
I. INTRODUCTION
The phenomenal success of turbo codes [1], powered by the concept of iterative decoding via message passing, has rekindled interest in low-density parity-check (LDPC) codes, which were first discovered by Gallager in 1961 [2]. Recent breakthroughs to within 0.0045 dB of the AWGN-channel capacity were achieved with the introduction of irregular LDPC codes in [3], [4], putting LDPC codes on par with turbo codes. However, efficient hardware implementation techniques for turbo decoders have given turbo codes a clear advantage …
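The abstract above describes SISO message updates that carry out BCJR recursions without lookup tables. As an illustration only, the sketch below computes the extrinsic LLRs of a single parity-check (SPC) constraint with a forward-backward recursion over its bits, using the exact LLR "box-plus" operation built from the Jacobian logarithm max*, plus a lookup-free max-log variant that simply drops the correction term. This is not the paper's max-quartet operation, whose definition is not reproduced here; the function names are hypothetical, and LLRs are assumed to be log(P(0)/P(1)).

```python
import math

def max_star(x, y):
    """Jacobian logarithm: log(e^x + e^y) = max(x, y) + log(1 + e^-|x - y|)."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def boxplus(a, b, exact=True):
    """LLR of the XOR of two bits with LLRs a and b (LLR = log P(0)/P(1))."""
    if exact:
        return max_star(0.0, a + b) - max_star(a, b)
    # Max-log approximation: drop the log(1 + e^-|.|) corrections (no lookup table)
    return max(0.0, a + b) - max(a, b)

def spc_siso(llrs, exact=True):
    """Extrinsic LLRs of a single parity-check code via a forward-backward recursion."""
    n = len(llrs)
    if n < 2:
        raise ValueError("an SPC constraint needs at least two bits")
    fwd = [0.0] * n  # fwd[i]: combined LLR of bits 0 .. i-1 (used for i >= 1)
    bwd = [0.0] * n  # bwd[i]: combined LLR of bits i+1 .. n-1 (used for i <= n-2)
    fwd[1] = llrs[0]
    for i in range(2, n):
        fwd[i] = boxplus(fwd[i - 1], llrs[i - 1], exact)
    bwd[n - 2] = llrs[n - 1]
    for i in range(n - 3, -1, -1):
        bwd[i] = boxplus(bwd[i + 1], llrs[i + 1], exact)
    # Each bit's extrinsic LLR combines everything to its left and to its right
    out = [bwd[0]]
    for i in range(1, n - 1):
        out.append(boxplus(fwd[i], bwd[i], exact))
    out.append(fwd[n - 1])
    return out

llrs = [1.2, -0.7, 2.5, 0.3]
print(spc_siso(llrs))               # exact (log-MAP) extrinsic LLRs
print(spc_siso(llrs, exact=False))  # max-log: sign product times smallest other magnitude
```

The max-log branch needs only comparisons and additions; how closely it tracks the exact recursion depends on the LLR magnitudes, which is why practical decoders often add a cheap correction.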
IEEE Workshop on Signal Processing Systems
In this paper, we propose a turbo decoding message-passing (TDMP) algorithm to decode regular and irregular low-density parity-check (LDPC) codes. The TDMP algorithm has two main advantages over the commonly employed two-phase message-passing algorithm. First, it exhibits faster convergence (up to 50% fewer iterations) and an improvement in coding gain (up to an order of magnitude for moderate-to-high SNR and a small number of iterations). Second, the corresponding decoder architecture has a significantly reduced memory requirement, amounting to a savings of (75 + 25n/Σ node degrees)% > 75% for code length n. A decoder architecture featuring the TDMP algorithm is also presented. Furthermore, we propose a new structure on the parity-check matrix of an LDPC code based on permutation matrices, aimed at reducing interconnect complexity and improving decoding throughput. In addition, we construct a wide range of LDPC codes based on Ramanujan graphs that possess this structure.
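The core idea of a turbo-style (layered) schedule, processing one group of checks at a time and updating the bit posteriors immediately so that the next group already sees them, can be sketched as follows. This is a minimal sketch under stated assumptions: each row of H is treated as one layer, the check update uses min-sum rather than the BCJR-based SISO of the paper, and the function names and the Hamming(7,4) test matrix are illustrative choices. Only the posteriors L and the check-to-bit messages R are stored; bit-to-check messages are recomputed on the fly, in the spirit of the memory savings the abstract describes.

```python
import numpy as np

def minsum_extrinsic(q):
    """Min-sum check update: each edge gets the product of the other signs times the smallest other magnitude."""
    signs = np.where(q < 0, -1.0, 1.0)
    mags = np.abs(q)
    order = np.argsort(mags)
    # The edge holding the overall minimum receives the second minimum instead
    ext_mag = np.where(np.arange(len(q)) == order[0], mags[order[1]], mags[order[0]])
    return np.prod(signs) * signs * ext_mag

def layered_decode(H, channel_llr, max_iters=10):
    """Layered (turbo-schedule) min-sum decoding, one check row per layer.

    Returns (hard decisions, iterations used).
    """
    H = np.asarray(H)
    L = np.array(channel_llr, dtype=float)    # running posterior LLRs
    rows = [np.flatnonzero(h) for h in H]      # bit positions of each check
    R = [np.zeros(len(idx)) for idx in rows]   # stored check-to-bit messages
    for it in range(1, max_iters + 1):
        for r, idx in enumerate(rows):
            q = L[idx] - R[r]            # bit-to-check messages, formed on the fly (not stored)
            R[r] = minsum_extrinsic(q)   # fresh check-to-bit messages
            L[idx] = q + R[r]            # posteriors updated before the next layer runs
        hard = (L < 0).astype(int)
        if not np.any(H @ hard % 2):     # all parity checks satisfied: stop early
            return hard, it
    return hard, max_iters

H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [2.0, 1.5, -0.5, 2.2, 1.8, 1.0, 1.4]   # all-zero codeword; bit 2 received with the wrong sign
print(layered_decode(H, llr))
```

For this input the wrong bit is corrected within the first iteration, because the second and third layers already use posteriors refreshed by the layers before them.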
Global Telecommunications Conference, 2002. GLOBECOM '02. IEEE
Turbo decoding of low-density parity-check (LDPC) and generalized low-density (GLD) codes and the corresponding decoder architectures are considered. A regular (c, r)-LDPC code of length n is viewed as the intersection of c interleaved super-codes where each super-code is the direct sum of n/r independent single parity-check sub-codes. Extensions to GLD codes simply utilize more powerful sub-codes. The turbo decoding schedule is employed to decode LDPC and GLD codes using constituent soft-input soft-output (SISO) decoders that communicate through c interleavers. The proposed schedule exhibits a faster convergence behavior, and hence lower decoding latency, than the commonly employed two-phase schedule, and has a reduced memory requirement that is a function of the number of super-codes. The performance of the turbo decoding schedule is evaluated through simulations over an AWGN channel.
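The super-code view above can be made concrete with a Gallager-style construction: each super-code is a block of n/r disjoint single parity-check rows (a direct sum), and the c super-codes differ only by a column permutation playing the role of an interleaver. The sketch assumes random permutations as interleavers (the paper's actual interleavers may differ), and the function name is hypothetical; it checks the defining property that every super-code touches each of the n bits exactly once.

```python
import numpy as np

def supercode_blocks(n, c, r, seed=0):
    """Regular (c, r) parity-check matrix split into c super-codes.

    Super-code 0 groups consecutive bits into n/r SPC sub-codes; the others
    apply a random column permutation (the interleaver) to that structure.
    """
    if n % r:
        raise ValueError("n must be a multiple of r")
    rng = np.random.default_rng(seed)
    # Block-diagonal direct sum of n/r single parity-check codes of length r
    base = np.kron(np.eye(n // r, dtype=np.uint8), np.ones((1, r), dtype=np.uint8))
    blocks = [base]
    for _ in range(c - 1):
        blocks.append(base[:, rng.permutation(n)])
    return blocks

blocks = supercode_blocks(n=12, c=3, r=4)
H = np.vstack(blocks)
print(H.shape)  # (9, 12)
```

Because each super-code covers every bit exactly once, its SPC sub-codes can be decoded in parallel, and the c super-codes can then be processed one after another in turbo fashion.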
2010 National Conference On Communications (NCC), 2010
Turbo codes and low-density parity-check (LDPC) codes have been shown to be practical codes that can approach Shannon capacity in several communication systems. In terms of performance and implementation complexity, LDPC codes and turbo codes are highly comparable, especially at coding rates around 1/2. In many recent wireless standards, such as 3GPP LTE and WiMAX, both turbo and LDPC codes have been recommended for encoding. However, the decoder for turbo codes involves trellises and the BCJR algorithm, while the decoder for LDPC codes uses sparse graphs and the message-passing algorithm. Therefore, in several implementations, a designer is forced to implement either the turbo decoder or the LDPC decoder. The main idea behind this work is to enable the implementation of both decoders using a common architecture. We view the constituent convolutional code in a turbo code as a block code and construct a sparse parity-check matrix for it. Then, the sparse matrix and the associated bipartite graph are used for decoding the convolutional code by soft message-passing algorithms. Simulation results show a manageable degradation in performance with a reduction in complexity.
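One textbook way to view a convolutional code as a block code with a sparse parity-check matrix is sketched below. It assumes a feedforward rate-1/2 code with generators g1 and g2 (the common octal (7, 5) pair, an illustrative choice), zero-tail termination, and arithmetic over GF(2). Since v1 = u*g1 and v2 = u*g2, every codeword satisfies v1*g2 + v2*g1 = 0, so H = [T(g2) | T(g1)] with banded Toeplitz blocks T(g) whose column weight equals the weight of g. This is not necessarily the construction used in the paper, and all function names are hypothetical.

```python
import numpy as np

def conv_matrix(g, ncols):
    """Banded Toeplitz matrix T such that T @ v is the full convolution g * v."""
    m = len(g) - 1
    T = np.zeros((ncols + m, ncols), dtype=int)
    for j in range(ncols):
        T[j:j + m + 1, j] = g
    return T

def conv_code_H(g1, g2, k):
    """Sparse parity-check matrix of a zero-tail rate-1/2 feedforward code with k input bits."""
    if len(g1) != len(g2):
        raise ValueError("pad the generators to the same length")
    m = len(g1) - 1
    nv = k + m                       # length of each output stream v1, v2
    return np.hstack([conv_matrix(g2, nv), conv_matrix(g1, nv)])

def conv_encode(u, g1, g2):
    """Zero-tail encoding over GF(2): codeword = [u * g1, u * g2]."""
    return np.concatenate([np.convolve(u, g1) % 2, np.convolve(u, g2) % 2])

g1, g2 = [1, 1, 1], [1, 0, 1]        # octal (7, 5)
k = 8
H = conv_code_H(g1, g2, k)
u = np.random.default_rng(1).integers(0, 2, k)
c = conv_encode(u, g1, g2)
print(H.shape, (H @ c) % 2)          # every syndrome bit is 0
```

The resulting Tanner graph has short cycles, which is one reason message passing on such a matrix can lose some performance relative to BCJR decoding of the trellis.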
Journal of Signal Processing Systems, 2011
Low-density parity-check (LDPC) codes and convolutional Turbo codes are two of the most powerful error-correcting codes and are widely used in modern communication systems. In a multi-mode baseband receiver, both LDPC and Turbo decoders may be required. However, the different decoding approaches for LDPC and Turbo codes usually lead to different hardware architectures. In this paper, we propose a unified message-passing algorithm for LDPC and Turbo codes and introduce a flexible soft-input soft-output (SISO) module to handle LDPC/Turbo decoding. We employ the trellis-based maximum a posteriori (MAP) algorithm as a bridge between LDPC and Turbo decoding. We view the LDPC code as a concatenation of n super-codes, where each super-code has a simpler trellis structure so that the MAP algorithm can be easily applied to it. We propose a flexible functional unit (FFU) for MAP processing of LDPC and Turbo codes with a low hardware overhead (about 15% area and timing overhead). Based on the FFU, we propose an area-efficient flexible SISO decoder architecture to support LDPC/Turbo decoding. Multiple such SISO modules can be embedded into a parallel decoder for higher decoding throughput. As a case study, a flexible LDPC/Turbo decoder has been synthesized in TSMC 90-nm CMOS technology with a core area of 3.2 mm². The decoder can support IEEE 802.16e LDPC codes, IEEE 802.11n LDPC codes, and 3GPP LTE …
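To illustrate how a trellis-based MAP (BCJR) algorithm applies to a super-code built from single parity-check constraints, the sketch below runs log-domain forward and backward recursions over the two-state trellis of one SPC code (the state is the running parity) and checks the result against brute-force enumeration of all even-weight words. It is an illustrative sketch, not the paper's FFU; the function names are hypothetical and LLRs are assumed to be log(P(0)/P(1)).

```python
import itertools
import math

NEG_INF = -math.inf

def max_star(x, y):
    """log(e^x + e^y), written so that -inf (an unreachable state) is handled safely."""
    if x == NEG_INF:
        return y
    if y == NEG_INF:
        return x
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def spc_trellis_map(llrs):
    """Extrinsic LLRs of one SPC code by log-MAP over its 2-state (running-parity) trellis."""
    n = len(llrs)
    gamma = [(x / 2, -x / 2) for x in llrs]     # log branch metrics for bit values 0 and 1
    alpha = [[0.0, NEG_INF]]                    # trellis starts in parity state 0
    for i in range(n):
        a = alpha[-1]
        # State s is reached from s with bit 0, or from s ^ 1 with bit 1
        alpha.append([max_star(a[s] + gamma[i][0], a[s ^ 1] + gamma[i][1]) for s in (0, 1)])
    beta = [[0.0, NEG_INF]]                     # valid codewords end in parity state 0
    for i in reversed(range(n)):
        b = beta[0]
        beta.insert(0, [max_star(gamma[i][0] + b[s], gamma[i][1] + b[s ^ 1]) for s in (0, 1)])
    ext = []
    for i in range(n):
        # Bit i's own metric gamma[i] is left out, so the output is extrinsic
        m0 = max_star(alpha[i][0] + beta[i + 1][0], alpha[i][1] + beta[i + 1][1])
        m1 = max_star(alpha[i][0] + beta[i + 1][1], alpha[i][1] + beta[i + 1][0])
        ext.append(m0 - m1)
    return ext

def spc_bruteforce(llrs):
    """Reference result: enumerate every even-weight word (exponential cost, tiny n only)."""
    n = len(llrs)
    ext = []
    for i in range(n):
        num = den = 0.0
        for word in itertools.product((0, 1), repeat=n):
            if sum(word) % 2:
                continue
            w = math.exp(sum((1 - 2 * word[j]) * llrs[j] / 2 for j in range(n) if j != i))
            if word[i] == 0:
                num += w
            else:
                den += w
        ext.append(math.log(num / den))
    return ext

llrs = [0.8, -1.1, 2.0, 0.4, -0.3]
print(spc_trellis_map(llrs))
print(spc_bruteforce(llrs))   # agrees with the trellis result up to rounding
```

The same alpha/beta/branch-metric structure is what a convolutional-code BCJR decoder uses, only with more states, which is the sense in which a trellis-based MAP unit can serve both code families.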
Proceedings of the 2003 International Symposium on Circuits and Systems, 2003. ISCAS '03.
A high-throughput memory-efficient decoder architecture for low-density parity-check (LDPC) codes is proposed based on a novel turbo-decoding algorithm. The architecture benefits from various optimizations at the code-design, decoding algorithm, and decoder architecture levels. The interconnect complexity and memory overhead problems of current decoder implementations are reduced by designing structured or architecture-aware LDPC codes and employing a new turbo-decoding algorithm. An efficient memory architecture coupled with a scalable and dynamic transport network for storing and routing messages is proposed. Simulations demonstrate that the proposed architecture attains a throughput of 1.92 Gbits/s for a frame length of 2304 bits, and achieves savings of 89.13% and 62.80% in power consumption and silicon area over the state of the art, with a reduction of 60.5% in interconnect wires.
2007 IEEE International Symposium on Circuits and Systems, 2007
A low-density parity-check (LDPC) decoder architecture that supports variable block sizes and multiple code rates is presented. The proposed architecture is based on structured quasi-cyclic (QC-LDPC) codes, whose performance compares favorably with that of randomly constructed LDPC codes for short to moderate block sizes. The main contribution of this work is to address the hardware complexity of variable block-size, multirate decoding that stems from irregular LDPC codes. The overall decoder, which was synthesized, placed, and routed in TSMC 0.13-micron CMOS technology with a core area of 4.5 square millimeters, supports variable code lengths from 360 to 4200 bits and multiple code rates between 1/4 and 9/10. The average throughput reaches 1 Gbps at an SNR of 2.2 dB.
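Quasi-cyclic codes like those described above are usually specified by a small base matrix of circulant shift values that is expanded ("lifted") by a factor z, so one base matrix can yield several block lengths. The sketch assumes a made-up 2×4 base matrix, the convention that -1 marks an all-zero block, and reduction of each shift modulo z; real standards such as IEEE 802.16e define their own base matrices and shift-scaling rules, which are not reproduced here. The function name is hypothetical.

```python
import numpy as np

def expand_qc(base, z):
    """Lift a base matrix of shift values into a binary QC-LDPC parity-check matrix.

    Entry -1 becomes a z-by-z zero block; entry p >= 0 becomes the z-by-z
    identity with its columns cyclically shifted by p mod z.
    """
    base = np.asarray(base)
    rows, cols = base.shape
    H = np.zeros((rows * z, cols * z), dtype=np.uint8)
    eye = np.eye(z, dtype=np.uint8)
    for i in range(rows):
        for j in range(cols):
            if base[i, j] >= 0:
                H[i * z:(i + 1) * z, j * z:(j + 1) * z] = np.roll(eye, base[i, j] % z, axis=1)
    return H

base = [[0, 3, -1, 1],
        [2, -1, 0, 5]]   # hypothetical shift values
for z in (4, 8):
    H = expand_qc(base, z)
    print(z, H.shape)    # block length n = 4 * z
```

Because every nonzero block is a cyclic shift of the identity, a decoder can route messages with barrel shifters sized for the largest z, which is one common way to support several block sizes in one datapath.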
2015
Turbo codes are the channel coding scheme used in wireless cellular networks, as they are able to come close to the Shannon limit. This paper proposes the use of turbo codes and LDPC codes for the storage of data. Turbo encoding can be performed using parallel recursive systematic convolutional (RSC) encoders and an interleaver, while turbo decoding is based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm, a maximum a posteriori (MAP) algorithm. Low-density parity-check (LDPC) encoding is based on the generator matrix of the code, which maps the original message to a codeword. For LDPC decoding, a hard-decision decoding algorithm is used. Finally, a comparative analysis of turbo and LDPC codes is presented. Theoretical and experimental results show that turbo codes perform better than LDPC codes.
Key-Words: BCJR algorithm, Check nodes, encoding algorithm, Hard-decision decoding algorithm, LDPC codes, Turbo codes, Variable nodes.
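The turbo encoder described above (two parallel RSC encoders separated by an interleaver) can be sketched as follows. The constituent code here is the classic rate-1/2 RSC with feedback polynomial 1 + D + D^2 and feedforward polynomial 1 + D^2 (octal 7/5), chosen for illustration; trellis termination and puncturing are omitted, the random interleaver is an assumption, and the function names are hypothetical.

```python
import numpy as np

def rsc_parity(bits):
    """Parity stream of an RSC encoder: feedback 1 + D + D^2 (octal 7), feedforward 1 + D^2 (octal 5)."""
    a1 = a2 = 0                    # register contents a[k-1], a[k-2]
    parity = []
    for u in bits:
        a = int(u) ^ a1 ^ a2       # recursive (feedback) part
        parity.append(a ^ a2)      # feedforward taps 1 and D^2
        a1, a2 = a, a1
    return parity

def turbo_encode(bits, perm):
    """Rate-1/3 parallel concatenation: systematic bits, parity 1, parity 2 (interleaved input)."""
    bits = [int(b) for b in bits]
    interleaved = [bits[i] for i in perm]
    return bits, rsc_parity(bits), rsc_parity(interleaved)

rng = np.random.default_rng(0)
k = 16
u = rng.integers(0, 2, k)
perm = rng.permutation(k)          # assumed random interleaver
sys_bits, p1, p2 = turbo_encode(u, perm)
print(len(sys_bits) + len(p1) + len(p2))  # 48 coded bits for 16 input bits
```

Each parity stream p satisfies p * (1 + D + D^2) = u * (1 + D^2) over GF(2), which is the relation a BCJR decoder's trellis encodes.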
… Conference, 2005. 23rd, 2005
Low-density parity-check codes have recently received extensive attention as a forward error correction scheme in a wide range of applications. The decoding algorithm is inherently parallelizable, allowing communication at high speeds. One of the main disadvantages, however, is the large memory requirement for interim storage of decoding data. In this paper, we propose an architecture for an early decision decoding algorithm. The algorithm significantly reduces the number of memory accesses. Simulation results show that the increased energy dissipation of the components is small compared to the reduced dissipation of the memories.
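The principle of early decision can be illustrated with a flooding min-sum decoder that stops updating a bit once its posterior LLR magnitude exceeds a threshold, so that bit's messages no longer need to be read or written. This is a simplified sketch, not the paper's algorithm: the fixed threshold, the choice to keep sending a decided bit's last bit-to-check messages, the update counter used as a proxy for memory accesses, and all names are assumptions for illustration.

```python
import numpy as np

def early_decision_minsum(H, channel_llr, threshold=4.0, max_iters=20):
    """Flooding min-sum with early decision; returns (hard bits, variable-node updates performed)."""
    H = np.asarray(H)
    m, n = H.shape
    ch = np.array(channel_llr, dtype=float)
    Q = H * ch                          # bit-to-check messages stored on the edges
    decided = np.zeros(n, dtype=bool)   # bits frozen by the early decision
    post = ch.copy()
    updates = 0
    for _ in range(max_iters):
        # Check-node update (min-sum) on every row
        R = np.zeros((m, n))
        for r in range(m):
            idx = np.flatnonzero(H[r])
            q = Q[r, idx]
            signs = np.where(q < 0, -1.0, 1.0)
            mags = np.abs(q)
            order = np.argsort(mags)
            ext = np.where(np.arange(len(q)) == order[0], mags[order[1]], mags[order[0]])
            R[r, idx] = np.prod(signs) * signs * ext
        # Variable-node update, skipped for bits that were already decided
        for v in np.flatnonzero(~decided):
            checks = np.flatnonzero(H[:, v])
            post[v] = ch[v] + R[checks, v].sum()
            Q[checks, v] = post[v] - R[checks, v]
            updates += 1
        decided |= np.abs(post) > threshold
        hard = (post < 0).astype(int)
        if not np.any(H @ hard % 2):    # all parity checks satisfied
            break
    return hard, updates

H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [2.0, 1.5, -1.8, 2.2, 1.8, 1.0, 1.4]   # all-zero codeword sent
print(early_decision_minsum(H, llr, threshold=2.4))
print(early_decision_minsum(H, llr, threshold=float("inf")))  # no early decisions
```

For this input both runs decode the all-zero word in two iterations, but with threshold 2.4 two bits are frozen after the first iteration, so 12 variable-node updates are performed instead of 14. A poorly chosen threshold can freeze a wrong bit, which is the accuracy risk any early-decision scheme has to manage.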
IEEE Access, 2016
Low-density parity-check (LDPC) block codes are popular forward error correction schemes due to their capacity-approaching characteristics. However, realizing LDPC decoders that achieve both low latency and high throughput is not a trivial challenge. Usually, this has been solved with ASIC and FPGA technology, which enables meeting the decoder design constraints. But the rise of parallel architectures, such as graphics processing units, and the scaling of CPU streaming extensions have shown that multicore and many-core technology can provide a flexible alternative to the development of dedicated LDPC decoders for the compute-intensive prototyping phase of the design of new codes. In this light, this paper surveys the most relevant publications of the past decade on programmable LDPC decoders. It looks at the advantages and disadvantages of parallel architectures and data-parallel programming models, and assesses how the design space exploration is pursued regarding key characteristics of the underlying code and decoding algorithm features. This paper concludes with a set of open problems in the field of communication systems on parallel programmable and reconfigurable architectures.
Index Terms: LDPC codes, LDPC decoders, parallel computing, CPU, GPU, reconfigurable computing, high-level synthesis.
Recently, he joined the R&D Department, Coriant GmbH, Lisbon, Portugal, where he is a Hardware Engineer. His research activities focus on architectures for error correction and their resiliency to unreliable memory systems. He is an Affiliated Member of the HiPEAC network.
2016 Euromicro Conference on Digital System Design (DSD), 2016
In this paper, we propose a layered LDPC decoder architecture targeting flexibility, high throughput, low cost, and efficient use of hardware resources. The proposed architecture provides full design-time flexibility, i.e., it can accommodate any quasi-cyclic (QC) LDPC code, and also allows a number of parameters of the QC-LDPC code to be redefined at run time. The main novelty of the paper consists of: (1) a new low-cost processing unit that efficiently merges the logical functionalities of the Variable-Node Unit (VNU) and the A Posteriori Log-Likelihood Ratio (AP-LLR) unit; (2) a high-speed, low-cost Check-Node Unit (CNU) architecture, which is executed twice in order to complete the computation of the check-node messages at each iteration; and (3) a splitting of the iteration processing into two perfectly symmetric stages, executed in two consecutive clock cycles, each one using exactly the same processing resources; the processing load is perfectly balanced between the two clock cycles, thus yielding an optimal clock frequency. Synthesis results targeting a 65-nm CMOS technology for a (3, 6)-regular (648, 1296) QC-LDPC code and for the WiMAX (1152, 2304) irregular QC-LDPC code show significant improvements in area and throughput compared to the baseline architecture discussed in this paper, as well as to several state-of-the-art implementations.
International journal of engineering and technology, 2018
This article deals with an efficient trellis-based decoding architecture for non-binary low-density parity-check (LDPC) codes. In this decoder, a bidirectional recursion is embedded to improve layered scheduling and decoding latency, which in turn minimizes the number of iterations compared to existing techniques. Higher throughput, in turn, is necessary to improve the efficiency of the system. In addition, a compression technique is implemented to reduce memory requirements and area. A trellis-based decoder is used to reinforce check-node processing. The proposed decoder yields higher throughput than similar decoders presented in previous works. The designed architecture was implemented using Cadence Virtuoso software. This decoder provides a throughput of about 39.21 Mb/s at a clock frequency of 190 MHz.
2006
A high-throughput pipelined LDPC decoder that supports multiple code rates and codeword sizes is proposed. In order to increase memory throughput, irregular block-structured parity-check matrices are designed with the constraint of equally distributed odd and even nonzero block-columns in each horizontal layer for the predetermined set of code rates. The designed decoder achieves a data …
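The design constraint above (equal numbers of odd- and even-indexed nonzero block-columns in every horizontal layer) is easy to check on a base matrix. A plausible motivation, assumed here rather than taken from the paper, is that odd and even block-columns can live in two memory banks that are then accessed evenly. The base matrices and the function name below are hypothetical, with -1 marking a zero block and each base-matrix row taken as one layer.

```python
def balanced_odd_even(base):
    """True if every layer has as many nonzero blocks in odd block-columns as in even ones."""
    for row in base:
        nonzero_cols = [j for j, shift in enumerate(row) if shift >= 0]
        odd = sum(j % 2 for j in nonzero_cols)
        if odd != len(nonzero_cols) - odd:
            return False
    return True

ok = [[0, 3, -1, -1, 2, 5],
      [1, -1, -1, 4, -1, -1]]    # layer 0: columns 0,1,4,5; layer 1: columns 0,3
bad = [[0, 3, -1, -1, 2, 5],
       [1, -1, -1, 4, 0, -1]]    # layer 1: columns 0,3,4 (two even, one odd)
print(balanced_odd_even(ok), balanced_odd_even(bad))
```

A code designer could run such a check inside a search over candidate base matrices, discarding any that violate the constraint.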
Performance Comparison of Turbo coder and low-density parity check codes, 2021
This paper investigates two powerful forward error correction techniques, turbo codes and LDPC codes. Different code parameters, such as code rate, number of decoding iterations, and block length, are considered over an AWGN channel. The strengths and performance limitations of both coding techniques are summarized.
Proceedings of the 2002 international symposium on Low power electronics and design - ISLPED '02, 2002
Iterative decoding of low-density parity-check (LDPC) codes using the message-passing algorithm has proved to be extraordinarily effective compared to conventional maximum-likelihood decoding. However, the lack of any structural regularity in these essentially random codes is a major challenge for building a practical low-power LDPC decoder. In this paper, we jointly design the code and the decoder to induce the structural regularity needed for a reduced-complexity parallel decoder architecture. This interconnect-driven code design approach eliminates the need for a complex interconnection network while still retaining the algorithmic performance promised by random codes. Moreover, we propose a new approach for computing reliability metrics based on the BCJR algorithm that reduces the message switching activity in the decoder compared to existing approaches. Simulations show that the proposed approach results in power savings of up to 85.64% over conventional implementations.
Categories and Subject Descriptors: B.7.1 [Types and Design Styles]: VLSI; E.4 [Coding and Information Theory]: Error control codes
However, in order to achieve the desired power and throughput for current applications (e.g., > 1 Mbps in 3G wireless systems, > 1 Gbps in magnetic recording systems), fully parallel and pipelined iterative decoder architectures are needed. Compared to turbo codes, LDPC codes enjoy a significant advantage in terms of computational complexity and are known to have a large amount of inherent parallelism [3]. However, the randomness of LDPC codes results in stringent memory requirements that amount to an order-of-magnitude increase in complexity compared to those for turbo codes.
A direct approach to implementing a parallel decoder architecture would be to allocate, for each node or cluster of nodes in the graph defining the LDPC code, a function unit for computing the reliability messages, and employ an interconnection network to route messages between function nodes (see Fig.1). A major problem with this approach is that the interconnection networks require complex wiring to perform global routing of messages and hence must be deeply pipelined (e.g., bidirectional multilayered networks in [4] and 4096-input multiplexers per function unit in [5]). Moreover, the randomness in the pattern of communicating messages leads to routing and congestion problems on the networks which require extensive buffering to resolve.
In this paper, a reduced-complexity, scalable implementation of an LDPC decoder is presented. The decoder architecture in this paper is an improved version of an earlier design. The new architecture makes it very straightforward to implement an LDPC decoder that supports multiple code rates, multiple block sizes, and multiple standards. As an example, we implemented a parameterized decoder that supports the LDPC code in the IEEE 802.16e standard, which requires code rates of 1/2, 2/3, and 3/4, with block sizes varying from 576 to 2304. The decoder is synthesized with Texas Instruments' 90-nm ASIC process technology; at a target operating frequency of 100 MHz and with 15 decoding iterations, the maximum data rate is up to 256 Mbps.
IEEE Journal of Solid-State Circuits, 2002
A 1024-b, rate-1/2, soft-decision low-density parity-check (LDPC) code decoder has been implemented that matches the coding gain of equivalent turbo codes. The decoder features a parallel architecture that supports a maximum throughput of 1 Gb/s while performing 64 decoder iterations. The parallel architecture enables rapid convergence in the decoding algorithm to be translated into low decoder switching activity, resulting in a power dissipation of only 690 mW from a 1.5-V supply.
The Journal of VLSI Signal Processing-Systems for Signal, Image, and Video Technology, 2005
A new parameterized-core-based design methodology targeted at programmable decoders for low-density parity-check (LDPC) codes is proposed. The methodology solves the two major drawbacks of excessive memory overhead and complex on-chip interconnect typical of existing decoder implementations, which limit the scalability, degrade the error-correction capability, and restrict the domain of application of LDPC codes. Diverse memory and interconnect optimizations are performed at the code-design, decoding algorithm, decoder architecture, and physical layout levels, with the following features: 1) architecture-aware (AA-)LDPC code design with embedded structural features that significantly reduce interconnect complexity, 2) a faster and memory-efficient turbo-decoding algorithm for LDPC codes, 3) a programmable architecture having distributed memory, parallel message processing units, and dynamic/scalable transport networks for routing messages, and 4) a parameterized macro-cell layout library implementing the main components of the architecture, with scaling parameters that enable low-level transistor sizing and power-rail scaling for power-delay-area optimization. A 14-mm² programmable decoder core for a rate-1/2, length-2048 AA-LDPC code generated using the proposed methodology is presented, which delivers a throughput of 1.6 Gb/s at 125 MHz and consumes 760 mW of power.
IEEE Transactions on Circuits and Systems I: Regular Papers, 2010
A low-complexity message-passing algorithm, called Split-Row Threshold, is used to implement low-density parity-check (LDPC) decoders with reduced layout routing congestion. Five LDPC decoders that are compatible with the 10GBASE-T standard are implemented using the MinSum Normalized and MinSum Split-Row Threshold algorithms. All decoders are built using a standard-cell design flow and include all steps through the generation of GDS II layout. An = 16 decoder achieves improvements in area, throughput, and energy efficiency of 4.1 times, 3.3 times, and 4.8 times, respectively, compared to a MinSum Normalized implementation. Postlayout results show that a fully parallel = 16 decoder in 65-nm CMOS operates at 195 MHz at 1.3 V with an average throughput of 92.8 Gbits/s with early termination enabled. Low-power operation at 0.7 V gives a worst-case throughput of 6.5 Gbits/s, just above the 10GBASE-T requirement, and an estimated average power of 62 mW, resulting in 9.5 pJ/bit. At 0.7 V with early termination enabled, the throughput is 16.6 Gbits/s and the energy is 3.7 pJ/bit, which is 5.8 times lower than the previously reported lowest energy per bit. The decoder area is 4.84 mm² with a final postlayout area utilization of 97%.
Index Terms: Full parallel, high throughput, low-density parity check (LDPC), low power, message passing, min sum, nanometer, 10GBASE-T, 65-nm CMOS, 802.3an.
I. INTRODUCTION
Starting in the 1990s, much work was done to enhance error-correction codes to the point where communication over noisy channels was possible near the Shannon limit. Defined by sparse random graphs and using probability-based message-passing algorithms, low-density parity-check (LDPC) codes [1] became popular for their error-correction and near-channel-capacity performance. Although LDPC codes were largely neglected after their discovery [2], advances in VLSI have recently revived them [3]-[6].
LDPC codes have relatively low error floors, as well as better error performance with large code lengths; as a result, they have been adopted as the forward error-correction method for many recent standards, such as digital video broadcasting via satellite (DVB-S2) [7], the WiMAX standard for microwave communications (802.16e) [8], the G.hn/G.9960 standard for wired home networking [9], and the 10GBASE-T standard for 10-Gbit Ethernet (802.3an) [10]. While there has been much …
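As a quick consistency check on the Split-Row Threshold figures in the abstract above, energy per bit is average power divided by throughput. The arithmetic below uses only numbers quoted there; the power implied at the early-termination operating point is a derived estimate, not a reported value.

```latex
\begin{aligned}
E_b &= \frac{P}{T} = \frac{62\ \text{mW}}{6.5\ \text{Gb/s}}
     = \frac{62 \times 10^{-3}\ \text{J/s}}{6.5 \times 10^{9}\ \text{bit/s}}
     \approx 9.5 \times 10^{-12}\ \text{J/bit} = 9.5\ \text{pJ/bit} \\
P_{\text{ET}} &\approx 3.7\ \text{pJ/bit} \times 16.6\ \text{Gb/s} \approx 61\ \text{mW}
\end{aligned}
```

The second line suggests that, at 0.7 V, early termination mainly raises throughput at roughly the same average power, which is why the energy per bit drops from 9.5 to 3.7 pJ/bit.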
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
This paper introduces a new approach to cost-effective, high-throughput hardware designs for Low Density Parity Check (LDPC) decoders. The proposed approach, called Non-Surjective Finite Alphabet Iterative Decoders (NS-FAIDs), exploits the robustness of message-passing LDPC decoders to inaccuracies in the calculation of exchanged messages, and it is shown to provide a unified framework for several designs previously proposed in the literature. NS-FAIDs are optimized by density evolution for regular and irregular LDPC codes, and are shown to provide different trade-offs between hardware complexity and decoding performance. Two hardware architectures targeting high-throughput applications are also proposed, integrating both Min-Sum (MS) and NS-FAID decoding kernels. ASIC post-synthesis implementation results in 65-nm CMOS technology show that NS-FAIDs yield significant improvements in the throughput-to-area ratio, by up to 58.75% with respect to the MS decoder, with even better or only slightly degraded error correction performance.
IEEE Journal of Solid-State Circuits, 2006
A 14.3-mm² code-programmable and code-rate tunable decoder chip for 2048-bit low-density parity-check (LDPC) codes is presented. The chip implements the turbo-decoding message-passing (TDMP) algorithm for architecture-aware (AA-)LDPC codes, which has a faster convergence rate and hence a throughput advantage over the standard decoding algorithm. It employs a reduced-complexity message computation mechanism free of lookup tables, and features a programmable network for message interleaving based on the code structure. The chip decodes any mix of 2048-bit rate-1/2 (3,6)-regular AA-LDPC codes in standard mode by programming the network, and attains a throughput of 640 Mb/s at 125 MHz for 10 TDMP-decoding iterations. In augmented mode, the code rate can be tuned up to 14/16 in steps of 1/16 by augmenting the code. The chip is fabricated in 0.18-µm six-metal-layer CMOS technology, operates at a peak clock frequency of 125 MHz at 1.8 V (nominal), and dissipates an average power of 787 mW.
Index Terms: Architecture-aware low-density parity-check (AA-LDPC) codes, iterative decoders, LDPC codes, turbo-decoding message-passing (TDMP) algorithm, VLSI decoder architectures.