Abstract

Low-density parity-check (LDPC) codes have become a leading choice for next-generation Internet of things (IoT) networks. This correspondence proposes an efficient decoding algorithm, dual min-sum (DMS), to estimate the first two minima from the set of variable-node messages used in the check-node update (CNU) operation of the min-sum (MS) LDPC decoder. The proposed architecture entirely eliminates the large multiplexing system of the sorting-based architecture, which results in a significant reduction in hardware complexity and critical delay. Specifically, the DMS architecture eliminates a large number of comparators and multiplexors while keeping the critical delay almost equal to that of the most delay-efficient tree-based architecture. Based on experimental results, if the number of inputs is equal to 64, the proposed architecture saves 69%, 68%, and 52% area over the sorting-based, the tree-based, and the low-complexity tree-based architectures, respectively. Furthermore, the simulation results show that the proposed approach provides an error-correction performance close to that of the normalized min-sum (NMS) algorithm in terms of bit error rate (BER) and block error rate (BLER) over an additive white Gaussian noise (AWGN) channel.

1. Introduction

Internet of things (IoT) will be one of the major trends in next-generation wireless networks for connecting billions of devices to the Internet [1-4]. These communication devices will provide a high data rate with low transmission delay and energy consumption [5-8]. In this regard, low-density parity-check (LDPC) codes [9-15] are one of the most promising candidates among error-control codes and have been adopted as a primary choice for next-generation IoT networks [16-19]. Compared to other error-correction codes, such as Bose-Chaudhuri-Hocquenghem (BCH) codes, Reed-Solomon (RS) codes, and turbo codes, LDPC codes have many advantages, e.g., a very low error floor, high-speed encoders and decoders, and more variety in code construction [20-23]. Therefore, LDPC codes have become a leading choice for many communication standards, such as 10-Gigabit Ethernet (802.3an) [24] and Wi-Fi (802.11n/ac/ad) [25-27].

To obtain good performance, LDPC codes are usually decoded with an iterative process that alternates between two decoding phases, i.e., check-node update (CNU) and variable-node update (VNU). Among the various decoding algorithms, the sum-product (SP) algorithm [28] provides decoding performance close to the Shannon capacity. However, it suffers from high complexity because of the logarithmic and multiplicative functions involved in the CNU operation. For hardware implementation of the decoder, an area-efficient approximation of SP called the min-sum (MS) algorithm [29] was proposed, which provides implementation advantages over the SP algorithm by computing two minimum values from the set of messages arriving at each check node, at the cost of some performance degradation. The normalized min-sum (NMS) and offset min-sum (OMS) algorithms [30], modified versions of MS, significantly improve the performance of MS by introducing additional normalization and offset factors, respectively.
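The difference between MS, NMS, and OMS at a single check-node output can be sketched as follows. This is a minimal illustration only: the function names and the values of `alpha` and `beta` are assumptions chosen for the example, not parameters taken from [30].

```python
def ms_magnitude(mags):
    """Plain min-sum: the output magnitude is the smallest input magnitude."""
    return min(mags)


def nms_magnitude(mags, alpha=0.75):
    """Normalized min-sum: scale the minimum by a factor alpha (assumed value)."""
    return alpha * min(mags)


def oms_magnitude(mags, beta=0.25):
    """Offset min-sum: subtract an offset beta (assumed value), clamped at zero."""
    return max(min(mags) - beta, 0.0)


# Magnitudes of the messages arriving from the other variable nodes.
mags = [2.0, 0.5, 1.5]
print(ms_magnitude(mags), nms_magnitude(mags), oms_magnitude(mags))
```

Both corrections shrink the MS output, which tends to overestimate the magnitude that the SP algorithm would produce.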

In a hardware implementation of the MS decoder, each iteration involves two operations, i.e., CNU and VNU. For the CNU, a minimum-value unit (MVU), also called a minimum finder, is required to estimate the first two minima and the index of the first minimum value. For the large block-length LDPC codes required in high-data-rate applications, a huge number of minimum-value units is needed, which significantly increases the complexity of the CNU operation. Existing methods require circuitry with high complexity in terms of comparators, multiplexors, latency, and area-time complexity. Thus, a low-cost algorithm is greatly desired to reduce the complexity of the CNU operation of the MS decoder.

Recently, some attempts have been made to estimate the first two minima from the set of messages arriving at a check node. In [31], a single minimum min-sum (smMS) algorithm was proposed which computes only the absolute minimum value; the second minimum value is obtained by adding a corrective constant to the first minimum. The smMS algorithm provides a significant reduction in the hardware complexity of the CNU processor, but it suffers from performance degradation. Wang et al. proposed a modification factor min-sum (mfMS) algorithm in [32]; the mfMS algorithm improves the performance of smMS by applying a modification factor to the absolute minimum value. Zhang et al. used the mfMS approach to design a flexible LDPC decoder for multigigabit-per-second applications [33]. A variable-weight min-sum (vwMS) algorithm was proposed by Angarita et al. in [29] by introducing a variable iteration-based correction factor; the performance of vwMS is better than that of smMS and mfMS. A simplified variable-weight min-sum (svwMS) algorithm is also proposed in [29], which requires low computational cost to determine whether more than one input message shares the same first minimum value. In [29, 31-33], the absolute minimum value is calculated first, and then the second minimum is estimated by applying a modification or correction factor to the absolute minimum value. Researchers have also investigated various problems in other related areas of communications [34-44].

Besides the single minimum-based algorithms, some efforts have been made to propose architectures which compute the two minima from a set of messages for the CNU operation [45-49]. A sorting-based architecture was proposed by Xie et al. in [46] for finding the two minima, but it suffers from a large critical delay. Chen-Long et al. proposed a tree-based architecture in [47] which requires some additional complexity but provides a critical delay smaller than that of the sorting-based architecture. A low-complexity tree-based architecture [48] was proposed by Lee et al. which reduces some of the hardware complexity of the tree-based structure while keeping the critical delay between those of the sorting-based and tree-based architectures. This manuscript presents an efficient approach, known as the dual min-sum (DMS) architecture, for finding the first two minima from the set of variable-node messages participating in the CNU operation. Compared to the existing sorting-based and tree-based architectures, the proposed scheme eliminates a large number of comparators and multiplexors while keeping the critical delay almost equal to that of the tree-based architecture. Based on experimental results, if the number of inputs is equal to 64, the proposed architecture saves 69%, 68%, and 52% area over the sorting-based, tree-based, and low-complexity tree-based architectures, respectively. Furthermore, the simulation results show that the proposed approach outperforms its counterparts by providing an error-correction performance close to that of the NMS algorithm over an additive white Gaussian noise (AWGN) channel.

The remainder of this correspondence is arranged as follows. In Section 2, the basic concepts of LDPC codes and min-sum decoding are given. A detailed review of the state-of-the-art architectures for finding the first two minima is given in Section 3. Section 4 presents the proposed architecture for finding the first two minima for the CNU operation of the min-sum LDPC decoder. The performance analysis and hardware implementation of the proposed architecture are given in Section 5, and the conclusion of this correspondence is presented in Section 6.

2. Min-Sum LDPC Decoding

An LDPC code can be described by the null space of a sparse $M \times N$ parity-check matrix $H$, where $M$ denotes the number of parity checks and $N$ denotes the block length of the code. It can also be specified by a bipartite graph, or Tanner graph, having $M$ check nodes and $N$ variable nodes. The check nodes correspond to the rows of $H$, and the variable nodes correspond to the columns of $H$. The degree of a check node is equal to the number of nonzero entries in the corresponding row of $H$, and the degree of a variable node is equal to the number of nonzero entries in the corresponding column of $H$.
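The relation between the parity-check matrix and the Tanner graph can be illustrated with a short sketch. The small matrix below is a toy example chosen for illustration (it is not an IEEE802.16e matrix), and the function names are assumptions.

```python
# Toy parity-check matrix: 3 check nodes (rows), 6 variable nodes (columns).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]


def tanner_neighbors(H):
    """Return the neighbor lists of every check node and every variable node."""
    check_nbrs = [[j for j, h in enumerate(row) if h] for row in H]
    var_nbrs = [[i for i, row in enumerate(H) if row[j]] for j in range(len(H[0]))]
    return check_nbrs, var_nbrs


def is_codeword(H, x):
    """A word is in the null space of H if every parity check sums to 0 mod 2."""
    return all(sum(h * b for h, b in zip(row, x)) % 2 == 0 for row in H)


check_nbrs, var_nbrs = tanner_neighbors(H)
print("check-node degrees:", [len(c) for c in check_nbrs])
print("variable-node degrees:", [len(v) for v in var_nbrs])
print(is_codeword(H, [1, 1, 0, 0, 1, 1]))
```

The node degrees are simply the row and column weights of the matrix, as stated above.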

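Let $N(i)$ denote the set of variable nodes involved in check node $i$ and $M(j)$ denote the set of check nodes connected to variable node $j$. Also, let $N(i) \setminus j$ represent the set $N(i)$ excluding variable node $j$, and let $M(j) \setminus i$ represent the set $M(j)$ excluding check node $i$. The log-likelihood ratio (LLR) of a binary random variable $x$ can be defined as $L(x) = \ln\left(p(x=0)/p(x=1)\right)$, where $p(x=0)$ represents the probability of the transmitted bit being equal to zero. In addition, let $Q_{ji}^{(k)}$ denote the LLR message for bit $j$ sent from variable node $j$ to check node $i$ in the $k$th iteration. Similarly, $R_{ij}^{(k)}$ denotes the LLR message for bit $j$ sent from check node $i$ to variable node $j$ in the $k$th iteration. Finally, $\mathbf{x}$ and $\mathbf{y}$ denote the transmitted and the received codewords, respectively, and $L_j$ denotes the intrinsic reliability of bit $j$ provided by the channel. MS decoding consists of the following steps:

(1) Initialize $k = 1$ and $K_{\max}$, where $K_{\max}$ represents the maximum number of iterations

(2) Initialize $R_{ij}^{(0)} = 0$ for every check node $i$ and every $j \in N(i)$

(3) VNU function:
$$Q_{ji}^{(k)} = L_j + \sum_{i' \in M(j) \setminus i} R_{i'j}^{(k-1)} \tag{1}$$

(4) CNU function:
$$R_{ij}^{(k)} = \prod_{j' \in N(i) \setminus j} \operatorname{sign}\left(Q_{j'i}^{(k)}\right) \cdot \min_{j' \in N(i) \setminus j} \left|Q_{j'i}^{(k)}\right| \tag{2}$$

(5) Hard decision: apply a hard decision to compute the estimated transmitted sequence $\hat{\mathbf{x}}$ as
$$\hat{x}_j = \begin{cases} 0, & L_j + \sum_{i \in M(j)} R_{ij}^{(k)} \ge 0 \\ 1, & \text{otherwise} \end{cases} \tag{3}$$
If $H\hat{\mathbf{x}}^{T} = \mathbf{0}$ or the maximum number of iterations is reached, move to Step 6; otherwise, set $k = k + 1$ and go back to Step 3

(6) Output: declare the estimated sequence $\hat{\mathbf{x}}$ as the decoder output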
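The decoding steps listed above can be sketched in software as follows. This is a floating-point behavioural model under stated assumptions, not a description of the hardware: the toy parity-check matrix, the function name `min_sum_decode`, and the example channel LLRs are all chosen for illustration, and every check node is assumed to have degree at least two.

```python
def min_sum_decode(H, llr, max_iter=50):
    """Flooding min-sum decoder; llr[j] > 0 means bit j is more likely 0."""
    m, n = len(H), len(H[0])
    N = [[j for j in range(n) if H[i][j]] for i in range(m)]  # variables of check i
    M = [[i for i in range(m) if H[i][j]] for j in range(n)]  # checks of variable j
    c2v = {(i, j): 0.0 for i in range(m) for j in N[i]}       # Step 2
    x_hat = [0 if l >= 0 else 1 for l in llr]
    for _ in range(max_iter):
        # Step 3 (VNU): channel LLR plus all check messages except the target's.
        v2c = {(i, j): llr[j] + sum(c2v[(k, j)] for k in M[j] if k != i)
               for i in range(m) for j in N[i]}
        # Step 4 (CNU): sign product and minimum magnitude over the other edges.
        for i in range(m):
            for j in N[i]:
                others = [v2c[(i, k)] for k in N[i] if k != j]
                sign = 1.0
                for v in others:
                    if v < 0:
                        sign = -sign
                c2v[(i, j)] = sign * min(abs(v) for v in others)
        # Step 5: hard decision followed by the syndrome check.
        total = [llr[j] + sum(c2v[(i, j)] for i in M[j]) for j in range(n)]
        x_hat = [0 if t >= 0 else 1 for t in total]
        if all(sum(x_hat[j] for j in N[i]) % 2 == 0 for i in range(m)):
            break
    return x_hat  # Step 6


H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
]
# All-zero codeword sent; bit 3 arrives with a wrong (negative) LLR.
print(min_sum_decode(H, [2.0, 1.5, 1.8, -0.4, 1.2, 1.6]))
```

In this example the unreliable bit is corrected in the first iteration, because the single check it participates in returns a positive message larger in magnitude than the channel LLR.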

Although the performance of the MS algorithm is lower than that of the conventional SP and NMS algorithms, it requires much simpler hardware circuitry for the CNU operation performed in the check-node update processor. In a practical implementation of the MS decoder, instead of finding the minimum value in (2) separately for every outgoing message, the two smallest values are computed once from the set of messages arriving at the check node, and a suitable one is selected for each outgoing message: the second minimum is used for the variable node that supplied the first minimum, and the first minimum is used for all other variable nodes. Thus, the MS decoder reduces the hardware complexity and provides implementation advantages in terms of area and delay. In the next section, we introduce some existing architectures to find the first two minima for the CNU operation of the MS decoder.
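The reason two minima suffice can be made concrete with a small reference model. The names `find_two_minima` and `cnu_magnitudes` are illustrative assumptions; the linear scan is a software reference, not one of the hardware architectures discussed in this correspondence.

```python
def find_two_minima(mags):
    """Return (min1, min2, idx1) using a simple linear scan."""
    min1, min2, idx1 = float("inf"), float("inf"), -1
    for k, v in enumerate(mags):
        if v < min1:
            min1, min2, idx1 = v, min1, k   # old minimum becomes second minimum
        elif v < min2:
            min2 = v
    return min1, min2, idx1


def cnu_magnitudes(mags):
    """Outgoing magnitude per edge: min2 on the edge that supplied min1, else min1."""
    min1, min2, idx1 = find_two_minima(mags)
    return [min2 if k == idx1 else min1 for k in range(len(mags))]


mags = [3.0, 1.0, 4.0, 2.0]
print(find_two_minima(mags))
print(cnu_magnitudes(mags))
```

For every edge, the selected value equals the minimum over all other edges, which is exactly the magnitude required by the CNU function.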

3. Existing Architectures for Finding the First Two Minima

Generally, the hardware circuit used to find the first two minima from the set of messages arriving at a check node is known as a search module (SM). For a given set of fixed-width messages received at the check node, the SM generates three outputs: (1) the first minimum value of the set, (2) the second minimum value of the set, and (3) the index of the first minimum value. For hardware realization, two 2-input units are used as the fundamental units of a search module. The first unit, as shown in Figure 1(a), consists of one comparator and one 2-to-1 multiplexor of the message bit width, and it returns the smaller of its two inputs. The second unit consists of one comparator and two 2-to-1 multiplexors of the message bit width, and it returns both the smaller and the larger value, as depicted in Figure 1(b). Also, assume that the number of inputs of the SM is a power of 2. If the number of inputs is not a power of 2, then such an SM can be obtained by pruning some leaf nodes of a balanced SM whose number of inputs is a power of 2, as described in the previous literature [45-47]. Next, we present some state-of-the-art architectures to find the first two minima and the index of the first minimum value.

The sorting-based SM architecture for eight inputs is depicted in Figure 2. The overall process of the sorting-based SM is partitioned into two steps: (1) the first minimum is computed with a binary search tree and (2) an index-controlled multiplexing system is used to compute the second minimum. In Figure 1(c), the index of the first minimum can be estimated from the comparison results. A set of candidates for the second minimum is produced by the multiplexing system, which employs three 8-to-1 multiplexors. Once the candidate set is in hand, two of the 2-input units that return the smaller value are required to compute the second minimum. Consequently, the sorting-based SM requires nine 2-to-1 multiplexors, nine comparators, and three 8-to-1 multiplexors for processing eight inputs. However, it incurs a long critical delay because of the serially connected multiplexing system.
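The two-step idea of the sorting-based SM can be sketched behaviourally as follows. This is a software model under assumptions: the function name is illustrative, and the candidate values are tracked directly in a list instead of being selected by index-controlled multiplexors as in the hardware of [46].

```python
def sorting_based_sm(values):
    """Return (min1, min2, idx1) for a power-of-2 number of inputs."""
    n = len(values)
    assert n >= 2 and n & (n - 1) == 0, "number of inputs must be a power of 2"
    # Each entry carries (value, original index, values it has beaten so far).
    level = [(v, k, []) for k, v in enumerate(values)]
    while len(level) > 1:
        nxt = []
        for a, b in zip(level[0::2], level[1::2]):
            win, lose = (a, b) if a[0] <= b[0] else (b, a)
            nxt.append((win[0], win[1], win[2] + [lose[0]]))
        level = nxt
    min1, idx1, candidates = level[0]
    # The second minimum must have lost directly to min1, so it is among
    # the log2(n) candidates collected along the winner's path.
    return min1, min(candidates), idx1


print(sorting_based_sm([5, 3, 8, 1, 9, 2, 7, 6]))
```

For eight inputs the candidate list has three entries, which matches the three 8-to-1 multiplexors and the two additional comparisons described above.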

The sorting-based architecture is not feasible for high-speed applications because it induces a large critical delay due to the serially connected multiplexing system. A tree-based architecture, as depicted in Figure 3, was proposed in [47] for high-speed realization. In the tree-based SM, the first and second minima have almost the same processing time because of the hierarchical tree architecture. Compared to the sorting-based SM, it requires more comparators and multiplexors for finding the second minimum: three 2-input units and one 2-to-1 multiplexor are additionally required for combining two subtrees. However, the serially connected multiplexing system is completely removed, which reduces the critical delay.
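The hierarchical idea, in which each subtree reports its own two minima and pairs of subtrees are combined, can be sketched as follows. This is a behavioural model only; the `merge` function shows one straightforward way to combine two subtree results and is an assumption, not the gate-level combining circuit of [47].

```python
def merge(a, b):
    """Combine the (min1, min2, idx1) results of two subtrees."""
    a1, a2, ai = a
    b1, b2, bi = b
    if a1 <= b1:
        # a1 is the overall minimum; the runner-up is a2 or the other subtree's min.
        return a1, min(a2, b1), ai
    return b1, min(b2, a1), bi


def tree_based_sm(values, offset=0):
    """Return (min1, min2, idx1) for a power-of-2 number of inputs (>= 2)."""
    n = len(values)
    if n == 2:
        lo, hi = (0, 1) if values[0] <= values[1] else (1, 0)
        return values[lo], values[hi], offset + lo
    half = n // 2
    left = tree_based_sm(values[:half], offset)
    right = tree_based_sm(values[half:], offset + half)
    return merge(left, right)


print(tree_based_sm([5, 3, 8, 1, 9, 2, 7, 6]))
```

Because both minima are carried through every level, they become available at nearly the same time, at the cost of extra comparisons in every merge.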

The tree-based architecture provides implementation advantages over the sorting-based architecture in terms of critical delay, but it is not cost-effective for large block-length LDPC codes because of the hardware complexity that arises from its large number of comparators and multiplexors. A low-complexity tree-based architecture was proposed in [48] which reduces the number of comparators while keeping the critical delay between those of the sorting-based and tree-based architectures. The low-complexity tree-based SM for eight inputs is depicted in Figure 4, where a dedicated unit provides a candidate set for finding the second minimum. A tree structure composed of two 2-input units is required to find the second minimum from the candidate set. This SM requires nine comparators and twenty 2-to-1 multiplexors to process eight inputs. Nevertheless, the existing sorting-based and tree-based search modules are not cost-effective for large block-length LDPC codes. Hence, a low-cost SM architecture is greatly needed for hardware implementation of the MS-LDPC decoder. Next, we present an SM, known as the DMS architecture, which reduces the hardware complexity of the MS decoder for large block-length LDPC codes.

4. Proposed Architecture

The complexity of comparators and multiplexors is considerable in the hardware realization of the MS-LDPC decoder. A DMS-based SM is presented which eliminates a large number of comparators and multiplexors while keeping the critical delay almost equal to that of the tree-based architecture. The proposed SM is conceptually similar to the sorting-based SM, and both architectures have the same hardware complexity for finding the first minimum. However, the serially connected multiplexing system for finding the second minimum is completely removed. Instead, the DMS-based SM estimates the second minimum using a logical unit, as depicted in Figure 5, whose complexity and delay are much less than those of the serially connected multiplexing system.

The DMS-based SM for eight inputs is depicted in Figure 5, where seven comparators and seven 2-to-1 multiplexors are required to find the first minimum. The logical unit, as depicted in Figure 6, requires two adders, one right-shift register, and one AND gate for estimating the second minimum. The first step of the DMS approach is to replace the CNU function in (2) with a modified form in which the sign and output magnitudes are estimated from all variable nodes arriving at the check node.

The next step is to find the first two minimum values for the CNU operation. The magnitude of the check-node output is computed from the two values that form the last pair of the DMS architecture. Thus, the DMS architecture reduces the hardware complexity of the CNU operation of the MS-LDPC decoder.

Input: a set of positive values.
for: do
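Step 1
Partition the set into pairs of values and find the minimum value of each pair. Continue partitioning the resulting minima until a single pair remains, and find the first minimum from this last pair of values.
Step 2
Input the last pair of values from Step 1 to the logical unit, and estimate the second minimum.
end for
Output: the first minimum and the estimated second minimum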

As an illustrative example, assume a set of eight input values. Based on Step 1 of the DMS algorithm, the set is partitioned into four pairs of values. Finding the minimum value of each pair, a subset of four values is obtained. This subset is again partitioned into two pairs of values. Finding the minimum value of each pair, we obtain the last pair of values, whose smaller element is returned as the first minimum. According to Step 2 of the DMS algorithm, the last pair of values is passed to the logical unit, which estimates the second minimum based on (5). Afterward, the DMS algorithm returns the first minimum and the estimated second minimum as its output. It is important to mention that the first value returned by the DMS algorithm is always the exact first minimum of the input set, whereas the second value is an estimate of the second minimum; it may or may not be the exact second minimum value. Because only a single comparison tree and a small logical unit are needed, the DMS algorithm leads to an architecture that is more cost-effective for large block-length LDPC codes.
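The data flow of the two steps can be sketched as follows. The comparison tree of Step 1 follows the description above, but the logical unit is deliberately left as a parameter: `placeholder_unit` is a hypothetical stand-in introduced only so the sketch runs, and it is not the adder, shift, and AND-gate circuit of Figure 6. The example input values are also assumptions.

```python
def dms_search(values, logical_unit):
    """Return (min1, estimated min2) for a power-of-2 number of inputs."""
    n = len(values)
    assert n >= 2 and n & (n - 1) == 0, "number of inputs must be a power of 2"
    level = list(values)
    # Step 1: keep the minimum of each pair until one pair remains.
    while len(level) > 2:
        level = [min(a, b) for a, b in zip(level[0::2], level[1::2])]
    a, b = level
    min1 = min(a, b)
    # Step 2: the last pair is handed to the logical unit.
    return min1, logical_unit(a, b)


def placeholder_unit(a, b):
    """Hypothetical stand-in, NOT the paper's logical unit.

    Returns the larger element of the last pair, which is an upper bound
    on the true second minimum of the whole input set.
    """
    return max(a, b)


print(dms_search([5, 3, 8, 1, 9, 2, 7, 6], placeholder_unit))
print(dms_search([1, 2, 8, 9, 5, 6, 7, 4], placeholder_unit))
```

The second call shows why the second output is only an estimate: the true second minimum (2) sits in the same half as the first minimum and never reaches the last pair, so any unit that sees only the last pair cannot always recover it exactly.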

5. Experimental Results

5.1. Performance Analysis

In this section, the error-correction performance of the proposed DMS approach in terms of bit error rate (BER) and block error rate (BLER) is compared with that of its counterparts under the same conditions. The standard IEEE802.16e LDPC codes with code rates 0.5 and 0.75 and a block length of 2304 are used for evaluating the performance of the proposed and some other existing algorithms. The performance of the proposed approach is compared with the NMS, mfMS, svwMS, and exMin-$n$ [49] algorithms with the maximum number of decoding iterations equal to 50. Binary phase-shift keying (BPSK) transmission is assumed over an AWGN channel. Figures 7 and 8 depict the performance analysis for the (2304, 1152) and (2304, 576) IEEE802.16e LDPC codes.
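The channel model used in such simulations can be sketched as follows. This is a minimal illustration, assuming unit-energy BPSK with the mapping 0 to +1 and 1 to -1 and the usual relation between the noise variance, the code rate, and Eb/N0; the function names and the example operating point are assumptions, not the simulation code behind Figures 7 and 8.

```python
import math
import random


def noise_variance(ebn0_db, rate):
    """Noise variance per real dimension for unit-energy BPSK."""
    ebn0 = 10.0 ** (ebn0_db / 10.0)
    return 1.0 / (2.0 * rate * ebn0)


def bpsk_awgn_llrs(bits, ebn0_db, rate, rng):
    """Transmit bits over BPSK/AWGN and return the channel LLRs."""
    sigma2 = noise_variance(ebn0_db, rate)
    sigma = math.sqrt(sigma2)
    llrs = []
    for b in bits:
        s = 1.0 - 2.0 * b                   # bit 0 -> +1, bit 1 -> -1
        y = s + rng.gauss(0.0, sigma)       # received sample
        llrs.append(2.0 * y / sigma2)       # ln p(y | 0) / p(y | 1)
    return llrs


rng = random.Random(1)
llrs = bpsk_awgn_llrs([0] * 8, ebn0_db=2.0, rate=0.5, rng=rng)
print([round(l, 2) for l in llrs])
```

These LLRs serve as the intrinsic channel reliabilities that initialize the decoder; a positive value favours bit 0, in line with the LLR definition in Section 2.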

Figure 7 compares the error-correction performance of the proposed DMS algorithm with that of NMS, svwMS, and exMin-$n$ for $n = 2$. Numerical results show that the DMS algorithm provides an error performance close to that of the NMS algorithm for the IEEE802.16e standard LDPC code with code rate 0.5 and a code length of 2304. At the BER level considered in Figure 7, the DMS algorithm performs very close to NMS with a degradation of 0.09 dB. On the other hand, the exMin-2 and svwMS algorithms show a degradation of 0.20 dB and 0.30 dB, respectively.

Similarly, the error-correction performance of the DMS algorithm is also compared with that of NMS, mfMS, and exMin-$n$ for $n = 3$, for the IEEE802.16e standard LDPC code with code rate 0.75 and a code length of 2304. Figure 8 reveals that the DMS algorithm performs close to the NMS algorithm with a degradation of 0.06 dB at the BER level considered, whereas the exMin-3 and mfMS algorithms show a performance loss of 0.22 dB and 0.26 dB, respectively. As a result, the proposed DMS algorithm outperforms its counterparts under the same conditions by providing an error-correction performance very close to that of the NMS algorithm.

5.2. Complexity and Speed Performance

Compared to the state-of-the-art architectures [46-48], the proposed DMS architecture reduces the computational complexity of the CNU operation of the MS-LDPC decoder. Table 1 compares the hardware complexity and critical delay of the DMS architecture with those of the sorting- and tree-based architectures, where the critical delay is expressed in terms of the delays of a comparator, a 2-to-1 multiplexor, an $n$-to-1 multiplexor, and the logical unit, and $n$ denotes the number of SM inputs. The sorting-based [46] and low-complexity tree-based [48] architectures require $n + \log_2 n - 2$ comparators, and the tree-based [47] architecture requires $2n - 3$ comparators to find the first two minima. As the DMS architecture completely removes the multiplexing system that is unavoidable in the sorting-based SM, it requires only $n - 1$ comparators for finding the two minima. The sorting-based SM requires $n + \log_2 n - 2$ 2-to-1 multiplexors and $\log_2 n$ $n$-to-1 multiplexors, whereas the tree- and low-complexity tree-based architectures require $3n - 4$ 2-to-1 multiplexors to find the first two minima. In contrast, the DMS architecture requires $n - 1$ 2-to-1 multiplexors. The DMS architecture additionally requires two adders, one right-shift register, and one AND gate for the implementation of the logical unit, but it keeps the critical delay almost equal to that of the tree-based architecture. Consequently, if the number of input values is equal to 16, for example, the DMS architecture eliminates 16.66% of the comparators compared with the sorting-based and low-complexity tree-based architectures and 48.27% of the comparators compared with the tree-based architecture. Also, the proposed architecture requires 65.90% fewer multiplexors than the tree-based and low-complexity tree-based architectures.
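The percentages quoted for 16 inputs can be checked with a few lines of arithmetic. The closed-form counts used below are assumptions inferred from the figures quoted in the text (nine comparators and twenty 2-to-1 multiplexors for eight inputs, seven comparators for the DMS-based SM, and the savings quoted for 16 inputs); they are not copied from Table 1.

```python
from math import log2


def comparator_counts(n):
    """Assumed comparator counts for an SM with n inputs (n a power of 2)."""
    k = int(log2(n))
    return {
        "sorting": n + k - 2,
        "low_complexity_tree": n + k - 2,
        "tree": 2 * n - 3,
        "dms": n - 1,
    }


def mux2_counts(n):
    """Assumed 2-to-1 multiplexor counts for the tree-type and DMS modules."""
    return {"tree": 3 * n - 4, "low_complexity_tree": 3 * n - 4, "dms": n - 1}


def saving(reference, proposed):
    """Relative saving in percent."""
    return 100.0 * (reference - proposed) / reference


c, m = comparator_counts(16), mux2_counts(16)
print(f"comparators vs sorting-based: {saving(c['sorting'], c['dms']):.2f}%")
print(f"comparators vs tree-based:    {saving(c['tree'], c['dms']):.2f}%")
print(f"multiplexors vs tree-based:   {saving(m['tree'], m['dms']):.2f}%")
```

With these counts, 16 inputs give 18, 29, and 15 comparators for the sorting-based, tree-based, and DMS modules, and 44 versus 15 multiplexors, which reproduces the quoted savings up to rounding.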

For a fair comparison, four architectures are implemented in 6-bit CMOS standard cell library process: the sorting-based [46], tree-based [47], low-complexity tree-based [48], and proposed DMS architectures. Figure 9 depicts the critical delay of the four architectures for different numbers of inputs. To the best of our knowledge, the tree-based architecture [47] is the fastest architecture reported in the literature. Figure 9 shows that the critical delay of the DMS architecture is almost the same as that of the tree-based architecture [47].

Among the existing architectures, the most area-efficient one was proposed by Lee et al. in [48]. Figure 10 shows that when the number of inputs is equal to 64 ($2^6$), the proposed architecture saves 69%, 68%, and 52% area over the sorting-based, tree-based, and low-complexity tree-based architectures, respectively. Consequently, the proposed architecture is shown to be the most area-efficient architecture for high-speed realization, and it reduces the hardware complexity of the CNU operation of the MS-LDPC decoder.

6. Conclusion

An efficient approach has been proposed to find the first two minima for the CNU operation of the MS-LDPC decoder. The proposed architecture is conceptually similar to the sorting-based architecture, but it completely removes the large multiplexing system, which results in a significant reduction in hardware complexity and critical delay. The proposed architecture estimates the second minimum value by utilizing a logical unit whose complexity and delay are less than those of the multiplexing system. Based on the experimental results, the proposed architecture provides a critical delay almost the same as that of the tree-based architecture while eliminating a large number of comparators and multiplexors for the CNU operation of the MS-LDPC decoder. For 64 inputs, the DMS architecture saves 69%, 68%, and 52% area over the sorting-based, tree-based, and low-complexity tree-based architectures, respectively. Furthermore, simulation results show that the proposed approach outperforms its competitors in terms of bit error rate (BER) and block error rate (BLER) by providing an error-correction performance close to that of the NMS algorithm over an AWGN channel.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.