Abstract
To realize a flexible, low-power, low-cost shared general-purpose block interleaving hardware module for 4G turbo and 5G LDPC codes, the challenges of integrating different interleaving structures, minimizing gate count, supporting parallel multistream operation, and switching between standards must be addressed. After studying the 3GPP TS 36.212 V15.4.0 and 3GPP TS 38.212 V15.4.0 protocols, the common part of the two interleaving modules is identified. For the block interleaving module in the rate matching of the 4G downlink turbo code and the bit interleaving module after rate matching of the 5G NR downlink LDPC code, this paper first designs a shared memory and implements the interleaving of both codes on it, and then performs functional verification based on the Altera Quartus Prime platform and ModelSim. Experimental results show that, using SMIC 28 nm technology at an operating frequency of 50 MHz and after Synopsys synthesis, the memory module area is , and the power consumption is 6.45 mW. Through the shared design, 32-bit parallel access, and switching between standards, the proposed scheme reduces hardware cost, power consumption, and clock overhead, and improves the flexibility of the 4G LTE and 5G NR downlink hardware implementation.
1. Introduction
During wireless transmission, errors on the communication link caused by channel characteristics and noise produce bursts of erroneous bits in the coded stream, so that the receiver cannot recover the source signal by error correction alone [1, 2]. To minimize the decoding problems caused by continuous errors, error control coding and interleavers are usually used together in practical applications [3]. In detail, error control coding converts blocks of information data bits into longer code blocks for channel transmission, while the interleaver scrambles the bit order of the coded block before output. Error control coding is used to eliminate random errors, and interleavers are used to break up the encoded bit sequence. Interleaving removes the correlation between adjacent bits and disperses the erroneous bits within a block, minimizing the influence of factors such as burst noise during transmission [4].
To improve transmission reliability, interleaving is essential in the communication link. In practice, the implementation of an interleaver consumes a considerable amount of silicon, and implementing separate interleavers for multiple standards greatly increases the silicon cost [5]. Therefore, it is of great significance to design a sharable hardware module that integrates the interleaving structures of different standards, reduces gate count, and supports switching between standards. In this paper, we design a shared-memory interleaver that implements the channel interleaving of both turbo and LDPC codes. The interleaving functions of the two codes are concentrated in a single architecture for shared implementation, which saves silicon cost. Switching between different standards provides a new choice for the baseband processor and improves the flexibility of the entire system to a certain extent.
1.1. Related Work and Motivation
The theoretical basis of interleaving and deinterleaving algorithms is quite mature. From the perspective of standards, channel interleaving is used in WiMAX (worldwide interoperability for microwave access), WLAN (wireless local area network) 802.11n, HSPA (high-speed packet access), DVB-H (digital video broadcasting handheld), 3GPP-LTE (long term evolution), 5G NR (new radio), and other standards. Khater et al. proposed a pattern-tracking implementation of WiMAX channel interleaving on an FPGA (field programmable gate array), which improves area and delay but is limited in its application scope [6]. Beyond single-standard interleaving, Rizwan Asghar implemented channel interleaving for the WiMAX, DVB, and 3GPP-LTE standards and realized multimode interleaving through hardware sharing [7]. A parallel interleaving algorithm covering HSPA+, DVB-SH, 3GPP-LTE, and WiMAX was also proposed, which greatly saves clock overhead [8, 9]. Zhang [10] studied the interleaving and deinterleaving permutations in the 802.11n protocol and proposed a hardware architecture for the 36 interleaving structures under that protocol, effectively solving the hardware cost and address generation problems of the MIMO transceiver in 802.11n. Wang [11] proposed a configurable interleaving module covering the WiMAX/WLAN 802.11n, HS-DSCH (high-speed downlink shared channel), and DVB-H standards; after system simulation and verification, this module improves transmission reliability and system flexibility while saving a certain amount of hardware resources.
From the perspective of coding schemes, intrasymbol interleaving of MNBTCs (multi-non-binary turbo codes) is studied in [12], where several candidate methods are proposed and the relationship between intrasymbol interleaving and BER/SNR is summarized. For RS (Reed–Solomon) codes, Yi et al. [13] proposed an improved block successive packing (BSP) interleaving algorithm; combined with RS codes, this algorithm improves the robustness of the system. The polar and LDPC codes used in the 5G standards have also been studied in corresponding papers. U. U. Fayyaz [14] designed a bit-interleaved symbol mapping for a polar-coded modulation system, and simulations show that the proposed mapping scheme achieves fast convergence of regional errors. For LDPC codes, a higher-order bit interleaving scheme was proposed, which reduces the bit error rate (BER) of the system compared with traditional maximum-likelihood decoding [15]. We have previously proposed an interleaving multiplexing scheme for LDPC and polar codes that multiplexes computing units with the same functions, improving the flexibility of the 5G link and saving silicon area [16].
From the perspective of interleaver classification, Zou et al. [17] improved block interleaving for iterative decoding. The traditional block interleaving solution uses two RAMs, one for sequential storage and the other for reading with a different addressing scheme. In [17], the authors replace simple row-column addressing with an efficient address generation unit and use only one RAM to complete the block interleaving process, which effectively reduces hardware resources in an FPGA implementation. Facing the high-throughput and low-delay requirements of 5G NR, Behera [18] studied the delay generated in LDPC code rate matching and bit interleaving and proposed an M-parallel pointer generation algorithm for the rate matching buffer, generating M parallel pointers in advance. This algorithm skips the row and column permutations and reduces the clock delay in the downlink.
In conclusion, existing interleaving algorithms realize both in-code interleaving and channel interleaving under different standards, and existing schemes also fuse different classes of interleavers into one module, introducing parallel processing units where necessary. However, no existing design covers both the 4G and 5G downlink channels. When implementing a shared-memory interleaving module for 4G and 5G, the different interleaving algorithms, the generation of row and column addresses, and the clock delay bring great challenges.
1.2. Main Contributions of Paper
In response to the above problems, this article analyzes the commonality between the block interleaving in turbo code rate matching in the 4G downlink and the bit interleaving of the LDPC code in the 5G downlink. Firstly, an addressing formula for turbo codes of arbitrary code length is proposed, so that the NULL bits can be punctured after block interleaving in rate matching. Secondly, the corresponding interleaving algorithm and address conversion are given for LDPC codes under different modulation orders. Meanwhile, we design an address generation unit and a storage module so that the coded turbo code and the rate-matched LDPC code use the same hardware to realize their respective interleaving functions, and we conduct simulation verification. Finally, the simulation results prove the feasibility and superiority of the proposed scheme.
1.3. Paper Organization
The rest of this paper is organized as follows. Section 2 presents a brief introduction to turbo and LDPC code interleaving. Section 3 presents the implementation process, hardware architecture, instruction design, and circuit framework of the shared-memory interleaver; it also gives the specific implementation schemes of turbo code sub-block interleaving and LDPC code bit interleaving on this module and integrates the turbo code addressing formula with the LDPC code interleaving algorithm. The performance of the proposed hardware is evaluated through simulation results in Section 4. Finally, Section 5 concludes this paper.
2. A Brief Introduction to Turbo and LDPC Code Interleaving
2.1. Turbo Code Rate Matching and Interleaving
In the LTE system, turbo coding is adopted for the downlink shared channel (DL-SCH), paging channel (PCH), and multicast channel (MCH). Turbo encoding has a code rate of 1/3 and outputs three bit streams. Internal interleaving within the encoder has been realized in [7, 19]. This paper mainly studies and optimizes the rate matching part of LTE downlink communication. Rate matching repeats, punctures, or adds NULL bits to the coded bits so that the output matches the rate that the physical channel can support. LTE rate matching is divided into convolutionally coded and turbo coded cases according to the coding method. Based on 3GPP TS 36.212 [20], this paper studies the rate matching process of turbo codes in the downlink, summarizes the block interleaving addressing formula, and designs storage units for hardware implementation.
According to [20], three bit streams are output after turbo encoding and enter three sub-block interleavers, respectively. The turbo code rate matching process is shown in Figure 1. For the 0th and 1st sub-block interleavers, the bit stream is written row by row into an interleaver with 32 columns. After the data enters the interleaver, inter-column permutation is performed according to Table 1, where P(j) is the original column position of the j-th column. In addition, when the length of the encoded bit stream is not an integer multiple of 32, the corresponding number of NULL bits is filled in front of the first bit so that the first row of the interleaver is complete.
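As an illustration, the following Python sketch reproduces the row-write, column-permute, column-read behaviour described above for sub-block interleavers 0 and 1. The inter-column permutation pattern P(j) is the one defined for the LTE turbo sub-block interleaver in 3GPP TS 36.212; the function name is our own, and NULL bits are modelled with `None`.

```python
# Inter-column permutation P(j) of the LTE turbo sub-block interleaver
# (3GPP TS 36.212): the j-th output column takes the original column P(j).
P = [0, 16, 8, 24, 4, 20, 12, 28, 2, 18, 10, 26, 6, 22, 14, 30,
     1, 17, 9, 25, 5, 21, 13, 29, 3, 19, 11, 27, 7, 23, 15, 31]

def subblock_interleave_01(d):
    """Sub-block interleaver for bit streams 0 and 1 (illustrative sketch)."""
    C = 32                                  # the number of columns is fixed to 32
    R = -(-len(d) // C)                     # number of rows R_TC, rounded up
    nd = R * C - len(d)                     # number of NULL bits ND to prepend
    y = [None] * nd + list(d)               # NULL padding in front of the first bit
    rows = [y[r * C:(r + 1) * C] for r in range(R)]   # write row by row
    # Permute columns according to P, then read column by column.
    return [rows[r][P[j]] for j in range(C) for r in range(R)]

# Example: a 40-bit stream needs ND = 24 NULL bits and a 2 x 32 interleaver.
v = subblock_interleave_01(list(range(40)))
```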

For the second sub-block interleaver, the address mapping is given by formula (1), where the input bit sequence is $d_k^{(2)}$ and the output sequence is $v_k^{(2)} = d_{\pi(k)}^{(2)}$:

$$\pi(k) = \Big(P\big(\lfloor k / R_{TC} \rfloor\big) + 32 \cdot (k \bmod R_{TC}) + 1\Big) \bmod K_{\Pi} \quad (1)$$

In (1), $R_{TC}$ is the number of rows of the sub-block interleaver, $K_{\Pi} = 32 \cdot R_{TC}$ is the interleaver size, $k$ is the bit index, $\lfloor \cdot \rfloor$ denotes a round-down operation, and $\pi(k)$ is the output bit address index. After block interleaving, the three bit streams enter the bit collection, where bits are collected, selected, and transmitted to complete rate matching.
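A minimal Python sketch of this mapping, reusing the permutation list P from the sketch above (the function name is our own assumption):

```python
def subblock_interleave_2(d, P):
    """Sub-block interleaver for the 2nd bit stream: v[k] = d[pi(k)], per formula (1)."""
    C = 32
    R = -(-len(d) // C)                      # R_TC, number of rows (rounded up)
    K = R * C                                # K_Pi, interleaver size
    y = [None] * (K - len(d)) + list(d)      # NULL padding, as for sub-blocks 0 and 1
    return [y[(P[k // R] + C * (k % R) + 1) % K] for k in range(K)]
```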
2.2. LDPC Bit Interleaving
The position of LDPC code bit interleaving in the downlink is shown in Figure 2. The physical layer receives a transport block from the MAC layer and first adds a 16- or 24-bit CRC. If the number of bits in the block exceeds a certain value, code block segmentation is performed. A CRC is added to each segmented code block, and each code block is encoded independently before rate matching. The LDPC code realizes rate matching through a circular buffer. According to 3GPP TS 38.212 [21], the encoded bits are put into the circular buffer for bit selection. The starting point of bit selection is related to the redundancy version number rvid (rvid = 0, 1, 2, 3). During transmission, bits are read sequentially from the circular buffer starting from the position determined by the redundancy version.

Bit selection is followed by bit interleaving, which scrambles the order of the bit stream produced by bit selection. Interleaving is carried out separately for each code block. 5G NR uses a row-column interleaver for bit interleaving. According to [22], the interleaving method is as follows: a rectangular interleaver is used whose number of rows R equals the modulation order (for QPSK the modulation order is 2, for 16QAM it is 4, and for 256QAM it is 8). The interleaver reorders the data by writing row by row and reading column by column. The realization of bit interleaving is shown in Figure 3.

3. Proposed Scheme
3.1. Interleaving Algorithms of Turbo and LDPC
For turbo code sub-block interleaving, each clock cycle processes 32 bits. When data enters, the register file is divided into multiple memory banks, and the 32 bits are distributed over 32 banks. A cyclic shift is carried out starting from the first row, so that 32 bits can be read out from 32 different memory banks. To realize the permutation in Table 1, the addresses are generated by the address generation unit, data is read from the 32 addresses accordingly, and the result is written into the data memory module. For the interleaving of formula (1), the address mapping can be realized by cyclically shifting the last column upward by one bit when the data is read out column by column.
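The cyclic-shift banking mentioned above can be modelled in software as follows; this is our own reading of the description (the bank layout and function names are assumptions), not the exact hardware:

```python
# Software model of cyclic-shift banking: row r is rotated by r before being
# spread across the 32 banks, so any 32 consecutive elements of a column also
# fall into 32 distinct banks and can be fetched in one cycle.
C = 32

def write_rows(rows):
    banks = [[None] * len(rows) for _ in range(C)]   # banks[bank][address]
    for r, row in enumerate(rows):                   # row: list of 32 bits
        for c, bit in enumerate(row):
            banks[(c + r) % C][r] = bit              # cyclic shift by the row index
    return banks

def read_column(banks, c, num_rows):
    # Column c, row r lives in bank (c + r) % C at address r.
    return [banks[(c + r) % C][r] for r in range(num_rows)]
```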
For LDPC block interleaving, the bit stream of length E enters the block interleaver. The number of rows of the block interleaver is Qm (1, 2, 4, or 8), the number of columns is E/Qm, and the bits enter row by row and are output column by column. The implementation follows the interleaving algorithm in [21] and is shown in Algorithm 1, where E is the bit length after encoding, Qm is the modulation order, e is the sequence before interleaving, and f is the sequence after interleaving.
Algorithm 1: Bit interleaving of the LDPC code.
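For reference, a minimal Python sketch of this row-write, column-read interleaving, following the description above (the function name is an assumption):

```python
def ldpc_bit_interleave(e, Qm):
    """Bit interleaving after rate matching: Qm rows, E/Qm columns,
    written row by row and read column by column."""
    E = len(e)
    assert E % Qm == 0
    cols = E // Qm
    f = [0] * E
    for j in range(cols):            # output column index
        for i in range(Qm):          # output row index
            f[i + j * Qm] = e[i * cols + j]
    return f
```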
The SRAM is sized such that, for any modulation order Qm of 1, 2, 4, or 8, the data can be written into the SRAM and read out correctly.
To remain consistent with the order specified by the standard, when the modulation order is 1 (i.e., the number of rows is 1), the E bits are divided into groups of 32 bits, written into the SRAM group by group, and read out sequentially. When the modulation order is 2, 4, or 8, the positions of the coded sequence are transformed. The position correspondence is shown in Algorithm 2, where t is the bit sequence after position replacement.
Algorithm 2: Position transformation for modulation orders 2, 4, and 8.
3.2. Design and Implementation of Interleaver Memory Sharing
The hardware is dimensioned for the maximum input data size. For turbo codes, each code block contains at most 6144 bits. We complete the interleaving of the three bit streams on one hardware module, so the maximum amount of data is 6144 × 3 = 18432 bits. The block interleaver has 32 columns; each row stores 32 bits, so at most 576 rows are written to memory. For LDPC codes, the code length is N, where N = 66Z (BG1) or N = 50Z (BG2), and the maximum expansion factor Z is 384, so the maximum code length is 66 × 384 = 25344 bits. The memory is designed for this maximum encoding length. To realize turbo sub-block interleaving and LDPC bit interleaving with one set of address generation and memory units, the main module is an 800 × 32 SRAM: it has 800 addresses, and each address accesses 32 bits. The realization process is divided into two phases, a precomputation phase and an execution phase. According to the code type, the relevant parameters are first determined in the precomputation stage, and then the execution stage is entered, as shown in Figures 4 and 5.
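A quick sizing check of the shared SRAM against the two worst cases, using only the figures quoted above:

```python
WORD = 32                                      # bits per SRAM address
turbo_max = 6144 * 3                           # three streams of one max-size turbo code block
ldpc_max = 66 * 384                            # LDPC BG1 with the largest expansion factor Z = 384

print(turbo_max // WORD)                       # 576 rows needed for turbo
print(-(-ldpc_max // WORD))                    # 792 rows needed for LDPC
print(max(turbo_max, ldpc_max) <= 800 * WORD)  # True: an 800 x 32 SRAM covers both
```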


3.2.1. Hardware Architecture Design
The hardware module includes control logic and a storage system. The control logic includes a program memory (PM) and an instruction decoder (ID); the storage system includes an address generation unit (AGU), a data permutation network (DPN), and data memory (DM). The connections between the modules are shown in Figure 6.

3.2.2. Instruction Design
When the memory-sharing module processes the interleaving of turbo and LDPC codes, corresponding instructions must be designed for the precomputation stage and the execution stage to perform the different operations on the hardware. Before turbo sub-block interleaving, a fill instruction is issued; the data first passes through a 32 × 32 row-column permutation network and then enters the data memory, from which it is read out in the corresponding order. A loop instruction is also designed to set the number of cycles of the 32 × 32 SRAM. Figure 7 shows the description of the two instructions.

(a)

(b)
3.2.3. Hardware Structure
The hardware includes an address generation module, a data processing module, and a memory module. The address generation module generates read and write addresses according to the control signals. When data is written, the address generated by the address generation module is given to both the data processing module and the memory module: the former performs the row and column permutation, and the memory module stores the processed bits at the given address. When data is read, the address generation unit generates different read addresses according to the control signal and reads the corresponding data from the memory module. The hardware diagram is shown in Figure 8.

3.3. Interleaving and Rate Matching of Any Code Length
For turbo codes, let i be the index of a bit when the stream enters the block interleaver and j its index when it enters the bit collection after the block interleaver. The correspondence between i and j is given by formulas (2)–(4) below, where & denotes the bitwise AND operation, ND is the number of filled NULL bits, RTC is the number of rows of the sub-block interleaver, ⌊·⌋ denotes a round-down operation, and P[·] is obtained from Table 1.
Sub-block 0:
Sub-block 1:
Sub-block 2:
In the precomputation stage, the number of NULL bits to be filled, ND, and the number of rows of the block interleaver, RTC, are determined; the size of the block interleaver follows from RTC and the fixed column number 32. Given the ND value, the index positions of the filled NULL bits after interleaving can be obtained from formulas (2)–(4) so that the NULL bits can be punctured.
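Since formulas (2)–(4) are not reproduced here, the following sketch derives the post-interleaving positions of the NULL bits for sub-blocks 0 and 1 directly from the row/column mapping described earlier (it reuses the permutation list P from the first sketch and is our own illustration, not the paper's closed-form formulas):

```python
def null_positions_subblock_01(ND, R):
    """Output indices occupied by the ND prepended NULL bits after
    sub-block interleaving of streams 0 and 1 (illustrative only)."""
    # The NULL bits occupy write positions 0..ND-1, i.e. row 0 of the original
    # columns 0..ND-1 (ND < 32).  After the inter-column permutation P and
    # column-wise reading, output column j holds original column P[j], and its
    # row-0 element receives output index j * R.
    return sorted(j * R for j in range(32) if P[j] < ND)
```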
For LDPC codes, the bit length E and the modulation order Qm are determined in the precalculation stage. Then, according to the addressing algorithm, the data is written into the memory and read out at the corresponding read addresses to complete the block interleaving. By comparing the addressing formulas of the turbo and LDPC codes, the units with the same address operations can be merged. The specific scheme is shown in Figure 9.

(a)

(b)

(c)
4. Simulation Results and Discussions
4.1. Experimental Environment and Evaluation Metrics
The hardware design in this paper was first simulated in MATLAB, then described in Verilog and SystemVerilog on the Quartus Prime platform, and waveform simulation and functional verification were completed with ModelSim. Using SMIC 28 nm digital CMOS technology at an operating frequency of 50 MHz, Synopsys synthesis finally yields the area overhead of the hardware memory module, the corresponding power consumption, and the throughput for the two codes.
4.2. Simulation Results
Based on Quartus Prime, the hardware is described in Verilog and SystemVerilog, functionally simulated with ModelSim, and the output waveforms are observed as shown in Figure 10. For the turbo code, we apply a fixed input and observe the output waveform after the SRAM and the row-column interleaving. As seen in Figure 10(a), the output port outputs the bits correctly and in order on the rising edge of the clock at the position marked by the yellow cursor. Similarly, for the LDPC code, Figure 10(b) shows that the output port also outputs the bit sequence correctly under the action of the control port and the corresponding enable port.

(a)

(b)
For turbo codes, the data enters the SRAM in sequence and is output at the corresponding addresses, processing an average of 16 bits per clock cycle. For LDPC codes, each clock cycle processes 32 bits. Under the control of the enable signal, the corresponding bits are correctly read from the data memory on the rising edge of the clock. The layout generated by Synopsys synthesis is shown in Figure 11. Only the memory module is synthesized, with an area of ; other relevant parameters are listed in Table 2.

4.3. Discussions
Table 3 compares the implementations of block interleaving for different standards in other papers. Han et al. [23] proposed a new architecture for block interleaving under the MB-OFDM UWB standard, implemented on an FPGA with a maximum clock frequency of 500 MHz and a total power consumption of 294.21 mW. Ma and Lin [24] realized the whole rate matching part of turbo codes on an FPGA and proposed a hardware implementation scheme for 3GPP turbo code rate matching, whereas in this paper only the memory module of the block interleaving part is synthesized in hardware. Our design not only adds the bit interleaving of LDPC codes but also reduces the area. Zhang et al. [10] merged the row-write and column-read permutations of the interleaving module so that only one read and one write operation are required in hardware. On this basis, this paper processes multiple bits per clock cycle, which improves the system throughput.
5. Conclusions and Future Works
In this work, the sub-block interleaving in turbo code rate matching in the 4G LTE downlink and the LDPC code bit interleaving in the 5G NR downlink are implemented on one hardware module, mainly by sharing the memory module. Experimental results show that this method reduces the storage area, allows switching between the two interleaving schemes, and improves the flexibility of the system. In future work, we will add the address-fusion scheme for turbo code NULL-bit puncturing and LDPC code bit interleaving to the hardware implementation, as well as the block interleaving of convolutional coding in LTE and the interleaving of the polar code used in the 5G NR control channel. The reconfigurability of the module will also be improved so as to maximize the sharing of hardware circuits and realize a more complete integration of the 4G and 5G communication links.
Data Availability
This article does not cover data research. No data were used to support this study.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was funded by the Hainan Province Natural Science Foundation of China (Grant no. 620RC564), the National Natural Science Foundation of China (Grant nos. 61963012 and 62161010), and Hainan University project funding KYQD (ZR)1974. The authors would like to thank the referees for their constructive suggestions.