Abstract

Quantum information transfer is a high-speed information processing technology that exploits entanglement and other principles of quantum mechanics. To address the problem that quantum information is easily lost during transmission, we choose topological quantum error correction codes as the best candidates for improving the fidelity of quantum information. The stability of topological error correction codes brings great convenience to error correction, and the quantum error correction codes represented by surface codes have performed very well in error correction. To address the strong spatial correlation of surface codes and the problem of optimal decoding, we introduce a reinforcement learning decoder that can effectively characterize the spatial correlation of the error correction codes. At the same time, we use a double-layer convolutional neural network model in the dueling network to find a better error correction chain, so that the network approaches the optimal correction model and corrects more nontrivial errors. To improve the efficiency of error correction, we introduce a double-Q algorithm and a ResNet architecture to increase the error correction success rate and the training speed of the surface code. Compared with the previous MWPM decoder threshold of 0.005, the success rate improves slightly, reaching a decoder threshold of up to 0.0068. By using the residual neural network architecture, we save one-third of the training time and increase the training accuracy to about 96.6%. Using the better-trained model, we successfully raise the decoder threshold from 0.0068 to 0.0085, and the depolarizing noise model used requires no prior knowledge of the underlying noise, so the error correction efficiency of the entire model improves.
Finally, the fidelity of the quantum information is successfully improved from 0.2423 to 0.7423 by using these error correction protection schemes.

1. Introduction

In recent years, the noise generated during the operation of quantum computers destroys the entanglement of quantum states, a problem that urgently needs to be solved. Quantum error correction [1, 2] is an effective means of protecting quantum information [3] from loss. To reduce the impact of quantum decoherence, error-correcting codes with topological properties are widely used: they achieve good local stability, which makes it feasible to reverse or eliminate noise and errors in quantum systems.

Topological error correction codes are specified in the stabilizer formalism [4]. Encoding logical qubits in error-prone physical qubits makes it easier to find valid corrections for error messages. Because the information in a topological code is stored in global degrees of freedom, a larger grid provides a larger code distance. The threshold is an important measure of the performance of the physical qubits in gate-based transmission. When the physical error rate is below a certain threshold, a quantum computer can suppress the logical error rate to a lower level by applying quantum error correction schemes [5]. The physical error rate at which the logical qubit starts to perform better than a bare physical qubit is called the pseudo-threshold. We encode physical qubits with different code distances into logical qubits and obtain the decoding threshold by suppressing the logical error rates [6]. To achieve the best fault-tolerant effect, we need to push the pseudo-threshold as close as possible to the decoding threshold. When the error rate is low enough, increasing the code distance increases the probability of successful error correction; when the error rate is high, however, increasing the code distance reduces it. Under the depolarization noise model [7], the topological error correction code we choose is a surface code with generalized constraints [8, 9] (commutability and periodicity), which is convenient for further research on the properties of topological error correction codes. Since the surface code is not self-correcting, it develops defects when affected by noise. Therefore, we must actively diagnose and correct errors, and the decoding algorithm plays an important role here. The syndrome of the surface code checks a group of physical qubits in the form of parity checks to detect errors that have occurred.
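The interplay between code distance and error rate described above is often summarized by the standard scaling ansatz p_L ≈ A·(p/p_th)^((d+1)/2); this formula is not derived in this paper, but it illustrates why increasing the code distance only helps below threshold. A minimal sketch, in which the prefactor A = 0.1 and the threshold p_th = 0.0085 are illustrative assumptions:

```python
# Illustrative scaling of the logical error rate with code distance d:
# p_L ~ A * (p / p_th) ** ((d + 1) / 2)   (standard ansatz, not from this paper)

def logical_error_rate(p, d, p_th=0.0085, A=0.1):
    """Approximate logical error rate for physical error rate p and distance d."""
    return A * (p / p_th) ** ((d + 1) / 2)

# Below threshold, larger d suppresses logical errors; above it, larger d hurts.
below = [logical_error_rate(0.004, d) for d in (3, 5, 7)]
above = [logical_error_rate(0.02, d) for d in (3, 5, 7)]
assert below[0] > below[1] > below[2]   # decreasing with d when p < p_th
assert above[0] < above[1] < above[2]   # increasing with d when p > p_th
```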
An algorithm that provides a set of recovery operations for correcting the errors belonging to a given syndrome is called a decoder; it suggests corrections for any errors that may occur during the computation [10]. Since the errors consistent with a syndrome are not unique, the decoder needs to combine the error statistics associated with each syndrome. From this information the decoder generates a correction chain that eliminates the syndrome while being least likely to cause new errors.

To address the spatial correlation [11, 12] of surface codes, we adopt the dueling network structure of the reinforcement learning mechanism, an improvement of the neural network in the deep Q network (DQN) [13], which helps us find the best correction chain. The logical error rate is reduced by increasing the number of flipped bits in the correction chain [14], which addresses the problem that a large gap between the pseudo-threshold and the decoder threshold prevents good fault tolerance. The dueling network structure of the reinforcement learning mechanism can therefore be used to solve the problem of the low pseudo-threshold of the surface code [15].

The sections of this paper are organized in the following manner. First, Section 2 introduces the concept and properties of surface codes. Then, Section 3 explains the decoding steps of the dueling network and the double-Q algorithm. Next, Section 4 outlines the training process. The decoding performance of different decoders and the analysis of thresholds and error rates are presented in Section 5. Finally, conclusions are drawn in Section 6.

2. Background

The surface code is a planar variant of the Kitaev code [16], which requires fewer qubits to achieve the same error correction strength. The surface code can be regarded as a quantum double model D [17] defined by a general finite group on a general two-dimensional lattice. This paper uses a surface code with a square-lattice representation and chooses the group G to be abelian [18]. The codespace can then be defined by the parity operators [19] acting on the four nearest qubits of the square lattice [20].

The surface code is chosen on a square lattice, with qubits placed at the lattice vertices; the four vertices adjacent to the middle of each face form a plaquette, and each logical qubit is encoded by these data qubits. After rotating the lattice, logical qubits can be formed using a minimum number of physical qubits [21]. Some of the qubits remain at their positions in the lattice before rotation, and the newly introduced qubits are defined as auxiliary (ancilla) qubits. For the selected geometry, the surface code encodes logical qubits in data qubits and ancilla qubits, and each stabilizer measurement cycle [15] can identify and correct up to ⌊(d − 1)/2⌋ errors for code distance d. We mark the edges as e_i, and we can represent any edge set E as the formal sum E = Σ_i c_i e_i, where c_i = 0 when e_i is not in the edge set and c_i = 1 when it is. We define a qubit on each edge, so each element of the computational basis can be written as |E⟩ = |c_1, c_2, …, c_n⟩.

The formal sum of any set of edges is E = Σ_i c_i e_i, and since the coefficients c_i add mod 2, the edge sets form an abelian group.
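The formal-sum representation of edge sets can be sketched directly in code: each set is a vector of 0/1 coefficients, and adding two sets coefficient-wise mod 2 (the symmetric difference) gives the group operation. The lattice size below is an illustrative assumption:

```python
# A minimal sketch: edge sets as binary indicator vectors c_i in {0, 1}.
# Addition of coefficients is mod 2, so the edge sets form an abelian group.
n_edges = 8                      # illustrative lattice with 8 edges

def edge_set(indices, n=n_edges):
    """Formal sum sum_i c_i e_i, encoded as a list of 0/1 coefficients."""
    return [1 if i in indices else 0 for i in range(n)]

def add_sets(a, b):
    """Group operation: symmetric difference = coefficient-wise addition mod 2."""
    return [(x + y) % 2 for x, y in zip(a, b)]

E1 = edge_set({0, 2, 3})
E2 = edge_set({2, 5})
assert add_sets(E1, E2) == edge_set({0, 3, 5})    # e_2 cancels: e_2 + e_2 = 0
assert add_sets(E1, E1) == [0] * n_edges          # every element is its own inverse
```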

The data qubits are placed on the vertices of the lattice, as shown in Figure 1(a). The four internally adjacent vertices form the two types of plaquette operators, X and Z, while boundary plaquettes are formed by two vertices, as shown in Figure 1(b). In a surface-code lattice with periodic boundaries, the four data qubits surrounding a Z ancilla qubit form a Z stabilizer; performing the four Z operations does not change the parity and therefore does not change the measurement outcome, and the X stabilizer behaves in the same way, as shown in Figure 1(c). The blue circles correspond to the stabilizers of the Z operators, and the yellow circles indicate the stabilizers of the X operators. Each stabilizer is measured by performing a series of CNOT gates, as shown in Figure 1(d), to entangle each ancilla qubit with its four neighboring data qubits. Depending on the representation chosen, the lattice stabilizers [22, 23] can be defined as Z_s = ∏_{i∈s} Z_i for each plaquette s and X_v = ∏_{i∈v} X_i for each vertex v.

Since all the stabilizers commute with each other, the operators X_v and Z_s generate the stabilizer group of the stabilizer code. All these operators are Hermitian with eigenvalues of ±1 on the different plaquettes of the lattice. A commuting Hamiltonian, H = −Σ_v X_v − Σ_s Z_s, is constructed from the topological stabilizer code.

Under the depolarization noise model, the surface code experiences X, Y, and Z errors with equal probability, which lead to bit-flip X or phase-flip Z errors on the qubits [24]; the ancilla qubits are therefore placed in the middle of each plaquette. Measuring an ancilla qubit yields only a +1 or −1 eigenvalue, which corresponds to the parity of the four (or, in a boundary plaquette, two) adjacent data qubits. The set of ancilla measurements is called a syndrome [25, 26]. The introduction of ancilla qubits ensures that no information is lost in the measurement of neighboring data qubits while performing the syndrome measurement. The intersection of the syndromes helps identify the most likely error operators. The task of the decoder is to infer the errors on the data qubits from the measured syndrome. Due to the strong spatial correlation between the vertex operators and the plaquette operators, it is difficult to find an optimal correction chain with statistical mapping methods. However, dueling networks [27] under reinforcement learning can solve this problem well.
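The parity-check measurement described above can be sketched in a few lines: each ancilla reports +1 or −1 depending on whether an even or odd number of its neighboring data qubits have flipped. The qubit layout and stabilizer supports below are illustrative, not the paper's exact lattice:

```python
# A toy sketch of syndrome extraction: each ancilla reports the parity
# (+1/-1 eigenvalue) of its adjacent data qubits.
def measure_syndrome(z_errors, stabilizers):
    """For each stabilizer (a tuple of data-qubit indices), return +1 if the
    number of flipped neighbours is even, and -1 if it is odd."""
    return [1 if sum(z_errors[q] for q in stab) % 2 == 0 else -1
            for stab in stabilizers]

# 9 data qubits; two hypothetical stabilizers touching 4 qubits each.
stabilizers = [(0, 1, 3, 4), (1, 2, 4, 5)]
errors = [0] * 9
errors[1] = 1                   # a single bit-flip on qubit 1
# Both checks containing qubit 1 fire, localizing the error to their intersection.
assert measure_syndrome(errors, stabilizers) == [-1, -1]
```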

3. Models and Algorithms

Reinforcement learning is a learning mechanism in which an agent seeks the action, in a given environmental state, that yields the maximum reward [28]. The dueling network is a reinforcement learning network structure, an improvement of the neural network structure in the deep Q network (DQN), which lets the system approach the true maximum return as quickly and as closely as possible.

3.1. Finding the Optimal Error Correction Chain

The syndrome can be regarded as the environment state in which the agent performs its next action. Searching for the optimal error correction chain means that, after a series of bit-flip or phase-flip operations (as actions), either a trivial-ring or a nontrivial-ring [29] correction chain is finally generated. The generation of a nontrivial ring means the system receives the smallest cumulative reward, corresponding to an output eigenvalue of −1, whereas the generation of a trivial ring means the system receives the largest reward, corresponding to an output eigenvalue of +1, as shown in Figure 2.

An action a_t taken by the agent changes the current environment state s_t and yields a reward r_t for this step; a series of actions accumulates the discounted return U_t = r_t + γ·r_{t+1} + γ²·r_{t+2} + …, where γ is the discount rate [30, 31]. Affected by noise and other factors, the attenuation of the reward corresponds to the increase in the number of Pauli operators such as X and Y applied during the correction process [32]. The specific operation of the dueling network is as follows: we treat a bit flip or phase flip as an action and define the action-value function

Q_π(s, a) = E[U_t | S_t = s, A_t = a],

where Q_π(s, a) is the action-value function, referring to the correction chain we expect to obtain once we know the error type and syndrome layout, and π(a | s) is the policy function that, according to the current error state, gives the next correction instruction. U_t is the return, S is the set of states (syndromes), and A is the set of actions. The optimal action value is defined as

Q*(s, a) = max_π Q_π(s, a).

This eliminates the situation where, for different syndromes, the return after strategy selection (the correction chain) is lower than the optimal value function: the maximization removes the dependence on the policy function and ensures that the optimal correction chain can be obtained. The state-value function V_π(s) is the expectation of Q_π(s, a) over the actions a:

V_π(s) = E_{a∼π}[Q_π(s, a)].

The optimal state-value function is defined as

V*(s) = max_π V_π(s).

The optimal advantage function is defined as

A*(s, a) = Q*(s, a) − V*(s).

The dueling network consists of two neural networks: one, denoted A(s, a; θ_A), approximates the optimal advantage function A*(s, a); the other, denoted V(s; θ_V), approximates the optimal state-value function V*(s). Thus, we obtain the approximation of the optimal action-value function [19, 33]

Q(s, a; θ) = V(s; θ_V) + A(s, a; θ_A) − max_a A(s, a; θ_A).

Here, the left side of the formula is the dueling network we want, which approximates the optimal action-value function Q*(s, a); its parameters are collectively recorded as θ = (θ_V, θ_A). The optimal action value identifies the optimal error correction chain we want to find, as shown in Figure 2. The state space (syndrome) [34] is the input to the convolutional neural network, the feature vector is the output of the convolution operations, and a fully connected network then provides further multilayer connections to output the multiple candidate error chains [35] that we want to correct; the final output, in the form of a feature vector [36, 37], identifies the optimal error correction chain we need.
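The dueling combination of the value and advantage streams can be sketched numerically. The toy outputs below stand in for the two network heads (no real CNN is involved); subtracting max_a A makes the decomposition identifiable and leaves the action ranking set entirely by the advantages:

```python
import numpy as np

# A minimal sketch of the dueling combination Q = V + A - max_a A, with the
# value and advantage streams replaced by random toy outputs.
rng = np.random.default_rng(0)
n_actions = 4                     # e.g. four candidate qubit flips

V = rng.normal()                  # scalar state value V(s; theta_V)
A = rng.normal(size=n_actions)    # advantages A(s, a; theta_A), one per action

Q = V + A - A.max()               # subtracting max_a A makes V and A identifiable

best_action = int(np.argmax(Q))
assert best_action == int(np.argmax(A))   # the argmax is determined by A alone
assert np.isclose(Q.max(), V)             # at the best action, Q equals V
```

This mirrors the identity Q*(s, a) = V*(s) + A*(s, a) with max_a A*(s, a) = 0.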

3.2. Double-Q Learning Algorithm

We use a dueling network within the double-Q learning algorithm [38] to increase the number of flipped bits in the error correction chain and to reduce the logical error rate [39]. Double-Q learning, abbreviated DDQN, is an optimization of the Q-learning algorithm; it can address the bootstrap bias [40] that plain Q-learning struggles with, alleviate the overestimation caused by maximization, and better handle the problem of our optimization targets being too high. We use the fully connected network structure in the dueling network to encode the syndrome of the surface code, use the DDQN algorithm to optimize the value of each action (bit flip and phase flip), and use a convolutional neural network decoder [41] to decode the output eigenvalues, gradually finding the recovery correction chain we want.

Each time, the DDQN algorithm randomly takes a state (syndrome) from our system space, represented by the quadruple (s_t, a_t, r_t, s_{t+1}). In the decoding process we forward-propagate the syndrome through the network to obtain the action values

Q(s_t, a; θ) for all actions a.

Here, θ denotes the DQN parameters; we select the action (the corresponding bit-flip operation)

a_t = argmax_a Q(s_t, a; θ).

After performing the flip, we observe the reward r_t and the new syndrome s_{t+1}.

Then, backpropagation through the DQN yields the gradient ∇_θ Q(s_t, a_t; θ), and gradient descent updates the parameters to optimize the flips:

θ ← θ − α · (Q(s_t, a_t; θ) − y_t) · ∇_θ Q(s_t, a_t; θ).

In DDQN, we set the discount rate γ and update the parameters of the target network by a weighted average, θ⁻ ← τ·θ + (1 − τ)·θ⁻, giving the target for the best syndrome:

y_t = r_t + γ · Q(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ⁻).

The algorithm flow described above is summarized in Figure 3 and Algorithm 1.

(1)While the original syndrome defect still exists do
(2) Temporarily store the syndrome in the buffer pool
(3) Randomly select samples from the buffer pool: (s_t, a_t, r_t, s_{t+1})
(4) Calculate Q(s_t, a; θ) using the double-Q network for all actions
(5) Choose which defect to move with the action a_t, using experience replay
(6) Use the neural network in the dueling network to find the optimal weights θ = (θ_V, θ_A), according to equation (14)
(7) Feed the feature vector into the fully connected network to get the target network output
(8) SGD obtains the optimal double-Q network after normalization
(9) Get the new syndrome and store it in the quadruple (s_t, a_t, r_t, s_{t+1})
(10)for each transition tuple in the sample do
(11)  Construct the target y_t using the target network and reward r_t, according to equation (16)
(12)end for
(13) Update the double-Q network parameters θ
(14) Every n iterations, synchronize the target network with the network by setting θ⁻ ← θ
(15)end while
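The decoupled double-Q target at the heart of Algorithm 1 can be sketched with a tabular Q function standing in for the dueling network: the online network chooses the action at the next state, and the target network evaluates it. States, actions, rewards, and the learning rate below are all illustrative:

```python
import numpy as np

# A toy double-Q (DDQN) update on a tabular Q function standing in for
# the dueling network.
rng = np.random.default_rng(1)
n_states, n_actions = 5, 4
gamma, alpha = 0.95, 0.1

Q = rng.normal(size=(n_states, n_actions))   # online network (theta)
Q_target = Q.copy()                          # target network (theta^-)

def ddqn_update(s, a, r, s_next):
    """One DDQN step: the online net selects, the target net evaluates."""
    a_star = int(np.argmax(Q[s_next]))
    y = r + gamma * Q_target[s_next, a_star]   # DDQN target y_t
    Q[s, a] += alpha * (y - Q[s, a])           # gradient-descent-like step
    return y

y = ddqn_update(s=0, a=1, r=1.0, s_next=2)
# After the update, Q[0, 1] has moved a fraction alpha of the way toward y.
assert abs(Q[0, 1] - y) < abs((Q[0, 1] - alpha * 0) - y) or True
```

Decoupling action selection from evaluation is what alleviates the overestimation bias mentioned in Section 3.2.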

4. Training

The agent can choose to move a defect on the lattice in any direction (left, right, up, or down), which corresponds to flipping one of the physical qubits on the plaquette containing the defect. Figure 3 is divided into an action phase and a learning phase. Training starts from the action phase [42], in which the state of the syndrome, formed into a two-dimensional matrix, is sent to the agent. For the defect (error position) state, the double-Q network proposes an action a_t. The agent uses an ε-greedy strategy: with probability 1 − ε it suggests the operation with the highest Q value; otherwise a random action is suggested. The action produces a reward and a new observation derived from the resulting syndrome at the corresponding defect. The network training of this reinforcement learning scheme uses the DDQN algorithm together with experience replay: after a syndrome round is completed, the new correction chain obtained by the flipping actions is stored in the buffer in the form of a binary array. When we update the parameters of the dueling network, we randomly sample a small batch from the buffer to obtain our experience samples. Breaking up the correlation of the training data used for gradient descent in this way makes the training of our neural network more accurate. To improve the training speed of the model and to ensure efficient training on a larger dataset, the ResNet network [43] is introduced as the underlying architecture. Its "shortcut" connections ensure that a large number of convolutional layers can be stacked without reducing learning efficiency, by adding the shortcut output to the output of the stacked layers through residual blocks; we use ResNet network depths of 7, 14, and 21 layers for data training.
To ensure the integrity of the data, we also periodically synchronize the parameters of the fully connected network in the dueling network with the parameters of the CNN network.
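The residual "shortcut" described above can be sketched as a single forward pass: the block's output is the stacked-layer output plus the identity, so signal (and gradient) can bypass the stack. The weights and sizes below are illustrative, not the paper's ResNet7/14/21 configurations:

```python
import numpy as np

# A minimal sketch of a residual block: y = relu(x + F(x)),
# where F is a small two-layer stack.
rng = np.random.default_rng(2)
dim = 16
W1 = rng.normal(scale=0.1, size=(dim, dim))
W2 = rng.normal(scale=0.1, size=(dim, dim))

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x):
    f = W2 @ relu(W1 @ x)          # stacked layers F(x)
    return relu(x + f)             # the shortcut adds the input back

x = rng.normal(size=dim)
y = residual_block(x)
assert y.shape == x.shape          # the shortcut requires matching dimensions
```

Because the block only has to learn the residual F(x) rather than the full mapping, stacking many such blocks does not degrade training, which is the property exploited here to speed up training.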

Training starts with the input of a syndrome and a reversal action selected randomly with probability ε; the action a_t and the defect (error center) are recorded. With probability 1 − ε, the network instead uses the greedy rule and proposes the action with the highest Q value. This action generates a reward r_t, a new state environment s_{t+1}, a new defect center, and a new syndrome. During DDQN training, the quadruple (s_t, a_t, r_t, s_{t+1}) is input in binary form into the buffer for temporary storage, and gradient descent is then performed on randomly sampled mini-batches of N transitions, where N is the batch size, to minimize the correlation of the data. The optimal value targeted by network training is

y_t = r_t + γ · Q(s_{t+1}, argmax_a Q(s_{t+1}, a; θ); θ⁻),

where γ is the reward (discount) factor and the target network with parameters θ⁻ predicts the future cumulative reward (the different corrected trivial and nontrivial chains). Afterwards, the gradient descent method is used to minimize the difference between the sample values and the double-Q network values, and normalization improves the training parameters of the network. Then a new sequence is trained, and the weights of the target network are synchronized with the double-Q network at a specified rate. Finally, the ResNet architecture is used in the CNN network, multiple iterations and predictions are performed [7], and the parameters of the fully connected network and the CNN network are synchronized regularly. The training process is shown in Figure 4. The ResNet21 model can achieve the accuracy we want with fewer training parameters, whereas for the 7-layer and 14-layer models we have to increase the training parameters; finally, once the iteration depth reaches 500, the accuracy of all three exceeds 96%, which ensures that the generation of the corrected trivial chain with a code distance of 5 can be accurately predicted under a noise intensity of 0.042.
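The ε-greedy rule used in the action phase can be sketched in a few lines. The Q values below are placeholders, not real network outputs:

```python
import random

# A sketch of the epsilon-greedy rule: with probability epsilon pick a
# random flip, otherwise the flip with the highest Q value.
def epsilon_greedy(q_values, epsilon, rng=random):
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))   # explore: uniformly random flip
    # exploit: index of the maximum Q value
    return max(range(len(q_values)), key=q_values.__getitem__)

q = [0.1, 0.7, 0.3, 0.2]
assert epsilon_greedy(q, epsilon=0.0) == 1    # epsilon = 0 is purely greedy
picks = {epsilon_greedy(q, epsilon=1.0) for _ in range(200)}
assert picks <= {0, 1, 2, 3}                  # epsilon = 1 explores all actions
```

In practice ε is typically annealed toward zero as training proceeds, shifting the agent from exploration to exploitation.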

5. Simulation Analysis

We use the DDQN algorithm in the dueling network to generate our prediction model. The efficiency and accuracy achieved by different network depths also differ. The dataset is shown in Table 1, where we again include MWPM for comparison; as can be seen there, the numbers of training steps and of parameters are reduced by two orders of magnitude and one order of magnitude, respectively.

As can be seen in Figure 5(a), the logical error rate under the MWPM decoder rises slowly at smaller code distances; the pseudo-threshold of d = 7 is closest to the decoder threshold of 0.0050 obtained in the gray region shown in the inset. At this point the physical qubits tolerate noise interference well, but the overhead is too large and the threshold too low, leading to huge cost. After using the CNN decoder, the pseudo-threshold at the same code distance d = 7, as shown in Figure 5(b), is closer to the decoder threshold of 0.0055 in the gray area; however, although the pseudo-threshold is closer to the decoder threshold, the threshold strength is still not high enough.

To further improve the decoder threshold, we expand the code distance as shown in Figure 6. The logical error rate clearly rises faster at the expanded code distances, and the pseudo-threshold of d = 9, as shown in Figure 6(a), far exceeds the pseudo-threshold of d = 3, approaching the decoder threshold and increasing to 0.0055; in addition, the decoder threshold, as shown in Figure 6(b), also increases to 0.0085. This is close to the ideal threshold effect, and the pseudo-threshold of d = 13 achieves the idealized approach, which largely resolves the poor error correction ability caused by the low threshold of the surface code. This provides a reliable guarantee for the integrity of quantum information transmission and improves the quantum information fidelity to the level of 0.7243.

After the decoder threshold has been significantly raised, this paper verifies the integrity of quantum information transmission through the circuit, as shown in Figures 7 and 8, and the quantum information fidelity produced under different decoders. As can be seen in Figure 7, the fidelity of qubit information transmission under the MWPM decoder is somewhat lower than under the CNN decoder: in the case of the MWPM decoder, at a threshold of 0.0055 the fidelity only reaches 0.435, which fails to guarantee reliable information transmission. We then applied the boosted decoder threshold to a new qubit transmission, as shown in Figure 8; the quantum information fidelity is boosted to a high-performance fidelity of 0.754 under the CNN decoder, lifting quantum information transmission against noise interference to a new stage and roughly doubling the fidelity.

6. Conclusion

In this paper, the surface code is selected as the quantum error correction code under ideal errors, and it performs well in quantum error correction. To avoid the problem of excessively high prediction values, we introduce a double-Q algorithm, which improves the decoder threshold to about 0.0055, compared with MWPM's 0.005. Using the 14- and 21-layer ResNet networks shortens the training time by 30% while increasing the training accuracy to about 96.6%. Using the double-layer convolutional neural network model in the dueling network, we succeed in increasing the decoder threshold from 0.0068 to 0.0085, which greatly improves the error correction efficiency of the whole model. However, from the simulation results in this paper, we can see that the factors affecting the transmission of quantum information include not only external noise interference and code distance but also other factors, such as stabilizer measurement errors, which can affect the qubits and lead to errors. Not only does the decoder algorithm need further optimization toward the optimal threshold, but the development of precision instrumentation for quantum computers is also essential. Therefore, we need to further investigate the quantum error correction mechanism to obtain better quantum information transmission results.

Data Availability

Data are available on request.

Conflicts of Interest

The authors declare no potential conflicts of interest.

Acknowledgments

This study was supported by the National Natural Science Foundation of China (Grant No. 61772295), Natural Science Foundation of Shandong Province, China (Grant Nos. ZR2021MF049 and ZR2019YQ01), and Project of Shandong Provincial Natural Science Foundation Joint Fund Application (ZR202108020011).