Research Article

Quantum Information Protection Scheme Based on Reinforcement Learning for Periodic Surface Codes

Figure 3

Starting from the initial action-selection stage, samples are drawn at random from the buffer pool for reinforcement learning, and stochastic gradient descent (SGD) is used to minimize the mean squared error between the true value and the predicted value. The double-Q algorithm is then iterated multiple times on the syndrome, finally yielding the error-correction chain closest to the desired one.
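
The training loop described in this caption can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a tabular double-Q learner on a toy five-state chain that stands in for the syndrome environment, and every name in it (`step`, `train`, `greedy_policy`, the constants) is hypothetical. With a table in place of a network, one SGD step on the squared error reduces to moving the stored value toward the target.

```python
import random
from collections import deque

# Toy stand-in for the syndrome environment (assumption, not the surface code).
N_STATES, N_ACTIONS = 5, 2
GAMMA, LR, EPSILON = 0.9, 0.1, 0.3


def step(state, action):
    """Action 1 moves right, action 0 moves left; the last state is terminal."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def train(episodes=300, batch_size=16, seed=0):
    rng = random.Random(seed)
    q_a = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    q_b = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    buffer = deque(maxlen=1000)  # the "buffer pool" of past transitions

    for _ in range(episodes):
        state = 0
        for _ in range(50):  # cap on episode length
            # Action selection: epsilon-greedy on the sum of both tables,
            # with ties broken at random.
            totals = [a + b for a, b in zip(q_a[state], q_b[state])]
            if rng.random() < EPSILON:
                action = rng.randrange(N_ACTIONS)
            else:
                best = max(totals)
                action = rng.choice([i for i, v in enumerate(totals) if v == best])

            nxt, reward, done = step(state, action)
            buffer.append((state, action, reward, nxt, done))
            state = nxt

            # Random minibatch from the buffer pool.
            batch = rng.sample(list(buffer), min(batch_size, len(buffer)))
            for s, a, r, s2, d in batch:
                # Double-Q: one table picks the next action, the other scores it.
                upd, other = (q_a, q_b) if rng.random() < 0.5 else (q_b, q_a)
                target = r if d else r + GAMMA * other[s2][argmax(upd[s2])]
                # One SGD step on 0.5 * (target - prediction)^2 for this entry.
                upd[s][a] += LR * (target - upd[s][a])

            if done:
                break
    return q_a, q_b


def greedy_policy(q_a, q_b):
    """Greedy action in each non-terminal state, using the sum of both tables."""
    return [argmax([a + b for a, b in zip(q_a[s], q_b[s])])
            for s in range(N_STATES - 1)]
```

Calling `q_a, q_b = train()` followed by `greedy_policy(q_a, q_b)` returns one action per non-terminal state. In a decoder setting, the tables would presumably be replaced by a network that takes the syndrome as input, and the actions would correspond to correction operations rather than moves on a chain.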