Research Article

Quantum Information Protection Scheme Based on Reinforcement Learning for Periodic Surface Codes

Algorithm 1

Training the reinforcement learning agent decoder.
(1) While any original syndrome defect still exists do
(2) Temporarily store the syndrome in the buffer pool
(3) Randomly select the samples from the buffer pool:
(4) Calculate the Q values using the dual-Q network for all perspectives
(5) Choose which defect to move, and the action to apply to it, using the experience replay technique
(6) Use the neural network in the dueling network to find the optimal weights according to equation (14).
(7) Feed the feature vector into the fully connected network to get the target network:
(8) Use SGD to obtain the optimal dual-Q network after normalization:
(9) Get the new syndrome and store it in the quadruple:
(10) for each transition tuple in the sample do
(11)  Construct targets using the target network and the reward according to equation (16).
(12) end for
(13) Update dual-Q network parameters
(14) Every n iterations, synchronize the target network with the network setting of
(15) end while
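
The replay, target-construction, and synchronization steps of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the Q networks are replaced by plain lookup tables, the target is the standard one-step Q-learning target rather than equation (16), and a `done` flag is added to the stored quadruple to mark terminal transitions. The names `ReplayBuffer`, `td_target`, `train_step`, and `maybe_sync`, and the values of `GAMMA`, `SYNC_EVERY`, and `LEARNING_RATE`, are all hypothetical.

```python
import random
from collections import deque

GAMMA = 0.95          # discount factor (assumed value)
SYNC_EVERY = 4        # the "n" of step (14) (assumed value)
LEARNING_RATE = 0.5   # step size of the tabular update (assumed value)


class ReplayBuffer:
    """Fixed-size pool of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are discarded once the capacity is reached.
        self.pool = deque(maxlen=capacity)

    def store(self, transition):
        self.pool.append(transition)

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the pool size.
        return random.sample(list(self.pool), min(batch_size, len(self.pool)))


def td_target(q_target, reward, next_state, done, actions):
    """One-step target: r if terminal, else r + GAMMA * max_a Q_target(s', a)."""
    if done:
        return reward
    best_next = max(q_target.get((next_state, a), 0.0) for a in actions)
    return reward + GAMMA * best_next


def train_step(q_online, q_target, batch, actions):
    """Move each sampled Q(s, a) towards its target (tabular stand-in for SGD)."""
    for state, action, reward, next_state, done in batch:
        y = td_target(q_target, reward, next_state, done, actions)
        old = q_online.get((state, action), 0.0)
        q_online[(state, action)] = old + LEARNING_RATE * (y - old)


def maybe_sync(q_online, q_target, iteration):
    """Copy the online table into the target table every SYNC_EVERY iterations."""
    if iteration % SYNC_EVERY == 0:
        q_target.clear()
        q_target.update(q_online)
        return True
    return False
```

In a full training loop, each pass would store the new transition, draw a batch with `sample`, call `train_step`, and then call `maybe_sync` with the iteration counter. Keeping the target table fixed between synchronizations is what prevents the targets from shifting with every update.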