Research Article
Quantum Information Protection Scheme Based on Reinforcement Learning for Periodic Surface Codes
Algorithm 1
Training the reinforcement learning agent decoder.
(1) While the original syndrome defect still exists do
(2)     Temporarily store the syndrome in the buffer pool
(3)     Randomly select samples from the buffer pool
(4)     Calculate the Q-values using the dual-Q network for all perspectives
(5)     Choose which defect to move, and with which action, using experience replay
(6)     Use the neural network in the duel network to find the optimal weight according to equation (14)
(7)     Feed the feature vector into the fully connected network to get the target network
(8)     Use SGD to obtain the optimal dual-Q network after normalization
(9)     Get the new syndrome and store it in the transition quadruple
(10)    for each transition tuple in the sample do
(11)        Construct targets using the target network and the reward according to equation (16)
(12)    end for
(13)    Update the dual-Q network parameters
(14)    Every n iterations, synchronize the target network with the network setting of the dual-Q network
(15) end while
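The control flow of Algorithm 1 (buffer pool, random sampling, target construction, parameter update, and periodic target synchronization) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the "lattice" is a hypothetical one-dimensional ring on which a single defect must be moved onto a fixed partner, the dual-Q network is replaced by a plain lookup table, and the target is the standard form r + γ · max Q_target(s', a'), used here as a stand-in because equations (14) and (16) are not reproduced in this section. All names (`RING`, `step`, `train`, and so on) are hypothetical.

```python
import random
from collections import deque

RING = 7            # hypothetical ring length (toy stand-in for a periodic lattice)
ACTIONS = (-1, +1)  # move the defect one site left or right
GAMMA = 0.9         # discount factor (assumed value)
LR = 0.2            # learning rate of the tabular update (assumed value)
BATCH = 16          # number of transitions sampled per update
SYNC_EVERY = 20     # "every n iterations" for the target synchronization


def step(state, action):
    """Move the defect; `state` is its offset from the partner defect (mod RING)."""
    nxt = (state + action) % RING
    done = nxt == 0                    # the two defects meet and annihilate
    reward = 1.0 if done else -0.1     # assumed reward: small cost per move
    return nxt, reward, done


def train(episodes=300, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(RING) for a in ACTIONS}  # online table
    q_target = dict(q)                                       # target copy
    buffer = deque(maxlen=500)                               # buffer pool
    updates = 0
    for _ in range(episodes):
        state, done, moves = rng.randrange(1, RING), False, 0
        while not done and moves < 50:       # while the defect still exists
            # Epsilon-greedy choice of the move (exploration rate assumed).
            if rng.random() < 0.2:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            buffer.append((state, action, reward, nxt, done))  # store transition
            # Randomly select samples from the buffer pool.
            batch = rng.sample(list(buffer), min(BATCH, len(buffer)))
            for s, a, r, s2, d in batch:
                # Construct the target from the target table and the reward.
                y = r if d else r + GAMMA * max(q_target[(s2, b)] for b in ACTIONS)
                q[(s, a)] += LR * (y - q[(s, a)])   # update online parameters
            updates += 1
            if updates % SYNC_EVERY == 0:
                q_target = dict(q)           # synchronize the target table
            state, moves = nxt, moves + 1
    return q


if __name__ == "__main__":
    table = train()
    for s in range(1, RING):
        print(s, max(ACTIONS, key=lambda a: table[(s, a)]))
```

Running the script prints, for each offset, the move the learned table prefers. The tabular update keeps the sketch dependency-free; a faithful version would replace the table with the network described in the paper and use its own target and weighting equations.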