Research Article
Quantum Information Protection Scheme Based on Reinforcement Learning for Periodic Surface Codes
Algorithm 1
Training the reinforcement learning agent decoder.
| Step | Operation |
| --- | --- |
| (1) | While the original syndrome defect still exists do |
| (2) | Store the syndrome temporarily in the buffer pool |
| (3) | Randomly select samples from the buffer pool: |
| (4) | Calculate using the dual-Q network for all perspectives |
| (5) | Choose which defect to move, and with which action, using the experience replay technique |
| (6) | Use the neural network in the duel network to find the optimal weight according to equation (14). |
| (7) | Feed the feature vector into the fully connected network to obtain the target network: |
| (8) | Use SGD to obtain the optimal dual-Q network after normalization: |
| (9) | Get the new syndrome and store it in the quadruple: |
| (10) | for each transition tuple in the sample do |
| (11) | Construct targets using the target network and the reward according to equation (16). |
| (12) | end for |
| (13) | Update the dual-Q network parameters |
| (14) | Every n iterations, synchronize the target network with the network setting of |
| (15) | end while |
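Steps (3) and (10) to (14) of Algorithm 1 can be illustrated with a minimal sketch. It is not the authors' implementation: equation (16) is not reproduced in this section, so the target below assumes the standard deep Q-learning form, reward plus discounted maximum of the target network's values for the next state. The neural network is replaced by a toy linear Q-function, and every name and constant (`GAMMA`, `SYNC_EVERY`, `ACTIONS`, `q_values`, `build_targets`, `train`) is hypothetical. A `next_state` of `None` is assumed to mark a transition after which no defect remains.

```python
import random

GAMMA = 0.95     # assumed discount factor
SYNC_EVERY = 3   # the "n" of step (14); value chosen only for illustration
ACTIONS = 4      # assumed action set: move a defect up, down, left or right


def q_values(params, state):
    """Toy linear Q-function (stand-in for the network): one weight per action."""
    return [w * state for w in params]


def build_targets(batch, target_params):
    """Steps (10)-(12): one target per quadruple (state, action, reward, next_state)."""
    targets = []
    for _state, _action, reward, next_state in batch:
        if next_state is None:
            # Assumed terminal case: no bootstrapping from a next state.
            targets.append(reward)
        else:
            # Assumed standard form: r + gamma * max_a' Q_target(s', a').
            best_next = max(q_values(target_params, next_state))
            targets.append(reward + GAMMA * best_next)
    return targets


def train(buffer, steps, batch_size=2, lr=0.1, seed=0):
    """Sample from the buffer, build targets, take an SGD step, sync periodically."""
    rng = random.Random(seed)
    online = [0.0] * ACTIONS
    target = list(online)
    for iteration in range(1, steps + 1):
        batch = rng.sample(buffer, batch_size)            # step (3)
        targets = build_targets(batch, target)            # steps (10)-(12)
        for (state, action, _r, _s2), y in zip(batch, targets):
            td_error = y - q_values(online, state)[action]
            online[action] += lr * td_error * state       # step (13), squared-error SGD
        if iteration % SYNC_EVERY == 0:
            target = list(online)                         # step (14)
    return online, target


# Hypothetical buffer of quadruples with scalar "states" for illustration.
buffer = [(1.0, 0, -1.0, 2.0), (2.0, 1, -1.0, 1.0), (1.0, 2, 5.0, None)]
online_params, target_params = train(buffer, steps=6)
```

Keeping a separate copy of the parameters for target construction means the targets stay fixed between synchronizations, which is the usual motivation for the periodic update in step (14).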