| 1. Initialize the Q network parameter w and set the target Q network parameter w' = w. |
| 2. Initialize the replay memory D with capacity N and set the priority of every Sum Tree leaf node to p_j = 1. |
| 3. For i = 1 to T do |
| 4. Initialize s as the first state of the interceptor's current state sequence. |
| 5. While s is not a terminal state: |
| 6. a) Select an action a with ε-greedy. |
| 7. b) Execute action a, transition to the next state s', and receive the immediate reward r. Set the flag d according to whether s' is a terminal state. |
| 8. c) Store the transition {s, a, s', r, d} in D. Replace the oldest tuple once the number of stored tuples exceeds N. |
| 9. d) Sample n tuples {s_j, a_j, s'_j, r_j, d_j}, j = 1, 2, 3, …, n, from D. The sampling probability of tuple j is $P(j) = p_j / \sum_k p_k$. Compute the corresponding loss-function weight for each sampled tuple. |
| 10. e) Compute the current target Q value y_j for each sampled tuple. |
| 11. f) Compute the loss according to equation (2) and update the Q network parameter w. |
| 12. g) Compute the TD error of every sampled tuple, $\delta_j = y_j - Q(s_j, a_j; w)$, and update the priority p_j of the corresponding Sum Tree nodes from it. |
| 13. h) If i % C == 0, update the target Q network parameter: w' = w. End If. |
| 14. i) s = s'. End While. |
| 15. End For |
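
The Sum Tree bookkeeping in steps 2, d) and g) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the implementation used here: the names `SumTree`, `sample` and `update_priorities` are hypothetical, the stratified draw (one sample per equal slice of the total priority) is one common way to sample proportionally to priority, and the weight form `(N * P(j)) ** (-beta)` normalized by its maximum, the exponent `beta`, and the priority rule `abs(delta) + eps` are the usual prioritized-replay choices, assumed rather than taken from the listing. Leaf j is assumed to correspond to slot j of the replay memory D.

```python
import random


class SumTree:
    """Binary sum tree: leaves hold priorities, internal nodes hold child sums."""

    def __init__(self, capacity, initial_priority=1.0):
        self.capacity = capacity
        # Heap layout: nodes[0] is the root, the last `capacity` slots are leaves.
        self.nodes = [0.0] * (2 * capacity - 1)
        for leaf in range(capacity):
            self.update(leaf, initial_priority)  # step 2: every p_j starts at 1

    def total(self):
        return self.nodes[0]

    def priority(self, leaf):
        return self.nodes[leaf + self.capacity - 1]

    def update(self, leaf, priority):
        """Set one leaf priority and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        change = priority - self.nodes[idx]
        self.nodes[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.nodes[idx] += change

    def find(self, value):
        """Return the leaf whose cumulative-priority interval contains `value`."""
        idx = 0
        while idx < self.capacity - 1:  # indices 0..capacity-2 are internal nodes
            left = 2 * idx + 1
            if value <= self.nodes[left]:
                idx = left
            else:
                value -= self.nodes[left]
                idx = left + 1
        return idx - (self.capacity - 1)


def sample(tree, n, beta=0.4, rng=random):
    """Draw n leaves with probability p_j / sum(p) and return loss weights."""
    segment = tree.total() / n
    leaves, weights = [], []
    for k in range(n):
        # Assumed stratified sampling: one uniform draw per priority segment.
        value = rng.uniform(segment * k, segment * (k + 1))
        leaf = tree.find(value)
        prob = tree.priority(leaf) / tree.total()
        leaves.append(leaf)
        weights.append((tree.capacity * prob) ** (-beta))
    max_w = max(weights)
    return leaves, [w / max_w for w in weights]  # normalize so the largest is 1


def update_priorities(tree, leaves, td_errors, eps=1e-6):
    """Assumed priority rule: p_j = |delta_j| + eps (eps keeps p_j positive)."""
    for leaf, delta in zip(leaves, td_errors):
        tree.update(leaf, abs(delta) + eps)
```

A training loop would call `sample` in step d), scale each tuple's loss by its weight in step f), and pass the resulting TD errors to `update_priorities` in step g). Both `update` and `find` walk a single root-to-leaf path, so their cost grows logarithmically with the capacity N.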