International Journal of Aerospace Engineering

Research Article

An Exoatmospheric Homing Guidance Law Based on Deep Q Network

Prioritized experience replay DQN for homing guidance.

1. Initialize Q network parameter w and set the target Q network parameter w' = w.
2. Initialize replay memory D with capacity N and set the priority of every Sum Tree leaf node to p_j = 1.
3. For i = 1 to T do
4. Initialize s as the first state of the interceptor's current state sequence.
5. While s is not a terminal state:
6. a) Select an action a with ε-greedy.
7. b) Execute action a, transfer to the next state s', and get the immediate reward r. Set the termination flag d according to whether s' is a terminal state.
8. c) Store the transition {s, a, s', r, d} in D, replacing the oldest tuple if ‖D‖ > N.
9. d) Sample n tuples {s_j, a_j, s'_j, r_j, d_j}, j = 1, 2, 3, …, n, from D according to the priority-based sampling probability P(j), and compute the weight ω_j of each sample in the loss function.
10. e) Compute the current target Q value y_j of each sample.
11. f) Compute the loss as in equation (2) and update the Q network parameter w.
12. g) Compute the TD error δ_j of all sampled data, and update the priorities p_j of the corresponding Sum Tree nodes using |δ_j|.
13. h) If i % C == 0, update the target Q network parameter w' = w.
14. i) s=s'.
15. End For
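
Steps c), d) and g) of the listing can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard prioritized-replay forms, namely proportional sampling P(j) = p_j / Σ_k p_k, loss weights ω_j = (|D| · P(j))^(-β) normalized by the largest weight in the batch, and new priorities |δ_j| + ε. The names SumTree, sample and update_priorities, and the values of beta and eps, are hypothetical.

```python
import random


class SumTree:
    """Binary tree whose leaves hold priorities and whose root holds their sum."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity - 1)  # internal nodes first, then leaves
        self.data = [None] * capacity           # stored transitions
        self.write = 0                          # next leaf to overwrite (oldest first)
        self.size = 0                           # number of stored transitions

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        """Set one leaf priority and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        change = priority - self.tree[idx]
        while True:
            self.tree[idx] += change
            if idx == 0:
                break
            idx = (idx - 1) // 2

    def add(self, transition, priority=1.0):
        """Step c): store a transition, replacing the oldest one when full.
        The default priority of 1 mirrors step 2 (an assumption for new data)."""
        self.data[self.write] = transition
        self.update(self.write, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def find(self, value):
        """Return the leaf whose cumulative-priority interval contains value."""
        idx = 0
        while idx < self.capacity - 1:          # while idx is an internal node
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        return idx - (self.capacity - 1)


def sample(tree, n, beta=0.4):
    """Step d): draw n leaves with P(j) = p_j / sum_k p_k and compute weights."""
    leaves, weights = [], []
    for _ in range(n):
        leaf = tree.find(random.random() * tree.total())
        prob = tree.tree[leaf + tree.capacity - 1] / tree.total()
        leaves.append(leaf)
        weights.append((tree.size * prob) ** (-beta))
    max_w = max(weights)                        # normalize by the batch maximum
    return leaves, [w / max_w for w in weights]


def update_priorities(tree, leaves, td_errors, eps=1e-6):
    """Step g): set each sampled priority to |TD error| (eps keeps it positive)."""
    for leaf, delta in zip(leaves, td_errors):
        tree.update(leaf, abs(delta) + eps)


# Usage: fill a small memory, sample a batch, then refresh its priorities.
random.seed(0)
memory = SumTree(capacity=4)
for k in range(4):
    memory.add(("s%d" % k, 0, "s%d" % (k + 1), 0.0, False))
leaves, weights = sample(memory, n=2)
update_priorities(memory, leaves, td_errors=[0.5, -2.0])
```

The Sum Tree makes both sampling and priority updates cost O(log N) per transition, which is why the listing initializes and updates priorities at the tree nodes rather than rescanning the whole memory D.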