Research Article

An Exoatmospheric Homing Guidance Law Based on Deep Q Network

Algorithm 1: Prioritized experience replay DQN for homing guidance.
1. Initialize the Q network parameters w and the target Q network parameters w' = w.
2. Initialize replay memory D with capacity N, and set the priority of every Sum Tree leaf node to p_j = 1.
3. For episode i = 1 to T do
4. Initialize s as the first state of the interceptor's current state sequence.
5. While s is not a termination state:
6.  a) Select an action a with the ε-greedy policy.
7.  b) Execute action a, transition to the next state s', and receive the immediate reward r; record whether the termination state has been reached in the flag d.
8.  c) Store the transition {s, a, s', r, d} in D, replacing the oldest tuple if |D| > N.
9.  d) Sample n tuples {s_j, a_j, s'_j, r_j, d_j}, j = 1, 2, …, n, from D with sampling probability $P(j) = p_j^{\alpha} / \sum_k p_k^{\alpha}$, and compute the importance-sampling weight of the loss function, $\omega_j = \left( N \cdot P(j) \right)^{-\beta} / \max_i \omega_i$ (a Python sketch of the Sum Tree sampler follows the listing).
10.  e) Compute the current target Q value $y_j = r_j + \gamma \left( 1 - d_j \right) \max_{a'} Q'\left( s'_j, a'; w' \right)$.
11.  f) Compute the loss as in equation (2) and update the Q network parameters w by gradient descent (see the training-loop sketch after the listing).
12.  g) Compute the TD error of every sampled tuple, $\delta_j = y_j - Q\left( s_j, a_j; w \right)$, and update the priority of the corresponding Sum Tree leaf nodes to $p_j = \left| \delta_j \right|$.
13.  h) If the global step counter t satisfies t % C == 0, update the target Q network parameters w' = w. End if.
14.  i) s = s'.
15. End While
16. End For
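
Step d) draws samples in proportion to their stored priorities, which is exactly what the Sum Tree makes efficient. The following is a minimal Python sketch of such a structure, not the paper's implementation: under the assumption that each leaf stores a priority $p_j^{\alpha}$ and each internal node stores the sum of its subtree, a proportional draw with probability $P(j) = p_j^{\alpha} / \sum_k p_k^{\alpha}$ and a leaf-priority update (step g) both cost O(log N). The class and method names (SumTree, add, update, sample) are illustrative.

```python
import random

class SumTree:
    """Binary tree whose leaves store transition priorities and whose
    internal nodes store the sum of their children, so that drawing a
    leaf with probability proportional to its priority is O(log N)."""

    def __init__(self, capacity):
        self.capacity = capacity                 # max number of transitions N
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes followed by leaves
        self.data = [None] * capacity            # the stored transitions
        self.write = 0                           # next slot to (over)write
        self.size = 0                            # number of stored transitions

    def total(self):
        """Root node: the sum of all leaf priorities."""
        return self.tree[0]

    def add(self, priority, transition):
        """Step c: store a transition, overwriting the oldest when full."""
        leaf = self.write + self.capacity - 1
        self.data[self.write] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def update(self, leaf, priority):
        """Step g: set a leaf's priority and propagate the change to the root."""
        change = priority - self.tree[leaf]
        self.tree[leaf] = priority
        while leaf != 0:
            leaf = (leaf - 1) // 2
            self.tree[leaf] += change

    def sample(self):
        """Draw one transition with probability p_j / sum_k p_k."""
        s = random.uniform(0.0, self.total())
        idx = 0
        while 2 * idx + 1 < len(self.tree):      # descend until a leaf is reached
            left = 2 * idx + 1
            if s <= self.tree[left]:
                idx = left
            else:
                s -= self.tree[left]
                idx = left + 1
        return idx, self.tree[idx], self.data[idx - self.capacity + 1]
```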
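Assuming the SumTree sketch above and a Gym-style environment supplying states, rewards, and termination flags, the main loop of Algorithm 1 (steps a through h) could be wired up as follows in PyTorch. The network architecture, the hyperparameter values, and the importance-weighted squared TD loss are assumptions standing in for the paper's equation (2); STATE_DIM and N_ACTIONS are hypothetical placeholders for the interceptor's state and action spaces.

```python
import random
import torch
import torch.nn as nn

# Illustrative hyperparameters, not taken from the paper
GAMMA, ALPHA, BETA = 0.99, 0.6, 0.4      # discount, priority exponent, IS exponent
EPS_GREEDY, BATCH, CAPACITY, C = 0.1, 32, 100_000, 500
STATE_DIM, N_ACTIONS = 6, 9              # hypothetical interceptor state/action sizes

q_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net = nn.Sequential(nn.Linear(STATE_DIM, 128), nn.ReLU(), nn.Linear(128, N_ACTIONS))
target_net.load_state_dict(q_net.state_dict())    # step 1: w' = w
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
memory = SumTree(CAPACITY)                        # step 2: replay memory D

def select_action(state):
    """Step a: epsilon-greedy action selection over the online Q network."""
    if random.random() < EPS_GREEDY:
        return random.randrange(N_ACTIONS)
    with torch.no_grad():
        return q_net(torch.as_tensor(state, dtype=torch.float32)).argmax().item()

def store(s, a, s2, r, d):
    """Step c: new transitions enter with maximal priority (1.0, as in step 2)."""
    memory.add(1.0, (s, a, s2, r, d))

def train_step(step):
    """Steps d-h for one minibatch; call only once memory.size >= BATCH."""
    # Step d: proportional sampling. The tree stores p_j^alpha, so
    # P(j) = p_j^alpha / sum_k p_k^alpha; the IS weight is
    # (N * P(j))^(-beta), normalized by its maximum over the batch.
    idxs, prios, batch = zip(*(memory.sample() for _ in range(BATCH)))
    probs = torch.tensor(prios) / memory.total()
    weights = (memory.size * probs) ** (-BETA)
    weights = weights / weights.max()

    s, a, s2, r, d = (torch.as_tensor(x, dtype=torch.float32)
                      for x in zip(*batch))
    # Step e: y_j = r_j + gamma * (1 - d_j) * max_a' Q'(s'_j, a'; w')
    with torch.no_grad():
        y = r + GAMMA * (1.0 - d) * target_net(s2).max(dim=1).values
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)

    # Step f: importance-weighted squared TD loss (stand-in for equation (2))
    td_error = y - q
    loss = (weights * td_error.pow(2)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Step g: refresh leaf priorities with p_j = |delta_j| (alpha applied here)
    for idx, delta in zip(idxs, td_error.detach().abs().tolist()):
        memory.update(idx, (delta + 1e-6) ** ALPHA)

    # Step h: periodic hard update of the target network, w' = w
    if step % C == 0:
        target_net.load_state_dict(q_net.state_dict())
```

Storing $p_j^{\alpha}$ directly in the tree keeps step d) a single proportional draw; the $\beta$ exponent in the importance weights then corrects the bias that the non-uniform sampling introduces into the gradient of the loss.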