Research Article
Path Planning of Unmanned Helicopter in Complex Environment Based on Heuristic Deep Q-Network
Algorithm: DQN algorithm
Initialization: initialize the training network parameters $\theta$ and the target network parameters $\theta^{-}$.
Iterative process:
Repeat (for each episode)
    Initialize the state $s$
    Repeat (for each step)
        Select action $a$ based on the policy
        Perform action $a$ to get reward $r$ and next state $s'$
        Store the transition $(s, a, r, s')$ in the experience memory
        Sample a random mini-batch from the experience memory
        Obtain the loss function
        Update the network parameters
    End Repeat ($s$ is the terminal state)
End Repeat (end of training)
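The loop above can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation: a small table of Q-values stands in for the deep network so that the example runs with the Python standard library only. The corridor environment, the epsilon-greedy policy, the periodic copy of the training parameters into the target parameters (standard DQN practice, not spelled out in the listing), and all hyperparameter values are assumptions made for illustration.

```python
import random
from collections import deque

# Hypothetical toy task: a corridor of N_STATES cells, start at cell 0,
# goal at the right-most cell. Actions: 0 = move left, 1 = move right.
N_STATES, N_ACTIONS = 6, 2
GAMMA, ALPHA, EPSILON = 0.9, 0.1, 0.1           # assumed hyperparameters
BATCH_SIZE, SYNC_EVERY, MEMORY_SIZE = 16, 20, 500


def step(state, action):
    """Return (reward, next_state, done) for the toy corridor."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return (1.0 if done else 0.0), next_state, done


def select_action(q, state, rng):
    """Epsilon-greedy policy (an assumption), with random tie-breaking."""
    if rng.random() < EPSILON:
        return rng.randrange(N_ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in range(N_ACTIONS) if q[state][a] == best])


def train(episodes=300, max_steps=50, seed=0):
    rng = random.Random(seed)
    # Initialization: training parameters (theta) and target parameters.
    # A Q-table replaces the neural network in this sketch.
    theta = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    theta_target = [row[:] for row in theta]
    memory = deque(maxlen=MEMORY_SIZE)           # experience memory
    steps_done = 0

    for _ in range(episodes):                    # repeat for each episode
        state = 0                                # initialize state
        for _ in range(max_steps):               # repeat for each step
            action = select_action(theta, state, rng)
            reward, next_state, done = step(state, action)
            memory.append((state, action, reward, next_state, done))

            # Sample a random mini-batch from the experience memory.
            batch = rng.sample(list(memory), min(BATCH_SIZE, len(memory)))
            for s, a, r, s2, d in batch:
                # Target value computed from the target parameters.
                target = r if d else r + GAMMA * max(theta_target[s2])
                td_error = target - theta[s][a]
                # Gradient step on the squared loss 0.5 * td_error ** 2.
                theta[s][a] += ALPHA * td_error

            steps_done += 1
            if steps_done % SYNC_EVERY == 0:     # refresh target parameters
                theta_target = [row[:] for row in theta]

            state = next_state
            if done:                             # terminal state reached
                break
    return theta


if __name__ == "__main__":
    q_values = train()
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in q_values[s]])
```

In a full DQN the table update would be replaced by a gradient step on the network weights, but the roles of the experience memory, the mini-batch, and the separate target parameters stay the same.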