Research Article
Robot Obstacle Avoidance Controller Based on Deep Reinforcement Learning
Algorithm: Deep Q-learning with experience replay

Input: pixels and reward
Output: Q action-value function

Initialization:
    Initialize replay memory D
    Initialize the Q network (action-value function) Q with random weights θ
    Initialize the target network (action-value function) Q̂ with weights θ⁻ = θ

1:  For episode = 1, M do
2:      Initialize sequence s_1 = {x_1} and preprocessed sequence φ_1 = φ(s_1)
3:      For t = 1, T do
4:          Following the ε-greedy policy, select a random action a_t with probability ε;
            otherwise select a_t = argmax_a Q(φ(s_t), a; θ)
5:          Run action a_t in an emulator and observe the reward r_t and image x_{t+1}
6:          Set s_{t+1} = (s_t, a_t, x_{t+1}) and preprocess φ_{t+1} = φ(s_{t+1})
7:          Store transition (φ_t, a_t, r_t, φ_{t+1}) in D
8:          Sample a random minibatch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D
9:          Set y_j = r_j if the episode terminates at step j + 1;
            otherwise y_j = r_j + γ max_{a'} Q̂(φ_{j+1}, a'; θ⁻)
10:         Calculate the loss (y_j − Q(φ_j, a_j; θ))² and perform a gradient descent step on it
11:         Train and update the weights of the Q network; every C steps set Q̂ = Q
12:     End For
13: End For
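To make the loop above concrete, the following is a minimal sketch of the same training procedure in Python using PyTorch. It is illustrative only, not the authors' implementation: the environment interface (reset() returning a state vector, step() returning next_state, reward, done), the small fully connected QNetwork standing in for the pixel-processing network, and all hyperparameter values are assumptions chosen for a self-contained example.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNetwork(nn.Module):
    # Maps a preprocessed state φ (here a flat vector standing in for
    # preprocessed pixels) to one Q-value per action.
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def train_dqn(env, state_dim, n_actions, episodes=100, max_steps=200,
              gamma=0.99, eps=0.1, batch_size=32, target_sync=100, lr=1e-3):
    # Initialization: Q with random weights θ, target Q̂ with θ⁻ = θ,
    # and replay memory D.
    q_net = QNetwork(state_dim, n_actions)
    target_net = QNetwork(state_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())
    optimizer = optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=10_000)
    step_count = 0

    for _ in range(episodes):                               # step 1
        state = env.reset()                                 # step 2
        for _ in range(max_steps):                          # step 3
            # Step 4: ε-greedy action selection.
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = q_values.argmax().item()
            # Step 5: run the action and observe reward and next state.
            next_state, reward, done = env.step(action)
            # Steps 6-7: store the transition in D.
            replay.append((state, action, reward, next_state, done))
            state = next_state

            if len(replay) >= batch_size:
                # Step 8: sample a random minibatch from D.
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = map(list, zip(*batch))
                s = torch.as_tensor(s, dtype=torch.float32)
                a = torch.as_tensor(a)
                r = torch.as_tensor(r, dtype=torch.float32)
                s2 = torch.as_tensor(s2, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                # Step 9: targets y_j from the frozen target network;
                # (1 - d) zeroes the bootstrap term on terminal steps.
                with torch.no_grad():
                    y = r + gamma * (1.0 - d) * target_net(s2).max(dim=1).values
                # Steps 10-11: squared TD loss and a gradient descent step on θ.
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            step_count += 1
            if step_count % target_sync == 0:               # every C steps, Q̂ = Q
                target_net.load_state_dict(q_net.state_dict())
            if done:
                break
    return q_net

Freezing the target network θ⁻ between periodic syncs is the key stabilizing choice here: it keeps the regression targets y_j fixed for many updates, which prevents the moving-target feedback that destabilizes naive Q-learning with function approximation.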