Research Article

Robot Obstacle Avoidance Controller Based on Deep Reinforcement Learning

Algorithm 1: DQN.
Input: Pixels and reward
Output: Q action-value function
Initialization
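Initialize replay memory $D$
Initialize the Q network (action-value function) $Q$ with random weights $\theta$
Initialize the target network (action-value function) $\hat{Q}$ with weights $\theta^{-} = \theta$
1: For episode $= 1, \ldots, M$ do
2:  Initialize sequence $s_1 = \{x_1\}$ and preprocessed sequence $\phi_1 = \phi(s_1)$
3:  For $t = 1, \ldots, T$ do
4:   Following the $\epsilon$-greedy policy, select $a_t$: a random action with probability $\epsilon$, otherwise $a_t = \arg\max_a Q(\phi(s_t), a; \theta)$
5:   Run action $a_t$ in an emulator and observe the reward $r_t$ and image $x_{t+1}$
6:   Set $s_{t+1} = s_t, a_t, x_{t+1}$ and preprocess $\phi_{t+1} = \phi(s_{t+1})$
7:   Store transition $(\phi_t, a_t, r_t, \phi_{t+1})$ in $D$
8:   Sample random minibatch of transitions $(\phi_j, a_j, r_j, \phi_{j+1})$ from $D$
9:   Set $y_j = r_j$ if the episode terminates at step $j+1$; otherwise $y_j = r_j + \gamma \max_{a'} \hat{Q}(\phi_{j+1}, a'; \theta^{-})$
10:   Calculate the loss (perform a gradient descent step on) $(y_j - Q(\phi_j, a_j; \theta))^2$
11:   Train and update weights $\theta$ of $Q$
12:  End
13: End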
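
The core of Algorithm 1 (action selection in step 4, the replay memory in steps 7 and 8, the target in step 9, and the gradient step in steps 10 and 11) can be illustrated with a minimal Python sketch. It is not the controller used in this article: the Q network is replaced by a linear function of a feature vector so that the example stays self-contained, and every name here (`ReplayMemory`, `q_values`, `select_action`, `td_target`, `gradient_step`) as well as the `done` flag stored with each transition is an assumption made for illustration. The emulator loop and the image preprocessing are omitted.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity replay memory of (phi, a, r, phi_next, done) transitions."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def store(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement, capped at the current size.
        return rng.sample(list(self.buffer), min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)


def q_values(phi, weights):
    """Linear stand-in for the Q network; weights[a] is the vector for action a."""
    return [sum(w * x for w, x in zip(w_a, phi)) for w_a in weights]


def select_action(phi, weights, epsilon, rng):
    """Epsilon-greedy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(weights))
    q = q_values(phi, weights)
    return max(range(len(q)), key=q.__getitem__)


def td_target(r, phi_next, done, target_weights, gamma):
    """Target y: r if terminal, else r + gamma * max over target-network values."""
    if done:
        return r
    return r + gamma * max(q_values(phi_next, target_weights))


def gradient_step(weights, target_weights, batch, gamma, lr):
    """One gradient descent step on the minibatch mean of (y - Q(phi, a))^2.

    The constant factor 2 of the gradient is absorbed into lr. Returns the
    updated weights and the mean squared TD error measured before the update.
    """
    new_weights = [list(w_a) for w_a in weights]
    loss = 0.0
    for phi, a, r, phi_next, done in batch:
        y = td_target(r, phi_next, done, target_weights, gamma)
        td_error = y - q_values(phi, weights)[a]
        loss += td_error ** 2 / len(batch)
        for i, x in enumerate(phi):
            new_weights[a][i] += lr * td_error * x / len(batch)
    return new_weights, loss
```

In a full training loop, the caller would pass a `random.Random` instance as `rng`, call `select_action` and `store` once per time step, and then call `gradient_step` on a sampled minibatch. How often the target weights are refreshed from the online weights is left to the caller, since that schedule is not stated in the algorithm listing.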