Research Article
Robot Obstacle Avoidance Controller Based on Deep Reinforcement Learning
Algorithm: Deep Q-learning with experience replay

Input: pixels and reward
Output: Q action-value function

Initialization:
    Initialize replay memory D
    Initialize the Q network (action-value function) Q with random weights θ
    Initialize the target network (action-value function) Q̂ with weights θ⁻ = θ

1:  For episode = 1, M do
2:      Initialize sequence s_1 = {x_1} and preprocessed sequence φ_1 = φ(s_1)
3:      For t = 1, T do
4:          Following the ε-greedy policy, select a random action a_t with probability ε;
            otherwise select a_t = argmax_a Q(φ(s_t), a; θ)
5:          Run action a_t in an emulator and observe the reward r_t and image x_{t+1}
6:          Set s_{t+1} = (s_t, a_t, x_{t+1}) and preprocess φ_{t+1} = φ(s_{t+1})
7:          Store transition (φ_t, a_t, r_t, φ_{t+1}) in D
8:          Sample a random minibatch of transitions (φ_j, a_j, r_j, φ_{j+1}) from D
9:          Set y_j = r_j if the episode terminates at step j + 1;
            otherwise y_j = r_j + γ max_{a'} Q̂(φ_{j+1}, a'; θ⁻)
10:         Calculate the loss (y_j − Q(φ_j, a_j; θ))² and perform a gradient descent step on it
11:         Train and update the weights of the Q network; every C steps set Q̂ = Q
12:     End For
13: End For
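To make the loop above concrete, the following is a minimal sketch of the same training procedure in Python using PyTorch. It is illustrative only, not the authors' implementation: the environment interface (reset() returning a state vector, step() returning next_state, reward, done), the small fully connected QNetwork standing in for the pixel-processing network, and all hyperparameter values are assumptions chosen for a self-contained example.

import random
from collections import deque

import torch
import torch.nn as nn
import torch.optim as optim


class QNetwork(nn.Module):
    # Maps a preprocessed state φ (here a flat vector standing in for
    # preprocessed pixels) to one Q-value per action.
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):
        return self.net(x)


def train_dqn(env, state_dim, n_actions, episodes=100, max_steps=200,
              gamma=0.99, eps=0.1, batch_size=32, target_sync=100, lr=1e-3):
    # Initialization: Q with random weights θ, target Q̂ with θ⁻ = θ,
    # and replay memory D.
    q_net = QNetwork(state_dim, n_actions)
    target_net = QNetwork(state_dim, n_actions)
    target_net.load_state_dict(q_net.state_dict())
    optimizer = optim.Adam(q_net.parameters(), lr=lr)
    replay = deque(maxlen=10_000)
    step_count = 0

    for _ in range(episodes):                               # step 1
        state = env.reset()                                 # step 2
        for _ in range(max_steps):                          # step 3
            # Step 4: ε-greedy action selection.
            if random.random() < eps:
                action = random.randrange(n_actions)
            else:
                with torch.no_grad():
                    q_values = q_net(torch.as_tensor(state, dtype=torch.float32))
                action = q_values.argmax().item()
            # Step 5: run the action and observe reward and next state.
            next_state, reward, done = env.step(action)
            # Steps 6-7: store the transition in D.
            replay.append((state, action, reward, next_state, done))
            state = next_state

            if len(replay) >= batch_size:
                # Step 8: sample a random minibatch from D.
                batch = random.sample(replay, batch_size)
                s, a, r, s2, d = map(list, zip(*batch))
                s = torch.as_tensor(s, dtype=torch.float32)
                a = torch.as_tensor(a)
                r = torch.as_tensor(r, dtype=torch.float32)
                s2 = torch.as_tensor(s2, dtype=torch.float32)
                d = torch.as_tensor(d, dtype=torch.float32)
                # Step 9: targets y_j from the frozen target network;
                # (1 - d) zeroes the bootstrap term on terminal steps.
                with torch.no_grad():
                    y = r + gamma * (1.0 - d) * target_net(s2).max(dim=1).values
                # Steps 10-11: squared TD loss and a gradient descent step on θ.
                q = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
                loss = nn.functional.mse_loss(q, y)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            step_count += 1
            if step_count % target_sync == 0:               # every C steps, Q̂ = Q
                target_net.load_state_dict(q_net.state_dict())
            if done:
                break
    return q_net

Freezing the target network θ⁻ between periodic syncs is the key stabilizing choice here: it keeps the regression targets y_j fixed for many updates, which prevents the moving-target feedback that destabilizes naive Q-learning with function approximation.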