Research Article
Reinforcement Learning-Based Autonomous Navigation and Obstacle Avoidance for USVs under Partially Observable Conditions
Algorithm 1
UANOA algorithm for navigation and obstacle avoidance of the USV.
Initialize the replay memory
Initialize the evaluate function of the USV with random weights
Initialize the target function of the USV with random weights
for episode = 1, 2, …, M do
    for t = 1, …, T do
        With the exploration probability, select a random USV rudder action;
        otherwise select the rudder action that maximizes the evaluate function
        Get the reward and the next state by executing the rudder action
        Store the experience in the replay memory, where the state is processed by the LSTM network
        Sample a random minibatch of experiences from the replay memory
        Set the target value for each sampled experience
        Perform a gradient descent step on the loss between the target value and the evaluate function, with respect to the weights of the evaluate function
        Every fixed number of steps, reset the target function to the evaluate function
    end
end
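The loop in Algorithm 1 follows the familiar deep Q-learning pattern of a replay memory, an evaluate function, and a periodically reset target function. The sketch below illustrates that control flow only and is not the authors' implementation. It makes several assumptions that do not come from the article: the LSTM state processing is omitted, the evaluate and target functions are replaced by a small linear model, the target weights start as a copy of the evaluate weights, and `ACTIONS`, `features`, `toy_step`, and every hyperparameter value are hypothetical stand-ins for the USV environment.

```python
import random
from collections import deque

ACTIONS = (-1, 0, 1)  # hypothetical rudder commands: port, hold, starboard


def features(state):
    """Hypothetical feature vector for a scalar heading error."""
    return (1.0, state)


def q_value(weights, state, action_index):
    """Linear stand-in for the evaluate/target function."""
    w = weights[action_index]
    return sum(wi * fi for wi, fi in zip(w, features(state)))


def greedy_action(weights, state):
    """Index of the action with the largest estimated value."""
    return max(range(len(ACTIONS)), key=lambda a: q_value(weights, state, a))


def toy_step(state, action_index):
    """Toy stand-in for the USV environment (assumption, not from the article)."""
    next_state = max(-5.0, min(5.0, state + ACTIONS[action_index]))
    reward = -abs(next_state)
    done = abs(next_state) < 0.5
    return reward, next_state, done


def train(episodes=50, steps=30, epsilon=0.1, gamma=0.9, lr=0.01,
          batch_size=8, sync_every=20, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)                      # replay memory
    eval_w = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in ACTIONS]
    target_w = [row[:] for row in eval_w]            # assumption: start as a copy
    total_steps = 0
    for _ in range(episodes):
        state = rng.uniform(-5.0, 5.0)
        for _ in range(steps):
            # Epsilon-greedy selection of the rudder action.
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = greedy_action(eval_w, state)
            reward, next_state, done = toy_step(state, action)
            memory.append((state, action, reward, next_state, done))
            # Sample a random minibatch and update the evaluate weights.
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s2, d in batch:
                best_next = max(q_value(target_w, s2, k)
                                for k in range(len(ACTIONS)))
                target = r if d else r + gamma * best_next
                error = target - q_value(eval_w, s, a)
                for i, f in enumerate(features(s)):
                    eval_w[a][i] += lr * error * f   # gradient step on squared error
            total_steps += 1
            # Periodically reset the target function to the evaluate function.
            if total_steps % sync_every == 0:
                target_w = [row[:] for row in eval_w]
            state = next_state
            if done:
                break
    return eval_w, target_w, memory
```

Calling `train()` returns the evaluate weights, the target weights, and the replay memory. In a setting closer to the article, `q_value` would presumably be a neural network fed by an LSTM-processed observation history, which this sketch does not attempt to reproduce.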