Research Article
Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map
Algorithm 1
Training process of the DDQN algorithm.
(1) Initialize the Q-network for the USV and the replay buffer
(2) for each episode do
(3)     Initialize the environment, state, and time
(4)     while not (the time limit is reached or the targets are found) do
(5)         for each USV do
(6)             Receive the observation
(7)             Select an action according to the current policy
(8)             Execute the action, receive the reward, and reach the next state
(9)             Get the next observation
(10)            Store the transition in the replay buffer if the USV is being trained
(11)        end
(12)        Sample a random minibatch of transitions from the replay buffer
(13)        Perform a gradient descent step on the loss function
(14)        Update the time
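Steps (7), (10), (12), and (13) of Algorithm 1 can be illustrated with a minimal sketch. It assumes that DDQN denotes Double DQN, in which the online network selects the next action and a separate target network evaluates it; the target network, the epsilon-greedy selection rule, and the discount factor are assumptions, since the extracted algorithm does not show them. All names below (`epsilon_greedy`, `double_dqn_targets`, the lookup tables standing in for Q-networks) are hypothetical and are not taken from the article.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng):
    """Step (7), assumed rule: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_targets(batch, q_online, q_target, gamma):
    """Regression targets used in the gradient step (13), assuming Double DQN.

    batch: list of (obs, action, reward, next_obs, done) transitions.
    q_online, q_target: callables mapping an observation to a list of Q-values.
    """
    targets = []
    for _obs, _action, reward, next_obs, done in batch:
        if done:
            # Terminal transition: no bootstrapped term.
            targets.append(reward)
            continue
        # The online network chooses the next action ...
        online_next = q_online(next_obs)
        best = max(range(len(online_next)), key=lambda a: online_next[a])
        # ... and the target network evaluates that action.
        targets.append(reward + gamma * q_target(next_obs)[best])
    return targets


# Toy usage with lookup tables standing in for the two networks (hypothetical values).
rng = random.Random(0)
replay = deque(maxlen=1000)                 # replay buffer, step (1)
replay.append(("s0", 1, 1.0, "s1", False))  # store transition, step (10)
replay.append(("s1", 0, 5.0, "s2", True))

online_table = {"s1": [0.2, 0.9], "s2": [0.0, 0.0]}
target_table = {"s1": [0.5, 0.3], "s2": [0.0, 0.0]}


def q_online(obs):
    return online_table[obs]


def q_target(obs):
    return target_table[obs]


minibatch = rng.sample(list(replay), 2)     # random minibatch, step (12)
targets = double_dqn_targets(list(replay), q_online, q_target, gamma=0.9)
greedy_action = epsilon_greedy([0.1, 0.7, 0.3], 0.0, rng)
```

In the first toy transition the online table prefers action 1 in `s1`, so the target uses the target table's value for that action rather than its own maximum. This decoupling is what distinguishes Double DQN from standard DQN. A real implementation would replace the tables with neural networks and minimize the squared error between these targets and the online network's predictions.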