Research Article
Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map
Algorithm 1
Training process of the DDQN algorithm.
| (1) | Initialize the Q-network for the USV and the replay buffer |
| (2) | for each episode do |
| (3) | Initialize the environment, state, and time |
| (4) | while not (the time limit is reached or the targets are found) do |
| (5) | for each USV do |
| (6) | Receive the observation |
| (7) | Select an action according to the action-selection policy |
| (8) | Execute the action, receive the reward, and reach the next state |
| (9) | Get the new observation |
| (10) | Store the transition in the replay buffer if the USV is being trained |
| (11) | end |
| (12) | Sample a random minibatch of transitions from the replay buffer |
| (13) | Perform a gradient descent step on the loss |
| (14) | Update the time |
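The loop in Algorithm 1 can be illustrated with a minimal sketch, assuming DDQN here denotes Double DQN, that actions are chosen epsilon-greedily, and that a target network is synchronized periodically (none of these details are stated in the listing). The `LineWorld` environment, the linear Q-function standing in for the Q-network, and all hyperparameter values are hypothetical and chosen only to keep the example self-contained; the per-USV loop is collapsed to a single agent.

```python
import random
from collections import deque

import numpy as np


def double_dqn_targets(q_online_next, q_target_next, rewards, dones, gamma):
    """Double DQN target: the online net picks the next action, the target net scores it."""
    best = np.argmax(q_online_next, axis=1)
    next_q = q_target_next[np.arange(len(best)), best]
    return rewards + gamma * (1.0 - dones) * next_q


class LineWorld:
    """Hypothetical toy task: walk along a line until the target cell is found."""

    def __init__(self, size=5):
        self.size = size
        self.pos = 0

    def observe(self):
        obs = np.zeros(self.size)
        obs[self.pos] = 1.0  # one-hot encoding of the position
        return obs

    def reset(self):
        self.pos = 0
        return self.observe()

    def step(self, action):
        move = 1 if action == 1 else -1
        self.pos = min(max(self.pos + move, 0), self.size - 1)
        found = self.pos == self.size - 1
        return self.observe(), (1.0 if found else -0.01), found


def train(episodes=200, max_steps=30, batch=16, gamma=0.9, lr=0.1,
          eps=0.2, sync_every=20, seed=0):
    rng = np.random.default_rng(seed)
    sampler = random.Random(seed)
    env = LineWorld()
    # Linear Q-function Q(s, .) = s @ W stands in for the Q-network (assumption).
    online_w = rng.normal(scale=0.1, size=(env.size, 2))
    target_w = online_w.copy()
    buffer = deque(maxlen=1000)  # replay buffer
    updates = 0
    for _ in range(episodes):
        obs, found, t = env.reset(), False, 0
        while not (t >= max_steps or found):
            # Epsilon-greedy action selection (assumed policy).
            if rng.random() < eps:
                action = int(rng.integers(2))
            else:
                action = int(np.argmax(obs @ online_w))
            next_obs, reward, found = env.step(action)
            buffer.append((obs, action, reward, next_obs, float(found)))
            obs = next_obs
            if len(buffer) >= batch:
                sample = sampler.sample(list(buffer), batch)
                s, a, r, s2, d = map(np.array, zip(*sample))
                y = double_dqn_targets(s2 @ online_w, s2 @ target_w, r, d, gamma)
                td = (s @ online_w)[np.arange(batch), a] - y
                # Gradient of half the mean squared TD error with respect to W.
                grad = s.T @ (td[:, None] * np.eye(2)[a]) / batch
                online_w -= lr * grad
                updates += 1
                if updates % sync_every == 0:
                    target_w = online_w.copy()  # assumed periodic target sync
            t += 1  # update time
    return online_w
```

Calling `train()` returns the learned weight matrix, whose rows index positions and whose columns index the two actions. Decoupling action selection (online weights) from action evaluation (target weights) in `double_dqn_targets` is what distinguishes Double DQN from plain DQN, which would take the maximum over the target network's values directly.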