Research Article

Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map

Algorithm 1

Training process of the DDQN algorithm.
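(1) Initialize the Q-network for the USV and the replay buffer
(2) for each training episode do
(3)     Initialize the environment, the state, and the time step
(4)     while not (the time limit is reached or the targets are found) do
(5)         for each USV do
(6)             Receive the current observation
(7)             Select an action according to the policy
(8)             Execute the action, receive the reward, and reach the next state
(9)             Get the next observation
(10)            Store the transition in the replay buffer if the USV is trained
(11)        end
(12)        Sample a random minibatch of transitions from the replay buffer
(13)        Perform a gradient descent step on the loss
(14)        Update the time step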
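
The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. A small table stands in for the Q-network, a one-dimensional row of cells stands in for the search environment, and the observation is taken to be the agent's cell index. The epsilon-greedy action selection, the periodic copy of the online table into a target table, the reward values, and all names and default parameters (`train`, `ddqn_target`, `sync_every`, and so on) are assumptions made for illustration only. The conditional in step (10) is omitted, so every transition is stored.

```python
import random


def ddqn_target(reward, next_state, done, q_online, q_target, gamma):
    """Double DQN target: the online estimator picks the next action,
    the target estimator scores it."""
    if done:
        return reward
    row = q_online[next_state]
    best_action = max(range(len(row)), key=lambda a: row[a])
    return reward + gamma * q_target[next_state][best_action]


def train(num_episodes=200, max_steps=30, num_agents=2, num_cells=6,
          gamma=0.9, lr=0.2, epsilon=0.2, batch_size=8, sync_every=20,
          seed=0):
    """Toy training loop following steps (1) to (14); all values are assumed."""
    rng = random.Random(seed)
    moves = (-1, 1)  # action 0 moves left, action 1 moves right
    # (1) Tabular stand-in for the Q-network, plus the replay buffer.
    q_online = [[0.0, 0.0] for _ in range(num_cells)]
    q_target = [row[:] for row in q_online]
    replay_buffer = []
    target_cell = num_cells - 1  # hypothetical fixed target position
    updates = 0
    for _ in range(num_episodes):                      # (2)
        states = [0] * num_agents                      # (3)
        t = 0
        found = False
        while not (t >= max_steps or found):           # (4)
            for i in range(num_agents):                # (5)
                s = states[i]                          # (6)
                if rng.random() < epsilon:             # (7) assumed epsilon-greedy
                    a = rng.randrange(len(moves))
                else:
                    a = max(range(len(moves)), key=lambda k: q_online[s][k])
                s_next = min(max(s + moves[a], 0), num_cells - 1)  # (8), (9)
                done = s_next == target_cell
                r = 1.0 if done else -0.01             # assumed reward shaping
                replay_buffer.append((s, a, r, s_next, done))      # (10)
                states[i] = s_next
                found = found or done
            # (12) Sample a random minibatch from the replay buffer.
            batch = rng.sample(replay_buffer,
                               min(batch_size, len(replay_buffer)))
            # (13) Gradient step on the squared TD error of each table entry.
            for s, a, r, s_next, done in batch:
                y = ddqn_target(r, s_next, done, q_online, q_target, gamma)
                q_online[s][a] += lr * (y - q_online[s][a])
            updates += 1
            if updates % sync_every == 0:              # assumed target sync
                q_target = [row[:] for row in q_online]
            t += 1                                     # (14)
    return q_online
```

Calling `train()` returns the learned table with one row per cell and one column per action. The part specific to double Q-learning is `ddqn_target`: the next action is chosen with the online estimate but valued with the target estimate, which is the usual way of reducing the overestimation that a single max over one estimator tends to produce.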