Mathematical Problems in Engineering

Research Article

Multi-USV System Cooperative Underwater Target Search Based on Reinforcement Learning and Probability Map

Training process of the DDQN algorithm.

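(1)	Initialize the Q-network $Q(o, a; \theta)$ for the USV and the replay buffer $\mathcal{D}$
(2)	for each episode do
(3)	    Initialize the environment, the state $s_0$, and the time $t \leftarrow 0$
(4)	    while not (the time limit is reached or the targets are found) do
(5)	        for each USV $i$ do
(6)	            Receive observation $o_t^i$
(7)	            Select action $a_t^i$ according to $Q(o_t^i, \cdot\,; \theta)$
(8)	            Execute action $a_t^i$, receive reward $r_t^i$, and reach state $s_{t+1}$
(9)	            Get observation $o_{t+1}^i$
(10)	            Store transition $(o_t^i, a_t^i, r_t^i, o_{t+1}^i)$ in $\mathcal{D}$ if USV $i$ is trained
(11)	        end for
(12)	        Sample a random minibatch of transitions $(o_j, a_j, r_j, o_{j+1})$ from $\mathcal{D}$
(13)	        Perform a gradient descent step on $\left(y_j - Q(o_j, a_j; \theta)\right)^2$ with respect to $\theta$
(14)	        Update time $t \leftarrow t + 1$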
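The training loop above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation: the grid world (`GridSearchEnv`), the reward values, the ε-greedy action selection, the target-table synchronization period, and the helper names `ddqn_target` and `train_ddqn` are all hypothetical. To keep it dependency-free, the Q-network is replaced by a lookup table shared by all USVs, so the "gradient descent step" becomes the tabular update $Q \leftarrow Q + \alpha\,(y - Q)$, which is a gradient step on $\tfrac{1}{2}(y - Q)^2$. The double-DQN target is assumed to take the standard form $y = r + \gamma\, Q^{-}\big(o', \arg\max_a Q(o', a)\big)$, where the online table selects the action and a periodically copied target table $Q^{-}$ evaluates it.

```python
import random
from collections import deque

# Hypothetical toy task (not the paper's environment): USVs move on a W x H grid and
# a target counts as found when a USV enters its cell. Observation = the USV's own cell.
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # north, south, east, west


class GridSearchEnv:
    def __init__(self, width, height, targets, n_usv, max_steps):
        self.w, self.h = width, height
        self.targets = set(targets)
        self.n_usv, self.max_steps = n_usv, max_steps
        self.reset()

    def reset(self):
        self.pos = [(0, 0)] * self.n_usv  # every USV starts in the corner cell
        self.found, self.t = set(), 0

    def obs(self, i):
        x, y = self.pos[i]
        return y * self.w + x  # flat cell index used as a discrete observation

    def step(self, i, a):
        x, y = self.pos[i]
        dx, dy = ACTIONS[a]
        nx, ny = min(max(x + dx, 0), self.w - 1), min(max(y + dy, 0), self.h - 1)
        self.pos[i] = (nx, ny)
        r = -0.01  # assumed small per-move penalty
        if (nx, ny) in self.targets and (nx, ny) not in self.found:
            self.found.add((nx, ny))
            r += 1.0  # assumed bonus for a newly found target
        return r

    def all_found(self):
        return self.found == self.targets

    def done(self):
        return self.t >= self.max_steps or self.all_found()


def argmax(row):
    return max(range(len(row)), key=row.__getitem__)


def ddqn_target(r, o2, terminal, q, q_tgt, gamma):
    """Double-DQN target: the online table picks a*, the target table evaluates it."""
    if terminal:
        return r
    return r + gamma * q_tgt[o2][argmax(q[o2])]


def train_ddqn(env, episodes=300, gamma=0.95, lr=0.5, eps=0.2, batch=32,
               buf_size=5000, sync_every=50, trained_ids=None, seed=0):
    rng = random.Random(seed)
    n_act = len(ACTIONS)
    q = [[0.0] * n_act for _ in range(env.w * env.h)]  # tabular stand-in for the Q-network
    q_tgt = [row[:] for row in q]                       # target copy, synced periodically
    buf = deque(maxlen=buf_size)                        # replay buffer D
    trained = set(range(env.n_usv)) if trained_ids is None else set(trained_ids)
    updates = 0
    for _ in range(episodes):                           # step (2)
        env.reset()                                     # step (3)
        while not env.done():                           # step (4)
            for i in range(env.n_usv):                  # step (5)
                o = env.obs(i)                          # step (6)
                # step (7), assumed epsilon-greedy over the shared table
                a = rng.randrange(n_act) if rng.random() < eps else argmax(q[o])
                r = env.step(i, a)                      # step (8)
                o2 = env.obs(i)                         # step (9)
                if i in trained:                        # step (10)
                    buf.append((o, a, r, o2, env.all_found()))
            if len(buf) >= batch:
                for o, a, r, o2, term in rng.sample(list(buf), batch):  # step (12)
                    y = ddqn_target(r, o2, term, q, q_tgt, gamma)
                    q[o][a] += lr * (y - q[o][a])  # step (13): gradient step on 0.5*(y - Q)^2
                updates += 1
                if updates % sync_every == 0:
                    q_tgt = [row[:] for row in q]
            env.t += 1                                  # step (14)
    return q
```

For example, `train_ddqn(GridSearchEnv(3, 3, [(2, 2)], n_usv=1, max_steps=50))` returns a 9 × 4 table whose entries for moves into the target cell approach 0.99. Reaching the time limit is deliberately not flagged as terminal, so those transitions still bootstrap from the target table; only finding all targets ends the return.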