Security and Communication Networks

Research Article

An Intelligent Offloading System Based on Multiagent Reinforcement Learning

Multiagent recurrent deep deterministic policy algorithm (MARDDPG).

	Input: the environment
Output:
(1)	Initialize the parameters for the N actor network and for the critic network
(2)	Initialized the replay memory M
(3)	For training step = 1:all steps, do
(4)	=initialized message, t = 0
(5)	While t< T, do
(6)	For i = 1:N, do
(7)	Select the action for agent i_t
(8)	Receive reward which is calculated by regional average preference function
(9)	Receive observation
(10)	Update message by
(11)	End for
(12)
(13)
(14)	End while
(15)	Store episode {m₀, a₁, r₁, m₁, o₂, a₂,...} in M
(16)	Sample a random minibatch of episodes from replay memory M
(17)	Each episode and each time, we do
(18)	Update the critic, actor, and LSTM network by minimizing the loss
(19)	Soft-update the target critic and target actor network
(20)	End for