Research Article

Joint Optimization of Jamming Link and Power Control in Communication Countermeasures: A Multiagent Deep Reinforcement Learning Approach

Algorithm 1

Decentralized allocation of communication jamming resources based on MASAC.
• Initialize the CRB.
• Initialize the policy network and the twin evaluation networks of each JE i, together with their respective weights.
• Initialize the twin target networks of each JE i, together with their respective weights.
• Set training episode = 1.
• while training episode ≤ E do
•  Initialize the environment state S(t) = (0, 0, …, 0).
•  for time step t = 1, 2, …, T_max do
•   Each JE i selects its jamming action according to its current observation.
•   Execute the joint jamming action a_t; each JE i then receives the shared reward r_t and its next observation.
•   Store the experiences of all JEs in the CRB.
•   if the number of experiences stored in the CRB exceeds β, then training begins:
•      Randomly sample a mini-batch of experiences from the CRB.
•      for each JE i = 1, 2, …, N do
•        Update the weights of the twin evaluation networks using (23).
•        Update the weights of the policy network using (24) and (25).
•        Soft-update the weights of the twin target networks using (26).
•      end for
•   end if
•  end for
• end while
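
The control flow of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the CRB is read here as a single replay buffer shared by all JEs, the `env` and `agent` interfaces (`reset`, `step`, `act`, `update`) are hypothetical placeholders, the per-agent updates of (23) to (25) are left to `agent.update`, and `soft_update` assumes that (26) takes the usual Polyak-averaging form used in soft actor-critic methods.

```python
import random
from collections import deque


class SharedReplayBuffer:
    """Stand-in for the CRB: one buffer holding the experiences of all JEs."""

    def __init__(self, max_size):
        # Oldest experiences are discarded once max_size is reached.
        self.storage = deque(maxlen=max_size)

    def store(self, observations, actions, reward, next_observations):
        self.storage.append((observations, actions, reward, next_observations))

    def __len__(self):
        return len(self.storage)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement.
        return rng.sample(list(self.storage), batch_size)


def soft_update(target_weights, online_weights, tau):
    """Polyak averaging (assumed form of the soft target update)."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_weights, target_weights)]


def train(env, agents, episodes, t_max, beta, batch_size, buffer_size=100_000):
    """Outer loop of Algorithm 1 with hypothetical env/agent interfaces."""
    crb = SharedReplayBuffer(buffer_size)
    for _ in range(episodes):
        observations = env.reset()  # one observation per JE
        for _ in range(t_max):
            # Each JE acts on its own observation only (decentralized execution).
            actions = [agent.act(obs) for agent, obs in zip(agents, observations)]
            next_observations, reward = env.step(actions)  # shared reward
            crb.store(observations, actions, reward, next_observations)
            observations = next_observations
            # Training starts once more than beta experiences are stored.
            if len(crb) > beta:
                batch = crb.sample(min(batch_size, len(crb)))
                for agent in agents:
                    agent.update(batch)  # placeholder for (23)-(26)
    return crb
```

In this sketch every JE is updated from the same sampled mini-batch; whether each JE draws its own batch is not specified by the listing and would be a one-line change.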