• Initialize the CRB.
• Initialize the policy network and the twin evaluation networks of each JE i with random weights.
• Initialize the twin target networks of each JE i by copying the weights of the corresponding twin evaluation networks.
• Set training episode e = 1.
• while training episode e ≤ E do
• Initialize the environment state S(t) = (0, 0, …, 0).
• for time step t = 1, 2, …, Tmax do
• Each JE i selects its jamming action according to its current observation.
• Execute the joint jamming action at; each JE i then obtains the shared reward rt and its next observation.
• Store the experience from all JEs in the CRB.
• if the number of experiences stored in the CRB is larger than β, then the training process begins:
• Stochastically sample a mini-batch of experiences from the CRB.
• for each JE i = 1, 2, …, N do
• Update the weights of the twin evaluation networks with (23).
• Update the weights of the policy network by (24) and (25).
• Soft-update the weights of the twin target networks through (26).
• end for
• end if
• end for
• Set training episode e = e + 1.
• end while
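The loop above can be sketched in plain NumPy. This is a toy stand-in under stated assumptions, not the paper's implementation: linear "networks" replace the actual policy and twin evaluation models, the environment step is stubbed with random observations and rewards, and simple gradient steps approximate the updates referred to as Eqs. (23)–(26). Names such as `BETA`, `T_MAX`, and `crb` mirror the pseudocode; all dimensions and hyperparameters are illustrative.

```python
import copy
import random
from collections import deque

import numpy as np

rng = np.random.default_rng(0)
N, E, T_MAX = 2, 2, 15            # JEs, training episodes, steps per episode
OBS, ACT = 3, 2                   # per-JE observation / action sizes (assumed)
BETA, BATCH, GAMMA, TAU, LR = 16, 8, 0.95, 0.01, 1e-3

class Lin:
    """Toy linear stand-in for a policy or evaluation (critic) network."""
    def __init__(self, d_in, d_out):
        self.w = rng.normal(0.0, 0.1, (d_in, d_out))
    def __call__(self, x):
        return x @ self.w

crb = deque(maxlen=5000)                                      # replay buffer (CRB)
pis = [Lin(OBS, ACT) for _ in range(N)]                       # per-JE policies
qs = [[Lin(OBS * N + ACT * N, 1) for _ in range(2)] for _ in range(N)]  # twin critics
qts = [[copy.deepcopy(q) for q in qs[i]] for i in range(N)]   # twin targets (copies)
pits = [copy.deepcopy(p) for p in pis]                        # target policies

def soft(tgt, src):
    # Soft update in the spirit of Eq. (26): w' <- tau*w + (1 - tau)*w'
    tgt.w = TAU * src.w + (1 - TAU) * tgt.w

for ep in range(E):
    obs = np.zeros((N, OBS))                                  # S(t) = (0, 0, ..., 0)
    for t in range(T_MAX):
        acts = np.stack([pis[i](obs[i]) for i in range(N)])   # each JE acts on its obs
        # Stub environment: random next observations and a shared reward.
        nobs = rng.normal(size=(N, OBS))
        r = float(rng.normal())
        crb.append((obs, acts, r, nobs))
        obs = nobs
        if len(crb) <= BETA:                                  # train only once CRB > beta
            continue
        batch = random.sample(crb, BATCH)
        o = np.stack([b[0] for b in batch])                   # (B, N, OBS)
        a = np.stack([b[1] for b in batch])                   # (B, N, ACT)
        rw = np.array([b[2] for b in batch])                  # (B,)
        no = np.stack([b[3] for b in batch])                  # (B, N, OBS)
        na = np.stack([pits[i](no[:, i]) for i in range(N)], axis=1)
        x = np.concatenate([o.reshape(BATCH, -1), a.reshape(BATCH, -1)], axis=1)
        nx = np.concatenate([no.reshape(BATCH, -1), na.reshape(BATCH, -1)], axis=1)
        for i in range(N):
            # Critic target uses the min of the twin target networks (cf. Eq. (23)).
            y = rw + GAMMA * np.minimum(qts[i][0](nx), qts[i][1](nx)).ravel()
            for q in qs[i]:
                err = q(x).ravel() - y
                q.w -= LR * x.T @ err[:, None] / BATCH        # TD gradient step
            # Deterministic policy gradient through critic 1 (cf. Eqs. (24)-(25)):
            # for a linear critic, dQ/da_i is the weight slice of JE i's action.
            dq_da = qs[i][0].w[OBS * N + ACT * i : OBS * N + ACT * (i + 1)]
            pis[i].w += LR * (o[:, i].T @ np.tile(dq_da.T, (BATCH, 1))) / BATCH
            soft(qts[i][0], qs[i][0])
            soft(qts[i][1], qs[i][1])
            soft(pits[i], pis[i])
```

The sketch keeps the centralized-critic / decentralized-actor structure implied by the pseudocode: every critic sees all observations and actions, while each policy acts only on its own JE's observation.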