Research Article
Joint Optimization for MEC Computation Offloading and Resource Allocation in IoV Based on Deep Reinforcement Learning
Algorithm 1
Decentralized multiagent DDPG optimization method.
Randomly initialize the critic network and the actor network with their weights
Initialize the target critic and target actor networks with their weights
Initialize the replay buffer
for each episode do
    Initialize a random process for action exploration
    Receive the initial observation state
    for each time step do
        Select an action according to the current policy and the exploration noise
        Execute the action, then observe the reward and the next state
        Store all transitions in the replay buffer
        Sample a random mini-batch of transitions from the replay buffer
        Set the target value for each sampled transition
        Update the critic network by minimizing the loss
        Update the actor policy by using the sampled policy gradient
        Update the target networks for each agent
    end for
end for
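The replay-buffer, target-value, critic-loss, and target-network steps of Algorithm 1 can be illustrated with a minimal sketch. The equations of the listing are not reproduced in this text, so the sketch assumes the forms commonly used in DDPG: a target value y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1})), a mean-squared critic loss, and a soft target update theta' <- tau * theta + (1 - tau) * theta'. The names `ReplayBuffer`, `td_targets`, `critic_loss`, and `soft_update`, and the parameters `gamma` and `tau`, are hypothetical and chosen only for illustration; the actor and critic are passed in as plain Python callables, and the gradient steps themselves are left out.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.storage = deque(maxlen=capacity)

    def store(self, state, action, reward, next_state):
        self.storage.append((state, action, reward, next_state))

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement, as in a random mini-batch.
        return rng.sample(list(self.storage), batch_size)

    def __len__(self):
        return len(self.storage)


def td_targets(batch, target_actor, target_critic, gamma):
    """Assumed target: y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))."""
    return [
        reward + gamma * target_critic(next_state, target_actor(next_state))
        for (_, _, reward, next_state) in batch
    ]


def critic_loss(batch, targets, critic):
    """Assumed loss: mean squared error between targets and Q(s_i, a_i)."""
    errors = [
        (y - critic(state, action)) ** 2
        for (state, action, _, _), y in zip(batch, targets)
    ]
    return sum(errors) / len(errors)


def soft_update(target_weights, online_weights, tau):
    """Assumed update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [
        tau * w + (1.0 - tau) * w_target
        for w, w_target in zip(online_weights, target_weights)
    ]
```

In a multiagent setting, each agent would hold its own buffer and weight lists and call these helpers once per time step; a small `tau` keeps the target networks changing slowly, which is the usual reason for using a soft update rather than copying the weights outright.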