Research Article
Deep Reinforcement Learning-Based Content Placement and Trajectory Design in Urban Cache-Enabled UAV Networks
Algorithm 1
Deep reinforcement learning-based joint caching and trajectory design (DRL-JCT).
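1: Initialize content placement.
2: Randomly initialize the value function $Q$ with weights $\theta$.
3: Initialize the target value function $\hat{Q}$ with weights $\theta^- = \theta$.
4: Initialize replay memory $\mathcal{D}$ to size $N$, and set the replay buffer size.
5: for each episode do
6:   Initialize the environment and set the state to $s_1$.
7:   while available do
8:     if $\mathrm{rand}(0,1) > \epsilon$ then
9:       choose action $a_t = \arg\max_a Q(s_t, a; \theta)$.
10:    else
11:      randomly choose an action $a_t$.
12:    end if
13:    Execute $a_t$ and observe $r_t$ and $s_{t+1}$.
14:    Store transition $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{D}$.
15:    Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from $\mathcal{D}$.
16:    Calculate the target value: $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^-)$.
17:    Loss function: $L(\theta) = \left(y_j - Q(s_j, a_j; \theta)\right)^2$.
18:    Update $\theta$ using $L(\theta)$ by gradient descent.
19:    Every $C$ steps, reset $\hat{Q} = Q$ (i.e., $\theta^- \leftarrow \theta$).
20:  end while
21: end for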
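The loop in Algorithm 1 has the structure of a deep Q-network (DQN): epsilon-greedy action selection, experience replay, a bootstrapped target from a periodically synchronized target network, and a squared-error loss minimized by gradient descent. The Python sketch below illustrates only that training loop, under loudly stated assumptions that are not the paper's model: a lookup table stands in for the neural network, the environment is a hypothetical 1-D row of cells in which the UAV earns a reward of 1 when it ends a step above a hotspot cell (content placement is held fixed), each episode starts from a random cell, and the "while available" condition is modeled as a fixed step budget. All names (`step`, `greedy`, `train`, `HOTSPOT`) and hyperparameter values are hypothetical.

```python
import random

# Hypothetical toy environment (an assumption, not the paper's model):
# a UAV moves along N_POS cells; actions 0 = left, 1 = hover, 2 = right.
# Reward 1.0 whenever the UAV ends a step above HOTSPOT, a cell whose users
# request content the UAV has cached (content placement, step 1, is fixed).
N_POS, N_ACT, HOTSPOT = 5, 3, 4


def step(s, a):
    """Apply action a in cell s; returns (next_cell, reward)."""
    s_next = min(max(s + a - 1, 0), N_POS - 1)
    return s_next, (1.0 if s_next == HOTSPOT else 0.0)


def greedy(q, s):
    """argmax_a Q(s, a); ties go to the lowest action index."""
    return max(range(N_ACT), key=lambda a: q[s][a])


def train(episodes=500, budget=20, capacity=1000, batch=16,
          gamma=0.9, lr=1.0, eps=0.2, sync_every=50, seed=0):
    rng = random.Random(seed)
    # Step 2: Q(s, a; theta). A lookup table stands in for the neural network.
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACT)]
             for _ in range(N_POS)]
    # Step 3: target weights theta^- start as a copy of theta.
    theta_tgt = [row[:] for row in theta]
    memory = []                               # Step 4: replay memory D
    steps = 0
    for _ in range(episodes):                 # Step 5
        s = rng.randrange(N_POS)              # Step 6: random start (assumption)
        for _ in range(budget):               # Step 7: "available" = step budget (assumption)
            if rng.random() > eps:            # Steps 8-12: epsilon-greedy
                a = greedy(theta, s)
            else:
                a = rng.randrange(N_ACT)
            s_next, r = step(s, a)            # Step 13
            memory.append((s, a, r, s_next))  # Step 14
            if len(memory) > capacity:
                memory.pop(0)                 # drop the oldest transition
            if len(memory) >= batch:
                grad = {}
                for sj, aj, rj, sj1 in rng.sample(memory, batch):   # Step 15
                    y = rj + gamma * max(theta_tgt[sj1])             # Step 16
                    # Step 17: L = (y - Q)^2, so -dL/dQ = 2 (y - Q); the 2 is folded into lr
                    grad[(sj, aj)] = grad.get((sj, aj), 0.0) + (y - theta[sj][aj])
                for (sj, aj), g in grad.items():                     # Step 18: minibatch-mean step
                    theta[sj][aj] += lr * g / batch
            steps += 1
            if steps % sync_every == 0:       # Step 19: theta^- <- theta
                theta_tgt = [row[:] for row in theta]
            s = s_next
    return theta


if __name__ == "__main__":
    q = train()
    print("greedy action per cell:", [greedy(q, s) for s in range(N_POS)])
```

In this toy setting the learned greedy policy should move the UAV toward the hotspot and keep it there. The gradients for the whole minibatch are accumulated before any weight is changed, so each iteration is one gradient-descent step on the mean minibatch loss, as in step 18; a neural network in place of the table would change steps 2, 3, and 18 but not the surrounding loop.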