Research Article

Deep Reinforcement Learning-Based Content Placement and Trajectory Design in Urban Cache-Enabled UAV Networks

Algorithm 1

Deep reinforcement learning-joint caching and trajectory design (DRL-JCT).
1: Initialize the content placement.
2: Randomly initialize the value function $Q$ with weights $\theta$.
3: Initialize the target value function $\hat{Q}$ with weights $\theta^{-}$.
4: Initialize the replay memory and set its size and the replay buffer size.
5: for each episode do
6: Initialize the environment and set the state to $s_1$.
7: while available do
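8:  if a uniform random draw exceeds the exploration rate $\epsilon$ then
9:   choose action $a_t = \arg\max_{a} Q(s_t, a; \theta)$.
10:  else
11:   randomly choose an action $a_t$.
12:  end if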
13:  Execute $a_t$ and observe $r_t$ and $s_{t+1}$.
14:  Store the transition $(s_t, a_t, r_t, s_{t+1})$ in the replay memory.
15:  Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from the replay memory.
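16:  Calculate the target value: $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$.
17:  Compute the loss function: $L(\theta) = \left(y_j - Q(s_j, a_j; \theta)\right)^2$.
18:  Update $\theta$ using $\nabla_{\theta} L(\theta)$ by gradient descent.
19:  Every fixed number of steps, reset $\theta^{-} = \theta$.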
20: end while
21: end for
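
The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the environment is a hypothetical one-dimensional toy (a UAV moving along a line of cells toward the last cell), the value function is a linear approximator over one-hot state features (one weight per state-action pair) instead of a deep network, and all names and hyperparameter values (`train`, `step`, `greedy`, `sync_every`, `epsilon`, and so on) are assumptions chosen for illustration. The content placement step is omitted because the listing does not specify it.

```python
import random
from collections import deque

# Hypothetical toy environment (not from the paper): a UAV on a line of
# N_CELLS cells must reach the last cell; actions are 0 = left, 1 = right.
N_CELLS, N_ACTIONS, MAX_STEPS = 6, 2, 30

def step(state, action):
    """Return (reward, next_state, done) for the toy environment."""
    nxt = max(0, min(N_CELLS - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_CELLS - 1
    return (1.0 if done else -0.01), nxt, done

def greedy(weights, state):
    """Greedy action: argmax over a of Q(state, a; weights)."""
    return max(range(N_ACTIONS), key=lambda a: weights[state][a])

def train(episodes=300, gamma=0.9, lr=0.1, epsilon=0.2,
          memory_size=500, batch_size=16, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Value-function weights: one weight per (state, action) pair, i.e. a
    # linear approximator over one-hot features (assumed stand-in for a DNN).
    theta = [[rng.uniform(-0.01, 0.01) for _ in range(N_ACTIONS)]
             for _ in range(N_CELLS)]
    theta_target = [row[:] for row in theta]   # target weights
    memory = deque(maxlen=memory_size)         # bounded replay memory
    steps = 0
    for _ in range(episodes):
        state = 0                              # initial state s_1
        for _ in range(MAX_STEPS):             # assumed stand-in for "while available"
            # Epsilon-greedy selection: exploit if the draw exceeds epsilon.
            if rng.random() > epsilon:
                action = greedy(theta, state)
            else:
                action = rng.randrange(N_ACTIONS)
            reward, nxt, done = step(state, action)
            memory.append((state, action, reward, nxt, done))
            # Sample a random minibatch from the replay memory.
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s2, d in batch:
                # Target uses the frozen target weights; no bootstrap at terminal.
                target = r if d else r + gamma * max(theta_target[s2])
                # Gradient descent step on 0.5 * (target - Q(s, a))^2.
                theta[s][a] += lr * (target - theta[s][a])
            steps += 1
            if steps % sync_every == 0:        # periodic target reset
                theta_target = [row[:] for row in theta]
            state = nxt
            if done:
                break
    return theta
```

Calling `theta = train()` and then `greedy(theta, s)` for each non-terminal cell `s` gives the learned policy for the toy problem. The separate `theta_target` copy is what keeps the regression target fixed between resets, which is the role of the target value function in the listing.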