Wireless Communications and Mobile Computing

Research Article

Deep Reinforcement Learning-Based Content Placement and Trajectory Design in Urban Cache-Enabled UAV Networks

Algorithm 1

Deep reinforcement learning-joint caching and trajectory design (DRL-JCT).

1: Initialize the content placement.
2: Randomly initialize value function $Q$ with weights $\theta$.
3: Initialize target value function $\hat{Q}$ with weights $\theta^{-} = \theta$.
4: Initialize replay memory $\mathcal{D}$ to size $N$ and set the minibatch size.
5: for each episode do
6: Initialize environment and state to s₁.
7: while available do
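8: if a uniform random draw exceeds the exploration rate $\epsilon$ then
9: Choose action $a_t = \arg\max_{a} Q(s_t, a; \theta)$.
10: else
11: Randomly choose an action.
12: end if
13: Execute $a_t$ and observe reward $r_t$ and next state $s_{t+1}$.
14: Store transition $(s_t, a_t, r_t, s_{t+1})$ in $\mathcal{D}$.
15: Sample a random minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ of the chosen batch size from $\mathcal{D}$.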
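16: Calculate target value: $y_j = r_j + \gamma \max_{a'} \hat{Q}(s_{j+1}, a'; \theta^{-})$.
17: Loss function: $L(\theta) = \left(y_j - Q(s_j, a_j; \theta)\right)^2$.
18: Update $\theta$ using $\nabla_{\theta} L(\theta)$ by gradient descent.
19: Every $C$ steps reset $\hat{Q} = Q$.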
20: end while
21: end for
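
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: a lookup table stands in for the neural value function, the 4-state environment `chain_step` is a hypothetical toy rather than the UAV caching environment, and every function name and hyperparameter value (`capacity`, `batch_size`, `gamma`, `lr`, `epsilon`, `sync_every`, `max_steps`) is an assumption chosen for illustration. The stopping rule of step 7 is replaced here by a fixed step budget.

```python
import random
from collections import deque


def epsilon_greedy(q_row, epsilon, rng):
    """Steps 8-12: greedy action if the random draw exceeds epsilon, else random."""
    if rng.random() > epsilon:
        return max(range(len(q_row)), key=q_row.__getitem__)
    return rng.randrange(len(q_row))


def td_target(reward, next_q_row, gamma, done):
    """Step 16: y = r + gamma * max_a' Q_target(s', a'); y = r at a terminal state."""
    return reward if done else reward + gamma * max(next_q_row)


def chain_step(state, action):
    """Hypothetical toy environment: action 1 moves right, action 0 moves left.
    Reaching state 3 ends the episode with reward 1."""
    nxt = min(state + 1, 3) if action == 1 else max(state - 1, 0)
    done = nxt == 3
    return nxt, (1.0 if done else 0.0), done


def train(env_step, n_states, n_actions, episodes=200, max_steps=50,
          capacity=1000, batch_size=16, gamma=0.9, lr=0.1,
          epsilon=0.1, sync_every=20, seed=0):
    rng = random.Random(seed)
    # Step 2: online value table (stand-in for Q with weights theta).
    q = [[0.0] * n_actions for _ in range(n_states)]
    # Step 3: target copy (stand-in for the target value function).
    q_target = [row[:] for row in q]
    # Step 4: bounded replay memory; old transitions are dropped first.
    memory = deque(maxlen=capacity)
    steps = 0
    for _ in range(episodes):                      # step 5
        s = 0                                      # step 6: initial state
        for _ in range(max_steps):                 # step 7: assumed step budget
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next, r, done = env_step(s, a)       # step 13
            memory.append((s, a, r, s_next, done)) # step 14
            batch = rng.sample(list(memory),       # step 15
                               min(batch_size, len(memory)))
            for sj, aj, rj, sj1, dj in batch:
                y = td_target(rj, q_target[sj1], gamma, dj)
                # Steps 17-18: gradient-descent step on 0.5 * (y - Q)^2,
                # which for a table reduces to moving Q toward y.
                q[sj][aj] += lr * (y - q[sj][aj])
            steps += 1
            if steps % sync_every == 0:            # step 19: sync target copy
                q_target = [row[:] for row in q]
            s = s_next
            if done:
                break
    return q
```

Calling `train(chain_step, n_states=4, n_actions=2, episodes=300, epsilon=0.3)` on this toy chain should yield a table in which moving right is valued above moving left in every non-terminal state. In a setting like the one the algorithm targets, the table would be replaced by a neural network and the inner update by a minibatch gradient step on the network weights.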