Research Article
Edge Caching for D2D Enabled Hierarchical Wireless Networks with Deep Reinforcement Learning
Algorithm 2: Double DQN-based content caching algorithm.
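Initialization: Experience replay memory, main network with random weights, target network with weights copied from the main network, and the period of replacing the target Q network.
Iteration:
1: for each episode
2: Initialize the state
3: Initialize the step counter i
4: for each step of episode
5:
6: Randomly generate a number in [0, 1]
8: if the number is below the exploration probability
9: randomly select an action
10: else
11: select the action with the largest main-network Q value
12: Take the selected action
13: Obtain the reward and the next state.
14: Store the transition into the replay memory.
15: Randomly sample a mini-batch of transitions from the replay memory.
16: Update the main network weights with the mini-batch loss.
17: if i reaches the replacement period
18: Update the target network weights with the main network weights
19:
20:
21: end for
22: end for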
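The loop in Algorithm 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: a small Q table stands in for the main and target networks so that the example runs with the standard library only, and the toy environment, the function names, and all hyperparameter values (`gamma`, `lr`, `epsilon`, `batch_size`, `sync_period`) are assumptions chosen for illustration. The part specific to Double DQN is `double_dqn_target`, where the main estimator selects the next action and the target estimator evaluates it.

```python
import random
from collections import deque


def double_dqn_target(reward, next_q_main, next_q_target, gamma):
    """Double DQN target: the main estimator picks the action, the target one scores it."""
    best = max(range(len(next_q_main)), key=lambda a: next_q_main[a])
    return reward + gamma * next_q_target[best]


def epsilon_greedy(q_values, epsilon, rng):
    """Explore with probability epsilon, otherwise take the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def toy_env_step(state, action, rng, n_states=3):
    """Hypothetical environment: reward 1 when the cached item matches the request."""
    reward = 1.0 if action == state else 0.0
    return reward, rng.randrange(n_states)


def train(env_step, n_states, n_actions, episodes=200, steps=20, gamma=0.5,
          lr=0.2, epsilon=0.3, batch_size=8, sync_period=10, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=1000)  # experience replay memory
    # Q tables stand in for the main and target networks (assumption).
    main = [[rng.uniform(-0.01, 0.01) for _ in range(n_actions)]
            for _ in range(n_states)]
    target = [row[:] for row in main]  # target starts as a copy of main
    for _ in range(episodes):
        state = rng.randrange(n_states)  # initialize the state
        i = 0                            # step counter for target replacement
        for _ in range(steps):
            i += 1
            action = epsilon_greedy(main[state], epsilon, rng)
            reward, next_state = env_step(state, action, rng)
            memory.append((state, action, reward, next_state))
            # Sample a mini-batch and move the main estimates toward the targets.
            batch = rng.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s2 in batch:
                y = double_dqn_target(r, main[s2], target[s2], gamma)
                main[s][a] += lr * (y - main[s][a])
            if i % sync_period == 0:     # periodic target replacement
                target = [row[:] for row in main]
            state = next_state
    return main


q_table = train(toy_env_step, n_states=3, n_actions=3)
greedy_policy = [max(range(3), key=lambda a: q_table[s][a]) for s in range(3)]
```

In this toy setting the greedy policy should learn to cache the requested item in every state. Replacing the Q tables with neural networks and the per-sample update with a gradient step on the squared error between the target and the main network's estimate gives the usual deep variant.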