Research Article
Deep Reinforcement Learning-Based Collaborative Video Caching and Transcoding in Clustered and Intelligent Edge B5G Networks
Algorithm 1
Deep reinforcement learning algorithm for collaborative video caching and transcoding (DRL-CCT).
1: Initialization:
2: Initialize replay memory D to capacity N
3: Initialize Q network and target Q network with random weights
4: Initialize MEC service matrix V of requests
5: for episode = 1, M do
6:     Generate the user request data
7:     Observe the initial state s(1) as illustrated in Eq. (11)
8:     for t = 1, T do
9:         Generate a random probability
10:         Choose action A(t), as listed in Eq. (12)
11:         Based on the action A(t), execute the transcoding policy and update the cache
12:         Observe the reward r(t) and the next state s(t+1)
13:         Store the transition (s(t), A(t), r(t), s(t+1)) in D
14:         Update MEC service matrix V of requests
15:         Sample a random minibatch of transitions (s(t), A(t), r(t), s(t+1)) from D
16:         Set the target value for each sampled transition
17:         Perform a gradient descent step on the loss with respect to the Q network parameters
18:         Update the parameters in the Q network
19:         Reset the parameters in the target Q network every G time stages
20:     end for
21: end for
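The training loop of Algorithm 1 follows the usual deep Q-learning pattern, which the following minimal Python sketch illustrates. It is not the authors' implementation. Several pieces are assumptions made only for illustration: the action choice is taken to be epsilon-greedy (the algorithm only mentions a random probability), the target is the standard DQN target r + gamma * max Q_target(s', a'), a tiny linear model (`LinearQ`) stands in for the Q network, and `toy_env_step` is a hypothetical placeholder for executing the transcoding policy and cache update. The MEC service matrix V and the state and action definitions of Eqs. (11) and (12) are omitted. All hyperparameter values are arbitrary.

```python
import random
from collections import deque

GAMMA = 0.9    # discount factor (assumed value)
EPSILON = 0.1  # exploration probability (assumed value)
LR = 0.01      # learning rate (assumed value)


class LinearQ:
    """Tiny linear stand-in for the Q network: Q(s, a) = w[a] . s"""

    def __init__(self, state_dim, n_actions, rng):
        # Random initial weights, as in step 3 of the algorithm.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(state_dim)]
                  for _ in range(n_actions)]

    def q_values(self, s):
        return [sum(wi * si for wi, si in zip(row, s)) for row in self.w]

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def choose_action(q, s, epsilon, rng):
    """Assumed epsilon-greedy rule: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(q.w))
    values = q.q_values(s)
    return values.index(max(values))


def td_target(q_target, r, s_next, gamma=GAMMA):
    """Standard DQN target: r + gamma * max over a' of Q_target(s', a')."""
    return r + gamma * max(q_target.q_values(s_next))


def gradient_step(q, q_target, batch, lr=LR):
    """One SGD update on the squared TD error per sampled transition."""
    for s, a, r, s_next in batch:
        error = td_target(q_target, r, s_next) - q.q_values(s)[a]
        q.w[a] = [wi + lr * error * si for wi, si in zip(q.w[a], s)]


def toy_env_step(s, a, rng):
    """Hypothetical environment standing in for transcoding and caching."""
    reward = 1.0 if a == 0 else 0.0
    s_next = [rng.random() for _ in s]
    return reward, s_next


def train(episodes=5, steps=20, capacity=100, batch_size=8,
          sync_every=10, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=capacity)           # replay memory D, capacity N
    q = LinearQ(3, 2, rng)                    # Q network
    q_target = LinearQ(3, 2, rng)             # target Q network
    total_steps = 0
    for _ in range(episodes):                 # for episode = 1, M
        s = [rng.random() for _ in range(3)]  # initial state s(1)
        for _ in range(steps):                # for t = 1, T
            a = choose_action(q, s, EPSILON, rng)
            r, s_next = toy_env_step(s, a, rng)
            memory.append((s, a, r, s_next))  # store transition in D
            k = min(batch_size, len(memory))
            batch = rng.sample(list(memory), k)
            gradient_step(q, q_target, batch)
            total_steps += 1
            if total_steps % sync_every == 0:  # every G time stages
                q_target.copy_from(q)
            s = s_next
    return q, memory
```

Calling `train()` runs the loop on the toy environment and returns the trained stand-in network together with the replay memory. Keeping a separate target network that is refreshed only every `sync_every` steps is what step 19 of the algorithm describes; it holds the regression target fixed between refreshes.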