Wireless Communications and Mobile Computing

Research Article

Deep Reinforcement Learning-Based Collaborative Video Caching and Transcoding in Clustered and Intelligent Edge B5G Networks

Deep reinforcement learning algorithm for collaborative video caching and transcoding (DRL-CCT).

1: Initialization:
2: Initialize replay memory D to capacity N
3: Initialize Q network and target Q network with random weights
4: Initialize MEC service matrix V of requests
5: for episode = 1, M do
6: Generate the user request data
7: Observe the initial state s₁ as defined in Eq. (11)
8: for t = 1, T do
9: Generate a random probability
10: Choose action A(t) from the actions listed in Eq. (12)
11: Based on the action A(t), execute the transcoding policy and update the cache
12: Observe the reward r(t), state s(t+1)
13: Store the transition (s(t), A(t), r(t), s(t+1)) in D
14: Update MEC service matrix V of requests
15: Sample a random minibatch of transitions (s(t), A(t), r(t), s(t+1)) from D
16: Set the target Q value for each sampled transition
17: Perform a gradient descent step on the loss between the target and the Q network output
18: Update the parameters in the Q network
19: Reset the parameters in the target Q network to those of the Q network every G time stages
20: end for
21: end for
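
The loop above follows the familiar deep Q-network pattern: a replay memory, a probabilistic choice between exploring and exploiting, and a target network that is synchronized periodically. The Python sketch below illustrates that pattern only and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy environment `toy_step`, the one-hot state of size `STATE_DIM`, the three abstract actions, the linear model `LinearQ` standing in for the Q network, the standard target y = r + γ · max Q_target(s′, a′), and all hyperparameter values. The MEC service matrix V (steps 4 and 14) and the actual caching and transcoding operations are not modelled.

```python
import random
from collections import deque

STATE_DIM = 4   # hypothetical state size (stand-in for the state of Eq. (11))
N_ACTIONS = 3   # hypothetical action count (stand-in for the actions of Eq. (12))


def one_hot(i):
    """Toy state: index of the currently requested item, one-hot encoded."""
    return [1.0 if k == i else 0.0 for k in range(STATE_DIM)]


class LinearQ:
    """Linear stand-in for the Q network: Q(s, a) = w[a] . s."""

    def __init__(self, rng):
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(STATE_DIM)]
                  for _ in range(N_ACTIONS)]

    def values(self, s):
        return [sum(wk * sk for wk, sk in zip(row, s)) for row in self.w]

    def greedy(self, s):
        v = self.values(s)
        return v.index(max(v))

    def copy_from(self, other):
        self.w = [row[:] for row in other.w]


def toy_step(s, a, rng):
    """Hypothetical environment: reward 1 if the action suits the request."""
    item = s.index(1.0)
    reward = 1.0 if a == item % N_ACTIONS else 0.0
    return reward, one_hot(rng.randrange(STATE_DIM))


def train(episodes=50, steps=50, capacity=500, batch=16, gamma=0.5,
          lr=0.1, eps=0.2, sync_every=20, seed=0):
    rng = random.Random(seed)
    memory = deque(maxlen=capacity)            # step 2: replay memory D, capacity N
    q, target = LinearQ(rng), LinearQ(rng)     # step 3: random initial weights
    target.copy_from(q)                        # assumption: target starts as a copy
    t_total = 0
    for _ in range(episodes):                  # step 5
        s = one_hot(rng.randrange(STATE_DIM))  # steps 6-7: toy request, initial state
        for _ in range(steps):                 # step 8
            p = rng.random()                   # step 9
            # step 10: explore with probability eps, otherwise act greedily
            a = rng.randrange(N_ACTIONS) if p < eps else q.greedy(s)
            r, s_next = toy_step(s, a, rng)    # steps 11-12
            memory.append((s, a, r, s_next))   # step 13
            k = min(batch, len(memory))
            for sj, aj, rj, sj_next in rng.sample(list(memory), k):  # step 15
                y = rj + gamma * max(target.values(sj_next))         # step 16
                err = y - q.values(sj)[aj]
                # steps 17-18: per-sample gradient step on 0.5 * err**2 w.r.t. w[aj]
                q.w[aj] = [wk + lr * err * sk for wk, sk in zip(q.w[aj], sj)]
            t_total += 1
            if t_total % sync_every == 0:      # step 19: every G steps
                target.copy_from(q)
            s = s_next
    return q
```

Calling `q = train()` and then `q.greedy(one_hot(i))` for each toy state `i` shows which action the learned model prefers. A real system would replace `LinearQ` with a neural network and `toy_step` with the caching and transcoding environment described in the paper.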