Research Article

Deep Reinforcement Learning-Based Collaborative Video Caching and Transcoding in Clustered and Intelligent Edge B5G Networks

Algorithm 1

Deep reinforcement learning algorithm for collaborative video caching and transcoding (DRL-CCT).
1: Initialization:
2: Initialize replay memory D to capacity N
3: Initialize Q network and target Q network with random weights
4: Initialize MEC service matrix V of requests
5: for episode = 1, M do
6: Generate the user requests data
7: Observe the initial state s1 as illustrated in Eq. (11)
8: for t = 1, T do
9:  Generate a random probability
10:  Choose action A(t), as listed in Eq. (12)
11:  Based on the action A(t), execute the transcoding policy and update the cache
12:  Observe the reward r(t) and the next state s(t+1)
13:  Store the transition (s(t), A(t), r(t), s(t+1)) in D
14:  Update MEC service matrix V of requests
15:  Sample a random minibatch of transitions (s(t), A(t), r(t), s(t+1)) from D
16:  Set the target value for each sampled transition
17:  Perform a gradient descent step on the loss
18:  Update the parameters in the Q network
19:  Reset the parameters in the target Q network every G time stages
20: end for
21: end for
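
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a standard DQN-style update, with an epsilon-greedy action choice and the target r + gamma * max Q_target(s', a'), because the corresponding equations are not reproduced in the listing above. The environment `ToyCacheEnv`, the tabular stand-in for the Q network, and the parameters `gamma`, `eps`, `lr`, and `batch` are hypothetical. M, T, N, and G follow the symbols in Algorithm 1, and the MEC service matrix V is omitted because its update rule is not given here.

```python
import random
from collections import deque

import numpy as np


class ToyCacheEnv:
    """Hypothetical toy environment; NOT the paper's MEC/transcoding model."""

    def __init__(self, n_states=4, n_actions=3, seed=0):
        self.n_states, self.n_actions = n_states, n_actions
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = int(self.rng.integers(self.n_states))
        return self.state

    def step(self, action):
        # Toy reward: 1 if the action matches the state modulo n_actions.
        reward = 1.0 if action == self.state % self.n_actions else 0.0
        self.state = int(self.rng.integers(self.n_states))
        return reward, self.state


def train_dqn(M=30, T=50, N=500, G=20, batch=16,
              gamma=0.9, eps=0.1, lr=0.1, seed=0):
    random.seed(seed)
    env = ToyCacheEnv(seed=seed)
    rng = np.random.default_rng(seed)
    D = deque(maxlen=N)  # replay memory D with capacity N
    # Tabular stand-in for the Q network (assumption), randomly initialized.
    W = rng.normal(scale=0.01, size=(env.n_states, env.n_actions))
    W_target = W.copy()  # target Q network
    step = 0
    for episode in range(M):
        s = env.reset()  # observe the initial state
        for t in range(T):
            # Draw a random probability, then act epsilon-greedily (assumed rule).
            if random.random() < eps:
                a = random.randrange(env.n_actions)
            else:
                a = int(np.argmax(W[s]))
            r, s_next = env.step(a)      # execute the action, observe r and s(t+1)
            D.append((s, a, r, s_next))  # store the transition in D
            s = s_next
            if len(D) >= batch:
                # Sample a random minibatch of transitions from D.
                for sj, aj, rj, sj1 in random.sample(list(D), batch):
                    y = rj + gamma * np.max(W_target[sj1])  # assumed DQN target
                    # Gradient descent step on 0.5 * (y - Q(sj, aj))**2.
                    W[sj, aj] += lr * (y - W[sj, aj])
            step += 1
            if step % G == 0:
                W_target = W.copy()  # reset the target network every G steps
    return W, len(D)


if __name__ == "__main__":
    W, n_stored = train_dqn()
    print("greedy action per state:", np.argmax(W, axis=1))
    print("transitions in replay memory:", n_stored)
```

Swapping the table `W` for a neural network changes only the action-value lookup and the gradient step. The replay memory, the minibatch sampling, and the periodic target reset keep the same structure.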