Research Article

Deep Reinforcement Learning-Based Joint Satellite Scheduling and Resource Allocation in Satellite-Terrestrial Integrated Networks

Algorithm 1

PSDDQN for joint satellite association and channel allocation.
Input: number of episodes, Num_Episodes; number of time slices, Num_Timeslices; number of leaves of the Sumtree structure, B; exploration rate, ε; update frequency, F; learning rate, α; number of satellites, N;
Output: the weights of the online Q-Network, θ;
1. Initialize the state of STINs, including the capacity C of BSs, antenna model, channel model, satellite orbit parameters, and the initial positions of satellites;
2. Randomly initialize the weights θ of the online Q-Network; set the weights of the target Q-Network to θ'=θ;
3.For episode =1 to Num_Episodes do
4. For t =1 to Num_Timeslices do
5.  For i =1 to N do
6.   Get the state information of satellite i from the ground control centre at time t, si;
7.  End for
8.  Get the state information of the BSs from the ground control centre at time t, H;
9.  Obtain the state information of the STINs at time t, S=(s1, s2, s3, …, sN, H);
10.  Get the next state S’ of the STINs and its termination flag from the ground control centre at time t+1;
11.  For i =1 to N do
12.   Use ε-greedy strategy to select an action, ai;
13.   The agent executes action ai and obtains the instant reward ri by Equation (20);
14.   Store the state transition information (si, ai, ri, si’) in the Sumtree structure;
15.  End for
16.  S= S’;
17.  The BSs and satellites send their state information to the ground control centre for updating the state information of STINs;
18.  Sample a batch of transitions from the Sumtree structure, and compute the Q-value loss of each sample according to Equation (29);
19.  Compute the gradient of each sample according to Equation (30);
20.  Update the weights of the online Q-Network according to the backpropagation algorithm;
21.  Compute the TD-error value of each sample according to Equation (32) and update its priority by Equation (33);
22.  Every F iterations, update the parameters of the target Q-Network by setting θ'=θ;
23. End for
24.End for
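
The replay-related steps of Algorithm 1 (ε-greedy selection, storage in the Sumtree structure, prioritized sampling, and priority update) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The class and function names (SumTree, sample_batch, epsilon_greedy, double_dqn_target, priority_from_td_error) and the parameters gamma, offset, and exponent are hypothetical. The target and priority formulas are the forms commonly used with double DQN and proportional prioritized replay, assumed here because Equations (29), (32), and (33) are not reproduced in this listing; they may differ from the paper's definitions.

```python
import random


class SumTree:
    """Binary sum tree: leaves hold priorities, each parent holds its children's sum."""

    def __init__(self, capacity):
        self.capacity = capacity                 # number of leaves (B in Algorithm 1)
        self.tree = [0.0] * (2 * capacity - 1)   # internal nodes first, then leaves
        self.data = [None] * capacity            # one stored transition per leaf
        self.write = 0                           # next leaf to (over)write

    def total(self):
        return self.tree[0]

    def update(self, leaf, priority):
        """Set the priority of `leaf` and propagate the change up to the root."""
        idx = leaf + self.capacity - 1
        change = priority - self.tree[idx]
        self.tree[idx] = priority
        while idx > 0:
            idx = (idx - 1) // 2
            self.tree[idx] += change

    def add(self, priority, transition):
        """Store a transition, overwriting the oldest one once the tree is full."""
        leaf = self.write
        self.data[leaf] = transition
        self.update(leaf, priority)
        self.write = (self.write + 1) % self.capacity
        return leaf

    def get(self, value):
        """Return (leaf, priority, transition) for a cumulative-priority value."""
        idx = 0
        while idx < self.capacity - 1:           # descend until a leaf is reached
            left = 2 * idx + 1
            if value < self.tree[left]:
                idx = left
            else:
                value -= self.tree[left]
                idx = left + 1
        leaf = idx - (self.capacity - 1)
        return leaf, self.tree[idx], self.data[leaf]


def sample_batch(tree, batch_size, rng=random):
    """Stratified sampling: one draw from each equal slice of the total priority."""
    segment = tree.total() / batch_size
    return [tree.get((k + rng.random()) * segment) for k in range(batch_size)]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.9):
    """Assumed standard double DQN target: online net selects, target net evaluates."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best]


def priority_from_td_error(td_error, offset=1e-3, exponent=0.6):
    """Assumed proportional priority (|delta| + offset) ** exponent."""
    return (abs(td_error) + offset) ** exponent
```

In this sketch, epsilon_greedy corresponds to step 12, SumTree.add to step 14, sample_batch to step 18, and SumTree.update combined with priority_from_td_error to step 21. Because every parent node stores the sum of its children, both sampling and priority updates touch only one root-to-leaf path, so each costs O(log B) for a tree with B leaves.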