Research Article
Service Migration Policy Optimization considering User Mobility for E-Healthcare Applications
Algorithm 4
SMVI (max_iter = 100, max_step = 100, tol = 1e-6).
(1) env = Env()  // environment initialization: MDP with state s, action a, and reward r
(2) Initialize value function V
(3) for i = 1 to max_iter do
(4)     new_V = V.copy()
(5)     update_steps = 0
(6)     for all s in S do
(7)         Initialize action-value array qs
(8)         for a in range(N_ACTIONS) do
(9)             n_s = env.P[s, a]
(10)            r = env.R[s, a]
(11)            qs[a] = r + gamma * V[n_s[0], n_s[1]]
(12)            update_steps += 1
(13)        new_V[s] = np.max(qs)  // update value function based on the Bellman equation
(14)    mean_values.append(np.mean(V))  // store the mean value
(15)    run_times.append(time.time() - st)  // store the run time
(16)    if np.sum(np.abs(V - new_V)) < tol then
(17)        break
(18)    V = new_V
(19) return V, mean_values, run_times
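The SMVI loop above can be sketched as a short NumPy value-iteration routine. The paper's `Env()` class is not shown, so this sketch assumes a simple tabular environment in which `P[s, a]` gives a deterministic next-state index and `R[s, a]` the immediate reward; state indexing is flattened to one dimension for brevity, and the `update_steps` counter is omitted.

```python
import time
import numpy as np

def smvi(P, R, gamma=0.9, max_iter=100, tol=1e-6):
    """Value-iteration sketch of Algorithm 4 (SMVI).

    P[s, a] -> next-state index, R[s, a] -> immediate reward
    (tabular stand-ins for the paper's Env()).
    """
    n_states, n_actions = R.shape
    V = np.zeros(n_states)                    # (2) initialize value function
    mean_values, run_times = [], []
    st = time.time()
    for _ in range(max_iter):                 # (3) outer sweep
        new_V = V.copy()                      # (4)
        for s in range(n_states):             # (6) loop over all states
            qs = np.empty(n_actions)          # (7) per-state action values
            for a in range(n_actions):        # (8)
                n_s = P[s, a]                 # (9) deterministic transition
                qs[a] = R[s, a] + gamma * V[n_s]  # (11) Bellman backup
            new_V[s] = qs.max()               # (13) greedy value update
        mean_values.append(V.mean())          # (14) store the mean value
        run_times.append(time.time() - st)    # (15) store the run time
        if np.abs(V - new_V).sum() < tol:     # (16) convergence test
            V = new_V
            break
        V = new_V                             # (18)
    return V, mean_values, run_times

# Toy 2-state, 2-action chain: action 1 always moves to state 1 and pays 1.
P = np.array([[0, 1], [0, 1]])
R = np.array([[0.0, 1.0], [0.0, 1.0]])
V, means, times = smvi(P, R, gamma=0.5)
```

With gamma = 0.5 this toy chain converges to V = [2, 2] (the fixed point of V = 1 + 0.5 V); the sweep count needed scales with log(tol)/log(gamma), which is why the discount factor strongly affects the run times recorded in step (15).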