Research Article
Service Migration Policy Optimization considering User Mobility for E-Healthcare Applications
Algorithm 2: Value_evaluate(policy, env, max_step = 100, tol = 1e-6).
| (1) | Initialize V |
| (2) | for i in range(max_step): |
| (3) |   new_V = V.copy() |
| (4) |   for all s in S: # update the value function of every state |
| (5) |     qs = np.zeros(N_ACTIONS, dtype=np.float32) # Q-values of the actions in state s |
| (6) |     for a in range(N_ACTIONS): # store the Q-value of each action |
| (7) |       n_s = env.P[s, a] |
| (8) |       r = env.R[s, a] |
| (9) |       n_V = V[n_s[0], n_s[1]] |
| (10) |       qs[a] = r + gamma * n_V |
| (11) |     End for |
| (12) |     new_V[s] = np.sum(qs * policy[s]) # expected Q-value under the policy |
| (13) |   End for |
| (14) |   if np.sum(np.abs(V - new_V)) < tol: |
| (15) |     break |
| (16) |   V = new_V |
| (17) | End for |
| (18) | return V |
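The policy evaluation loop of Algorithm 2 can be sketched as runnable Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `GridEnv` class, the `MOVES` action set, the 2 x 2 grid, the reward values, the discount factor `gamma = 0.9`, and the uniform random policy are all hypothetical choices made only so that the example runs on its own. The names `env.P`, `env.R`, `N_ACTIONS`, `max_step`, and `tol` follow the algorithm listing.

```python
import numpy as np

N_ACTIONS = 4  # assumed action set: up, down, left, right
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


class GridEnv:
    """Hypothetical deterministic grid world (not the paper's environment)."""

    def __init__(self, rows=2, cols=2, goal=(1, 1)):
        self.shape = (rows, cols)
        self.states = [(i, j) for i in range(rows) for j in range(cols)]
        self.P, self.R = {}, {}  # next state and reward, keyed by (state, action)
        for s in self.states:
            for a, (di, dj) in enumerate(MOVES):
                if s == goal:
                    n_s = s  # the goal state is absorbing
                else:
                    # moves that would leave the grid keep the agent in place
                    n_s = (min(max(s[0] + di, 0), rows - 1),
                           min(max(s[1] + dj, 0), cols - 1))
                self.P[s, a] = n_s
                self.R[s, a] = 0.0 if s == goal else -1.0


def value_evaluate(policy, env, gamma=0.9, max_step=100, tol=1e-6):
    """Iteratively evaluate a stochastic policy of shape (rows, cols, N_ACTIONS)."""
    V = np.zeros(env.shape, dtype=np.float64)
    for _ in range(max_step):
        new_V = V.copy()
        for s in env.states:
            qs = np.zeros(N_ACTIONS, dtype=np.float64)
            for a in range(N_ACTIONS):
                n_s = env.P[s, a]
                r = env.R[s, a]
                qs[a] = r + gamma * V[n_s[0], n_s[1]]
            # expected Q-value under the action probabilities of state s
            new_V[s] = np.sum(qs * policy[s])
        converged = np.sum(np.abs(V - new_V)) < tol
        V = new_V
        if converged:
            break
    return V


env = GridEnv()
uniform_policy = np.full(env.shape + (N_ACTIONS,), 1.0 / N_ACTIONS)
V = value_evaluate(uniform_policy, env, max_step=1000)
print(V)
```

In this sketch each sweep reads only from the previous estimate `V` and writes to `new_V`, so the update is synchronous, as in the listing. The loop stops once the summed absolute change over all states falls below `tol` or after `max_step` sweeps, whichever comes first.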