Research Article
Service Migration Policy Optimization considering User Mobility for E-Healthcare Applications
Algorithm 3
Policy_improvement (env, V).
| (1) | Initialize policy |
| (2) | for all s in S: |
| (3) |     Initialize qs |
| (4) |     for a in range(N_ACTIONS): |
| (5) |         n_s = env.P[s, a] |
| (6) |         r = env.R[s, a] |
| (7) |         qs[a] = r + gamma * V[n_s[0], n_s[1]] |
| (8) |     p = (np.abs(qs - np.max(qs)) < 1e-6)  # greedy strategy: mark every action whose value ties the maximum |
| (9) |     p = np.array(p, dtype=np.float32) / np.sum(p)  # convert to float type and normalize |
| (10) |     policy[s[0], s[1]] = p |
| (11) | return policy |
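The listing above can be exercised with a minimal, self-contained sketch. The `GridEnv` class, its 3×3 layout, the four-move action set, the reward of −1 per step, and the default `gamma=0.9` are all hypothetical stand-ins chosen for illustration; they are not the environment or parameters used in the article. Only the body of `policy_improvement` follows Algorithm 3, assuming that each state `s` is a pair of grid indices and that `env.P[s, a]` returns a single deterministic next state.

```python
import numpy as np

N_ACTIONS = 4                               # assumed action set: up, down, left, right
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]


class GridEnv:
    """Hypothetical deterministic grid world (not the article's environment)."""

    def __init__(self, rows=3, cols=3, goal=(2, 2)):
        self.shape = (rows, cols)
        self.states = [(i, j) for i in range(rows) for j in range(cols)]
        self.P = {}  # (state, action) -> next state
        self.R = {}  # (state, action) -> immediate reward
        for s in self.states:
            for a, (di, dj) in enumerate(MOVES):
                if s == goal:
                    n_s = s  # the goal is absorbing
                else:
                    # a move that would leave the grid keeps the agent in place
                    n_s = (min(max(s[0] + di, 0), rows - 1),
                           min(max(s[1] + dj, 0), cols - 1))
                self.P[s, a] = n_s
                self.R[s, a] = 0.0 if s == goal else -1.0


def policy_improvement(env, V, gamma=0.9):
    """Return the greedy policy with respect to the value table V."""
    policy = np.zeros(env.shape + (N_ACTIONS,), dtype=np.float32)
    for s in env.states:
        qs = np.zeros(N_ACTIONS)
        for a in range(N_ACTIONS):
            n_s = env.P[s, a]
            r = env.R[s, a]
            qs[a] = r + gamma * V[n_s[0], n_s[1]]
        # every action within 1e-6 of the best value counts as greedy
        greedy = np.abs(qs - np.max(qs)) < 1e-6
        # spread the probability mass evenly over the tied greedy actions
        policy[s[0], s[1]] = greedy.astype(np.float32) / np.sum(greedy)
    return policy


env = GridEnv()
# Illustrative value table: negative Manhattan distance to the goal cell.
V = np.array([[-(2 - i) - (2 - j) for j in range(3)] for i in range(3)],
             dtype=float)
policy = policy_improvement(env, V)
print(policy[0, 0])
```

With this value table, moving down and moving right are equally good from the corner cell `(0, 0)`, so the tolerance test on step (8) selects both and step (9) assigns each a probability of 0.5. Using a tolerance rather than `np.argmax` is what lets the improved policy remain stochastic when several actions tie.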