| Initialize and for all ; |
| Set parameters and decision time; |
| Give the initial state ; |
| Repeat |
| (1) Choose an exploration action based on the mixed strategy set ; |
| (2) Apply the exploration action to the AGC units and run the LFC system for the next sec; |
| (3) Observe a new state via CPS1 and ACE; |
| (4) Obtain a short-term reward using Eq. (19); |
| (5) Update eligibility trace according to Eq. (2); |
| (6) Update Q function using Eq. (3); |
| (7) Select variable learning rate δ with Eq. (7); |
| (8) Compute by Eq. (5) and Eq. (6); |
| (9) Calculate and according to Eq. (8); |
| (10) Update the mixed strategy according to Eq. (4); |
| (11) Obtain the total power of the GSGi; |
| (12) Determine the ramp rate according to Eq. (13); |
| (13) Execute CC algorithm according to Eq. (14) and Eq. (15); |
| (14) Calculate the uth unit power in GSGi; |
| (15) If the power limit is not exceeded, then execute step 17; |
| (16) Calculate and according to Eq. (17), and update using Eq. (9), Eq. (11) and Eq. (18); |
| (17) Calculate the power error according to Eq. (16); |
| (18) If is not satisfied, execute step 13; |
| (19) Output the uth unit power ; |
| (20) Set , and return to step 1. |
| End |
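
The learning part of the loop (steps 1 and 5 to 10) can be sketched in code. The referenced equations are not reproduced in this listing, so the sketch below assumes the textbook forms of a tabular Q(λ) update with accumulating eligibility traces and of the win-or-learn-fast policy hill-climbing (WoLF-PHC) strategy update; it may differ in detail from Eq. (2) to Eq. (8). The names `choose_action` and `wolf_phc_step`, and the parameters `alpha`, `gamma`, `lam`, `delta_win` and `delta_lose`, are hypothetical and chosen for illustration only.

```python
import random


def choose_action(policy_row, rng):
    """Sample an action index from a mixed strategy (a probability vector)."""
    return rng.choices(range(len(policy_row)), weights=policy_row, k=1)[0]


def wolf_phc_step(Q, E, pi, pi_avg, visits, s, a, r, s_next,
                  alpha=0.1, gamma=0.9, lam=0.9,
                  delta_win=0.01, delta_lose=0.04):
    """One assumed Q(lambda) + WoLF-PHC update; mutates the tables in place.

    Q, E, pi and pi_avg are lists of rows indexed [state][action];
    visits counts how often each state has been updated.
    Returns the variable learning rate delta selected for this step.
    """
    n_states, n_actions = len(Q), len(Q[s])

    # Temporal-difference error against the greedy value of the next state.
    td = r + gamma * max(Q[s_next]) - Q[s][a]

    # Eligibility trace: decay every entry, then reinforce the visited pair.
    for si in range(n_states):
        for ai in range(n_actions):
            E[si][ai] *= gamma * lam
    E[s][a] += 1.0

    # Q-function update, spread over all pairs in proportion to their trace.
    for si in range(n_states):
        for ai in range(n_actions):
            Q[si][ai] += alpha * td * E[si][ai]

    # Running average of the mixed strategy in the current state.
    visits[s] += 1
    for ai in range(n_actions):
        pi_avg[s][ai] += (pi[s][ai] - pi_avg[s][ai]) / visits[s]

    # Variable learning rate: small step when "winning", large when "losing".
    v_cur = sum(p * q for p, q in zip(pi[s], Q[s]))
    v_avg = sum(p * q for p, q in zip(pi_avg[s], Q[s]))
    delta = delta_win if v_cur > v_avg else delta_lose

    # Hill-climb: move probability mass toward the greedy action.
    best = max(range(n_actions), key=lambda ai: Q[s][ai])
    for ai in range(n_actions):
        if ai == best:
            continue
        step = min(pi[s][ai], delta / (n_actions - 1))
        pi[s][ai] -= step
        pi[s][best] += step
    return delta


if __name__ == "__main__":
    rng = random.Random(0)
    n_s, n_a = 2, 2
    Q = [[0.0] * n_a for _ in range(n_s)]
    E = [[0.0] * n_a for _ in range(n_s)]
    pi = [[1.0 / n_a] * n_a for _ in range(n_s)]
    pi_avg = [[1.0 / n_a] * n_a for _ in range(n_s)]
    visits = [0] * n_s
    a = choose_action(pi[0], rng)
    wolf_phc_step(Q, E, pi, pi_avg, visits, s=0, a=a, r=1.0, s_next=1)
    print(pi[0])
```

In the full procedure, the reward `r` would come from step 4 and the next state from the CPS1 and ACE observations of step 3; the power dispatch of steps 11 to 19 is outside the scope of this sketch.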