Research Article
Decentralized Reinforcement Learning Approach for Microgrid Energy Management in Stochastic Environment
//initialization
(1) Initialize the learning parameters α and γ
(2) Set K1, Tk, ε, μ
(3) Initialize Q(s, a) for all states and actions
//learning
(4) For k = 1 : K1
    //exploration period
(5)   For n = 1 : Tk
(6)     For t = 1 : 24
(7)       Each agent senses the states of the environment
(8)       The demand is predicted using the exponential random distribution
(9)       The outputs of wind and PV are determined
(10)      Each agent takes a random action with probability 1 − ε and selects the best action with probability ε based on its Q-values
(11)      MO clears the market
(12)      Each agent observes its immediate reward
(13)      The Q-function of each agent is updated according to equation (6)
(14)    End
(15)  End
    //end exploration period
(16)  ε is updated for each agent with probability 1 − μ
(17) End
//end learning
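A minimal Python sketch of the decentralized Q-learning loop above is shown below, assuming independent Q-tables per agent and an ε-greedy policy with the convention used in the listing (random action with probability 1 − ε). The environment hooks (sense_state, clear_market), the parameter values, and the direction of the ε adjustment in step (16) are illustrative assumptions, and the update rule is the standard Q-learning update as a stand-in for equation (6), which is not reproduced here.

```python
# Sketch of decentralized Q-learning for microgrid agents; environment
# functions and numeric values are hypothetical placeholders.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS, N_STATES, N_ACTIONS = 3, 10, 5
K1, Tk = 50, 20           # learning episodes and exploration iterations
alpha, gamma = 0.1, 0.95  # learning rate and discount factor (assumed)
epsilon, mu = 0.5, 0.05   # greedy-selection probability and its update rate

# One Q-table per agent (decentralized learning).
Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def sense_state(agent, t, demand, wind, pv):
    # Placeholder: map local observations to a discrete state index.
    return int((demand + wind + pv + t) % N_STATES)

def clear_market(actions, demand, wind, pv):
    # Placeholder for the market operator (MO): one reward per agent.
    return [-abs(a - demand % N_ACTIONS) for a in actions]

for k in range(K1):
    for n in range(Tk):
        for t in range(24):
            demand = rng.exponential(scale=5.0)              # stochastic demand
            wind, pv = rng.uniform(0, 3), rng.uniform(0, 2)  # renewable outputs
            states, actions = [], []
            for i in range(N_AGENTS):
                s = sense_state(i, t, demand, wind, pv)
                # Per the listing: random action w.p. 1 - epsilon,
                # best-known (greedy) action w.p. epsilon.
                if rng.random() < epsilon:
                    a = int(np.argmax(Q[i][s]))
                else:
                    a = int(rng.integers(N_ACTIONS))
                states.append(s)
                actions.append(a)
            rewards = clear_market(actions, demand, wind, pv)
            for i in range(N_AGENTS):
                s, a, r = states[i], actions[i], rewards[i]
                s_next = sense_state(i, (t + 1) % 24, demand, wind, pv)
                # Standard Q-learning update (stand-in for equation (6)).
                Q[i][s, a] += alpha * (r + gamma * Q[i][s_next].max() - Q[i][s, a])
    # Step (16): adjust epsilon with probability 1 - mu
    # (the direction of the adjustment is an assumption).
    if rng.random() < 1 - mu:
        epsilon = min(1.0, epsilon + 0.01)
```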