Research Article
Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning
Table 2
Pseudocode of the proposed improved TD3 algorithm.
| Step | Pseudocode of improved TD3 algorithm for sales price and energy scheduling |
| --- | --- |
| 1 | Initialize the critic networks and the actor network with random parameters |
| | Initialize the target networks |
| | Initialize the replay buffer and the beta parameter |
| 2 | for each time step of the exploring cycle do |
| 3 | Select a random noise for action exploration |
| 4 | Get the initial states |
| 5 | for each time step of the training cycle do |
| 6 | Select an action with exploration noise |
| 7 | Execute the action; observe the rewards and new states |
| 8 | Store the transition in the replay buffer |
| 9 | Sample a mini-batch of transitions from the replay buffer |
| 10 | |
| 11 | |
| 12 | Update the critics |
| 13 | if (time step mod policy-update delay) = 0 then |
| 14 | Update the actor policy by the deterministic policy gradient: |
| 15 | |
| 16 | Update the target networks: |
| 17 | |
| 18 | end if |
| 19 | end for |
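The steps in Table 2 follow the general shape of the standard TD3 algorithm (replay buffer, twin critics, delayed actor updates, soft target updates). As a rough illustration only, the sketch below shows those building blocks in plain Python for a scalar action. It is the textbook TD3 form, not the improved variant proposed in this article: the beta parameter is not modeled, and all names (`ReplayBuffer`, `smoothed_target_action`, `td3_target`, `soft_update`, `should_update_actor`), the `done` flag, and the hyperparameters `sigma`, `noise_clip`, `gamma`, `tau`, and `policy_delay` are assumptions introduced here.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        self.data = deque(maxlen=capacity)  # oldest transitions are dropped first

    def store(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng):
        # Uniform sampling without replacement, capped at the current buffer size.
        return rng.sample(list(self.data), min(batch_size, len(self.data)))


def clip(x, low, high):
    return max(low, min(high, x))


def smoothed_target_action(target_actor, next_state, sigma, noise_clip,
                           a_low, a_high, rng):
    """Target policy smoothing: add clipped Gaussian noise to the target action."""
    noise = clip(rng.gauss(0.0, sigma), -noise_clip, noise_clip)
    return clip(target_actor(next_state) + noise, a_low, a_high)


def td3_target(reward, done, next_state, next_action, target_q1, target_q2, gamma):
    """Clipped double-Q target: bootstrap from the smaller of the two target critics."""
    q_min = min(target_q1(next_state, next_action),
                target_q2(next_state, next_action))
    return reward + gamma * (1.0 - done) * q_min


def soft_update(target_params, online_params, tau):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def should_update_actor(step, policy_delay):
    """Delayed policy update: act only every `policy_delay` critic updates."""
    return step % policy_delay == 0


if __name__ == "__main__":
    rng = random.Random(0)
    buffer = ReplayBuffer(capacity=1000)
    buffer.store((0.0, 0.1, 1.0, 0.5, 0.0))  # hypothetical transition
    state, action, reward, next_state, done = buffer.sample(1, rng)[0]
    # Hypothetical stand-ins for the target actor and the two target critics.
    next_action = smoothed_target_action(lambda s: 0.5 * s, next_state,
                                         0.2, 0.5, -1.0, 1.0, rng)
    y = td3_target(reward, done, next_state, next_action,
                   lambda s, a: s + a, lambda s, a: s - a, 0.99)
    print("critic regression target:", y)
```

In a full implementation the critics would be regressed toward the value returned by `td3_target`, and the actor and target networks would be updated only when `should_update_actor` returns true. Both of those updates depend on the network and optimizer choices, which are outside what this sketch assumes.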