International Transactions on Electrical Energy Systems

Research Article

Charging Station Management Strategy for Returns Maximization via Improved TD3 Deep Reinforcement Learning

Pseudocode of the proposed improved TD3 algorithm.


Pseudocode of the improved TD3 algorithm for sales price and energy scheduling

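1	Initialize critic networks $Q_{\theta_1}$, $Q_{\theta_2}$ and actor network $\pi_{\phi}$ with random parameters $\theta_1$, $\theta_2$, $\phi$
	Initialize target networks $\theta'_1 \leftarrow \theta_1$, $\theta'_2 \leftarrow \theta_2$, $\phi' \leftarrow \phi$
	Initialize replay buffer $\mathcal{B}$ and beta parameter $\beta$
2	for each time step of the exploring cycle do
3	Select a random noise $\epsilon$ for action exploration
4	Get initial states $s$
5	for each time step $t$ of the training cycle do
6	Select action with exploration noise: $a \leftarrow \pi_{\phi}(s) + \epsilon$
7	Execute action $a$; observe rewards $r$ and new states $s'$
8	Store transition $(s, a, r, s')$ in $\mathcal{B}$
9	Sample a mini-batch of $N$ transitions $(s, a, r, s')$ from $\mathcal{B}$
10	$\tilde{a} \leftarrow \pi_{\phi'}(s') + \tilde{\epsilon}$, $\tilde{\epsilon} \sim \mathrm{clip}(\mathcal{N}(0, \tilde{\sigma}), -c, c)$
11	$y \leftarrow r + \gamma \min_{i=1,2} Q_{\theta'_i}(s', \tilde{a})$
12	Update critics: $\theta_i \leftarrow \arg\min_{\theta_i} N^{-1} \sum (y - Q_{\theta_i}(s, a))^2$
13	if $t \bmod d = 0$ then
14	Update the actor policy by the deterministic policy gradient:
15	$\nabla_{\phi} J(\phi) = N^{-1} \sum \nabla_{a} Q_{\theta_1}(s, a)|_{a = \pi_{\phi}(s)} \nabla_{\phi} \pi_{\phi}(s)$
16	Update target networks:
17	$\theta'_i \leftarrow \tau \theta_i + (1 - \tau) \theta'_i$, $\phi' \leftarrow \tau \phi + (1 - \tau) \phi'$
18	end if
19	end for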
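
The inner-loop steps of the pseudocode (smoothed target action, clipped double-Q target, delayed update test, and soft target update) can be sketched in plain Python as follows. This is a minimal illustration that assumes the standard TD3 forms of these steps, not the authors' implementation. All function names, the default hyperparameter values, and the toy stand-in "networks" are hypothetical. The beta parameter and the outer exploring cycle are left out because the listing does not specify how they enter the update.

```python
import random


def clipped_noise(sigma, c, rng):
    """Sample Gaussian noise and clip it to [-c, c] (target policy smoothing)."""
    return max(-c, min(c, rng.gauss(0.0, sigma)))


def td3_target(reward, next_state, target_actor, target_critics,
               gamma=0.99, sigma=0.2, c=0.5, rng=random):
    """Compute y = r + gamma * min_i Q'_i(s', a~), with a~ = pi'(s') + clipped noise."""
    a_tilde = target_actor(next_state) + clipped_noise(sigma, c, rng)
    q_min = min(q(next_state, a_tilde) for q in target_critics)
    return reward + gamma * q_min


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def is_delayed_update_step(t, d=2):
    """Actor and target networks are updated once every d critic updates."""
    return t % d == 0


# Toy usage with hand-written stand-ins for the networks (purely illustrative).
rng = random.Random(0)
target_actor = lambda s: 0.5 * s                      # hypothetical pi'
target_critics = [lambda s, a: s + a,                 # hypothetical Q'_1
                  lambda s, a: s + a - 1.0]           # hypothetical Q'_2

y = td3_target(reward=1.0, next_state=2.0,
               target_actor=target_actor, target_critics=target_critics,
               gamma=0.9, sigma=0.2, c=0.5, rng=rng)
new_target = soft_update([0.0, 1.0], [1.0, 3.0], tau=0.1)
```

Taking the minimum over the two target critics counters value overestimation, and the clipped noise smooths the target over nearby actions. In a full implementation the critics and actor would be neural networks updated by gradient steps on mini-batches sampled from the replay buffer.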