Research Article

Ensemble Investment Strategies Based on Reinforcement Learning

Table 1

Pseudo-code for the A2C model.

Input: environment of the stock market
Output: estimated optimal strategy
Initial setup of actor and critic networks
Repeat
For episodes = 0, 1, 2, …, N do:
  Get state and calculate to get action
  IF the episode does not end there:
   Get with reward
   Using critic networks to obtain return values to estimate Q
   Calculating the gradient using Q values and updating the actor network
   Updating the critic network to reduce the difference
   Update status
  End
End
To convergence