Research Article
Ensemble Investment Strategies Based on Reinforcement Learning
Table 1
Pseudo-code for the A2C model.
Input: stock market environment
Output: estimated optimal strategy
Initialize the actor and critic networks
Repeat
    For episode = 0, 1, 2, …, N do:
        Get the current state and compute the action from the actor network
        If the episode has not ended:
            Get the next state and the reward
            Use the critic network to obtain return values to estimate Q
            Compute the gradient using the Q values and update the actor network
            Update the critic network to reduce the difference between its value estimate and the target
            Update the state
        End
    End
Until convergence
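The loop in Table 1 can be illustrated with a minimal tabular sketch. This is not the article's implementation: the `ToyMarket` environment, its two states, the momentum parameter, the learning rates, and the function name `train_a2c` are all hypothetical, and lookup tables stand in for the actor and critic networks. Here Q is estimated as the reward plus the discounted critic value of the next state, and reaching the fixed horizon is treated as the end of the episode.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyMarket:
    """Hypothetical market: state 1 = last move was up, state 0 = down."""

    def __init__(self, rng, horizon=50, momentum=0.8):
        self.rng, self.horizon, self.momentum = rng, horizon, momentum

    def reset(self):
        self.t = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        # action 1 = hold the stock, action 0 = stay in cash
        keep = self.rng.random() < self.momentum
        next_state = self.state if keep else 1 - self.state
        price_return = 1.0 if next_state == 1 else -1.0  # in percent
        reward = action * price_return
        self.t += 1
        self.state = next_state
        return next_state, reward, self.t >= self.horizon


def train_a2c(episodes=2000, gamma=0.9, lr_actor=0.05, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    env = ToyMarket(rng)
    theta = [[0.0, 0.0], [0.0, 0.0]]  # actor: preferences per state and action
    value = [0.0, 0.0]                # critic: state-value estimate per state
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Get state and compute the action from the actor
            probs = softmax(theta[state])
            action = 0 if rng.random() < probs[0] else 1
            next_state, reward, done = env.step(action)
            # Use the critic to estimate Q, then form the advantage
            q_est = reward + (0.0 if done else gamma * value[next_state])
            advantage = q_est - value[state]
            # Actor update: policy gradient of log softmax, scaled by advantage
            for b in range(2):
                grad = (1.0 if b == action else 0.0) - probs[b]
                theta[state][b] += lr_actor * advantage * grad
            # Critic update: shrink the gap between estimate and target
            value[state] += lr_critic * advantage
            state = next_state
    return theta, value


if __name__ == "__main__":
    theta, value = train_a2c()
    print("P(hold stock | last move up)  =", softmax(theta[1])[1])
    print("P(hold stock | last move down) =", softmax(theta[0])[1])
```

Under these assumed dynamics, price moves tend to persist, so the learned policy should favor holding the stock after an up move and staying in cash after a down move. A real trading setup would replace the tables with neural networks and the toy environment with market data.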