Research Article
Optimizing the Pairs-Trading Strategy Using Deep Reinforcement Learning with Trading and Stop-Loss Boundaries
Algorithm 1
Optimized pairs-trading system using DQN.
| Initialize replay memory $D$ and batch size | |
| Initialize deep Q-network $Q$ | |
| Select pairs using cointegration test | |
| (1) For each epoch do | |
| (2) Profit = 1.0 | |
| (3) For steps t = 1, … until end of training data set do | |
| (4) Calculate spreads using OLS or TLS methods | |
| (5) Obtain initial state $s_t$ by converting the spread to a Z-score based on the formation window | |
| (6) Using the epsilon-greedy method, with probability $\epsilon$ select a random action $a_t$ | |
| (7) Otherwise select $a_t = \arg\max_{a} Q(s_t, a)$ | |
| (8) Execute traditional pairs-trading strategy based on the action selected | |
| (9) Obtain reward $r_t$ by performing the pairs-trading strategy | |
| (10) Set next state $s_{t+1}$ | |
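| (11) Store transition $(s_t, a_t, r_t, s_{t+1})$ in replay memory $D$ | |
| (12) Sample a minibatch of transitions $(s_j, a_j, r_j, s_{j+1})$ from replay memory $D$ | |
| (13) Set target $y_j = r_j + \gamma \max_{a'} Q(s_{j+1}, a')$, where $\gamma$ is the discount factor | |
| (14) Update Q-network by performing a gradient descent step on $(y_j - Q(s_j, a_j))^2$ | |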
| (15) End | |
| (16) End | |
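
The loop in Algorithm 1 can be sketched in a few dozen lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `LinearQ` is a hypothetical linear stand-in for the deep Q-network, `toy_reward` and the `boundaries` list are placeholders for the actual pairs-trading rule and its trading/stop-loss boundaries, only the OLS spread is shown (TLS is omitted), and the hyperparameter defaults are arbitrary. The update target is the standard DQN target, assumed here rather than taken from the article.

```python
import random
from collections import deque
from statistics import mean, pstdev


def ols_hedge_ratio(x, y):
    """Slope and intercept of the OLS fit y = alpha + beta * x (x must vary)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return beta, my - beta * mx


def zscore_state(x, y, window):
    """Steps 4-5: OLS spread over the last `window` points, as a Z-score."""
    xs, ys = x[-window:], y[-window:]
    beta, alpha = ols_hedge_ratio(xs, ys)
    spreads = [b - alpha - beta * a for a, b in zip(xs, ys)]
    sd = pstdev(spreads)
    return (spreads[-1] - mean(spreads)) / sd if sd > 0 else 0.0


class LinearQ:
    """Hypothetical stand-in for the Q-network: Q(s, a) = w[a] * s + b[a]."""

    def __init__(self, n_actions, lr=0.01):
        self.w = [0.0] * n_actions
        self.b = [0.0] * n_actions
        self.lr = lr

    def values(self, s):
        return [w * s + b for w, b in zip(self.w, self.b)]

    def update(self, s, a, target):
        # One gradient descent step on (target - Q(s, a))^2.
        err = target - (self.w[a] * s + self.b[a])
        self.w[a] += self.lr * 2.0 * err * s
        self.b[a] += self.lr * 2.0 * err


def select_action(q, s, epsilon, rng):
    """Steps 6-7: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(len(q.w))
    vals = q.values(s)
    return max(range(len(vals)), key=vals.__getitem__)


def toy_reward(s, s_next, boundary):
    """Placeholder reward: +1 if an opened position converges, -1 if not."""
    if abs(s) < boundary:
        return 0.0  # spread inside the boundary, no position opened
    return 1.0 if abs(s_next) < abs(s) else -1.0


def train_step(q, memory, batch_size, gamma, rng):
    """Steps 12-14: sample a minibatch and regress Q toward the TD target."""
    batch = rng.sample(list(memory), min(batch_size, len(memory)))
    for s, a, r, s_next in batch:
        target = r + gamma * max(q.values(s_next))
        q.update(s, a, target)


def run_epoch(x, y, window, q, memory, boundaries,
              epsilon=0.1, gamma=0.9, batch_size=8, rng=random):
    """Steps 3-15 for one pass over the price series x and y."""
    for t in range(window, len(x) - 1):
        s = zscore_state(x[:t], y[:t], window)
        a = select_action(q, s, epsilon, rng)
        s_next = zscore_state(x[:t + 1], y[:t + 1], window)
        r = toy_reward(s, s_next, boundaries[a])
        memory.append((s, a, r, s_next))  # step 11
        train_step(q, memory, batch_size, gamma, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [100 + 0.1 * i + rng.gauss(0, 1) for i in range(200)]
    y = [2 * a + 5 + rng.gauss(0, 1) for a in x]
    q = LinearQ(n_actions=3)
    run_epoch(x, y, 30, q, deque(maxlen=500), [0.5, 1.0, 1.5], rng=rng)
    print(q.w, q.b)
```

Each action indexes one candidate boundary, so the learned Q-values rank boundaries by expected reward for a given Z-score. Replacing `LinearQ` with a neural network and `toy_reward` with the profit-based reward of the trading rule would bring the sketch closer to the system described in Algorithm 1.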