Stock Trading Strategies Based on Deep Reinforcement Learning

<div>The overall structure of the model. <i>v<sub>i</sub></i> represents the candlestick chart feature, <i>v<sub>d</sub></i> represents the feature of stock data and technical indicators, and the feature vector obtained by concatenating these two feature vectors is used as the input of the two fully connected (FC) layers. In this paper, FC layers are used to construct the dueling DQN network; the two FC layers represent the advantage function <i>A</i>(<i>s</i>, <i>a</i>) and the state value function <i>V</i>(<i>s</i>) in dueling DQN. The final <i>Q</i> value is obtained by adding the outputs of the two functions.</div>
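The data flow described in the caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the feature sizes, the number of actions, the random weights, and the names `init_linear`, `dueling_q`, `candle_feat`, and `data_feat` are all assumptions made for the example. The `center_advantage` flag is also an addition: the caption states that the two outputs are simply added, while many dueling DQN implementations subtract the mean advantage first, so that variant is offered only as an option.

```python
import numpy as np

rng = np.random.default_rng(0)


def init_linear(in_dim, out_dim):
    """Return (weights, bias) for one fully connected layer (random init, assumed)."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)


def dueling_q(candle_feat, data_feat, value_fc, adv_fc, center_advantage=False):
    """Combine two feature vectors into Q values with a dueling head."""
    # Concatenate the candlestick feature with the stock-data/indicator feature.
    x = np.concatenate([candle_feat, data_feat], axis=-1)
    w_v, b_v = value_fc
    w_a, b_a = adv_fc
    value = x @ w_v + b_v        # V(s), shape (batch, 1)
    advantage = x @ w_a + b_a    # A(s, a), shape (batch, n_actions)
    if center_advantage:
        # Optional variant (not stated in the caption): remove the mean advantage.
        advantage = advantage - advantage.mean(axis=-1, keepdims=True)
    # Q(s, a) = V(s) + A(s, a); V broadcasts across the action axis.
    return value + advantage


# Hypothetical sizes chosen only for illustration.
CANDLE_DIM, DATA_DIM, N_ACTIONS = 8, 4, 3
value_fc = init_linear(CANDLE_DIM + DATA_DIM, 1)
adv_fc = init_linear(CANDLE_DIM + DATA_DIM, N_ACTIONS)

candle_feat = rng.normal(size=(2, CANDLE_DIM))  # batch of 2 states
data_feat = rng.normal(size=(2, DATA_DIM))

q = dueling_q(candle_feat, data_feat, value_fc, adv_fc)
print(q.shape)
```

Each row of `q` holds one Q value per action for the corresponding state; a greedy policy would pick `q.argmax(axis=-1)`.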

Scientific Programming


Figure 1