Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games

<div>(a) Phase diagram for the possible BRNs in the prisoner’s dilemma; (b) phase diagram for the possible equilibria in the prisoner’s dilemma (only node 0 in region 1; nodes 0 and 1 in region 2; and nodes 0, 1, and 9 in region 3). For both plots, we have <span class="nowrap"><i>S</i> = 0, <i>T</i> = 1,</span> and <span class="nowrap"><i>δ</i> = 0.65.</span></div>

Complexity


Figure 2

Figure 2: Limiting Dynamics for Q-Learning with Memory One in Symmetric Two-Player, Two-Action Games