Flipit Game Deception Strategy Selection Method Based on Deep Reinforcement Learning

<div>Convergence performance of MFD-PPO under different hyper-parameters ε.</div>

International Journal of Intelligent Systems



Figure 8: Flipit Game Deception Strategy Selection Method Based on Deep Reinforcement Learning