Research Article
Diversity Evolutionary Policy Deep Reinforcement Learning
Table 3
The mean and standard deviation of the cumulative return per turn in different MuJoCo tasks.
| Task | TD3 | Multiactor TD3 | CEM | CEM-TD3 | DPERL |
| --- | --- | --- | --- | --- | --- |
| Hopper-v2 | 3025 ± 577 | 3241 ± 363 | 1054 ± 17 | 3652 ± 116 | 3732 ± 106 |
| HalfCheetah-v2 | 10002 ± 930 | 10341 ± 578 | 2298 ± 690 | 10978 ± 758 | 11615 ± 464 |
| Ant-v2 | 3618 ± 425 | 3881 ± 319 | 845 ± 52 | 4037 ± 466 | 4852 ± 317 |
| Walker2d-v2 | 4399 ± 238 | 4470 ± 301 | 743 ± 225 | 4612 ± 357 | 5001 ± 562 |