Research Article
Experience Weighted Learning in Multiagent Systems
| (1) | repeat |
| (2) | i = 0 |
| (3) | Initialize Q (s, a) |
| (4) | repeat |
| (5) | Choose an action A using policy derived from Q (e.g., ε-greedy) |
| (6) | Choose an opponent randomly |
| (7) | Take action A and observe R, |
| (8) | |
| (9) | |
| (10) | until S is terminal |
| (11) | i = i + 1 |
| (12) | until i = the total number of all the agents |
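The loop structure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Steps (8) and (9) are not recoverable from the listing, so the sketch substitutes the standard one-step Q-learning update and a plain state transition in their place. The action set, the payoff table, the fixed episode length, and the names `run_sweep`, `epsilon_greedy`, `alpha`, `gamma`, and `epsilon` are all hypothetical. Q tables are created by the caller rather than inside the loop.

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]  # hypothetical action set
PAYOFF = {(0, 0): 3, (0, 1): 0, (1, 0): 5, (1, 1): 1}  # hypothetical payoffs
HORIZON = 5  # hypothetical episode length; the state is a step counter


def epsilon_greedy(q, state, epsilon, rng):
    """With probability epsilon pick a random action, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def run_sweep(q_tables, alpha=0.1, gamma=0.9, epsilon=0.1, rng=None):
    """Let every agent play one episode (needs at least two agents)."""
    rng = rng or random.Random()
    n = len(q_tables)
    for i in range(n):  # steps (2), (11), (12)
        q = q_tables[i]
        state = 0
        while state < HORIZON:  # steps (4), (10)
            action = epsilon_greedy(q, state, epsilon, rng)  # step (5)
            j = rng.choice([k for k in range(n) if k != i])  # step (6)
            opp_action = epsilon_greedy(q_tables[j], state, epsilon, rng)
            reward = PAYOFF[(action, opp_action)]  # step (7)
            next_state = state + 1
            # ASSUMPTION: standard Q-learning update in place of steps (8)-(9).
            if next_state >= HORIZON:
                best_next = 0.0  # terminal state has no future value
            else:
                best_next = max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q_tables


q_tables = [defaultdict(float) for _ in range(4)]
run_sweep(q_tables, rng=random.Random(0))
```

Passing a seeded `random.Random` makes a run repeatable. An experience-weighted update, if that is what the missing steps contain, would replace only the two lines that compute `target` and update `q`.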