Research Article
Coordinated Learning by Model Difference Identification in Multiagent Systems with Sparse Interactions
Algorithm 1
A model-based method for identifying the coordinated states of agent i.
Input: individual original MDP M_i and individual optimal Q-values Q*_i of agent i; threshold value ξ; proportion η; integer K of Monte Carlo sampling steps; exploration factor ε.
Output: the set of coordinated states C_i for agent i.

// Performing Monte Carlo sampling to get empirical local MDPs.
(1) for k = 1 to K do
(2)   ε decreases down to a small value by multiplying with factor η;
(3)   agent i selects action a_i according to Q*_i at local state s_i with an ε-random policy;
(4)   agent i observes local reward r_i and next state s'_i according to the global environmental state and joint action;
(5)   agent i records (s_i, a_i, r_i, s'_i) and the state information of the other agents;
(6)   agent i updates its individual Q-values according to the received experience;
(7) end for
(8) for each agent j ≠ i do
(9)   for each state s_j of agent j do
(10)    get the empirical local MDP of agent i when agent j is in state s_j;
(11)  end for
(12) end for
// Determining the coordinated states of agent i according to the two MDPs.
(13) for each state s_i of agent i, each agent j ≠ i, and each state s_j of agent j do
(14)    compute the model difference degree according to (4);
(15)    if the model difference degree is bigger than ξ then
(16)      augment the coordinated states of agent i in state s_i to include s_j of agent j;
(17)    end if
(18) end for
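The two phases of Algorithm 1 can be sketched in Python. Since Eq. (4) is not reproduced in this excerpt, the model difference degree below is a hypothetical stand-in (the L1 distance between the two transition distributions plus the absolute reward gap); the function names and the experience-tuple layout are likewise illustrative, not the paper's interface.

```python
from collections import defaultdict

def empirical_local_mdp(experience):
    """Estimate a local MDP from samples (step (10)).

    experience: list of (s, a, r, s_next) tuples for agent i.
    Returns ({(s, a): {s_next: prob}}, {(s, a): mean_reward}).
    """
    counts = defaultdict(lambda: defaultdict(int))
    rewards = defaultdict(list)
    for s, a, r, s_next in experience:
        counts[(s, a)][s_next] += 1
        rewards[(s, a)].append(r)
    P, R = {}, {}
    for sa, nxt in counts.items():
        total = sum(nxt.values())
        P[sa] = {s2: n / total for s2, n in nxt.items()}
        R[sa] = sum(rewards[sa]) / len(rewards[sa])
    return P, R

def model_difference_degree(P0, R0, P1, R1, sa):
    """Assumed stand-in for Eq. (4): L1 distance between transition
    distributions plus the absolute reward difference."""
    p0, p1 = P0.get(sa, {}), P1.get(sa, {})
    support = set(p0) | set(p1)
    d = sum(abs(p0.get(s2, 0.0) - p1.get(s2, 0.0)) for s2 in support)
    return d + abs(R0.get(sa, 0.0) - R1.get(sa, 0.0))

def identify_coordinated_states(base_exp, joint_exp, threshold):
    """base_exp: samples (s, a, r, s') from agent i's individual MDP.
    joint_exp: samples (s, a, r, s', s_j) tagged with the other agent's
    state s_j. Returns the set of (s_i, s_j) pairs whose empirical local
    model deviates from the individual MDP by more than `threshold`
    (steps (13)-(18))."""
    P0, R0 = empirical_local_mdp(base_exp)
    by_sj = defaultdict(list)
    for s, a, r, s_next, s_j in joint_exp:
        by_sj[s_j].append((s, a, r, s_next))
    coordinated = set()
    for s_j, exp in by_sj.items():
        P1, R1 = empirical_local_mdp(exp)
        for sa in P1:
            if model_difference_degree(P0, R0, P1, R1, sa) > threshold:
                coordinated.add((sa[0], s_j))
    return coordinated
```

For instance, if moving from s0 normally reaches s1 with zero reward, but leads back to s0 with a penalty whenever the other agent is "near", then (s0, near) is flagged as a coordinated state while (s0, far) is not.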