Research Article

Unbiased Model-Agnostic Metalearning Algorithm for Learning Target-Driven Visual Navigation Policy

Figure 2

An example of exploring processes with two kinds of termination condition. The episode of navigating to bed ends when the agent observes the target clearly at the nearest viewpoint. Since in the other exploring process the agent fails to gain the full view of the television, the training episode is terminated at step 10,000.