Research Article

Generating Bird’s Eye View from Egocentric RGB Videos

Figure 6

Training pipeline for video-to-video translation. For the first three egocentric frames, we use the image-to-image translation module to generate rough bird's eye view predictions. These, together with the fourth egocentric view, are input to the model, which generates the bird's eye view for the fourth frame. For the fifth frame, we additionally feed the previously generated output as the bird's eye view prediction for the fourth frame, and this continues until all frames have been processed.
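To make the recurrence in this pipeline concrete, here is a minimal Python sketch of the sliding-window, autoregressive loop the caption describes. The names `img2img_model` and `vid2vid_model`, their call signatures, and the window of three previous frames are assumptions for illustration, not the authors' actual implementation.

```python
import torch

def generate_bev_sequence(ego_frames, img2img_model, vid2vid_model):
    """Translate egocentric frames (T, C, H, W), T >= 4, to bird's eye views.

    Sketch only: model interfaces are assumed, not taken from the paper.
    """
    # Bootstrap: rough bird's eye view predictions for the first three
    # frames come from the per-frame image-to-image translation module.
    bev = [img2img_model(ego_frames[t].unsqueeze(0)) for t in range(3)]

    # Autoregressive stage: from the fourth frame on, condition on the
    # three most recent bird's eye view outputs plus the current ego view.
    for t in range(3, ego_frames.shape[0]):
        context = torch.cat(bev[-3:], dim=1)        # previous BEV outputs
        current = ego_frames[t].unsqueeze(0)        # current egocentric frame
        bev.append(vid2vid_model(context, current)) # predicted BEV for frame t

    return torch.cat(bev, dim=0)  # (T, C_bev, H_bev, W_bev)
```

Once the loop leaves the bootstrap stage, the model's own outputs replace the image-to-image predictions in the conditioning window, which is what lets the pipeline run over videos of arbitrary length.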