Research Article

Semantic Extraction of Basketball Game Video Combining Domain Knowledge and In-Depth Features

Figure 2

Our network architecture. We use a standard CNN architecture (VGG-16) to extract features from appearance and motion frames sampled from the video. These features are then pooled across space and time by this paper's aggregation layer, which is trainable end-to-end with a classification loss.
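
The data flow described in the caption can be illustrated with a minimal sketch. It assumes the per-frame VGG-16 features have already been extracted and are given as nested lists (frames, then spatial positions, then feature dimensions). Because the caption does not specify the form of the aggregation layer, plain average pooling is used here as a stand-in. The function names (`aggregate`, `classify`, `cross_entropy`), the feature sizes, and the random weights are all hypothetical and chosen only for illustration.

```python
import math
import random


def aggregate(features):
    """Average-pool local descriptors over all frames and spatial positions.

    Stand-in for the paper's aggregation layer (an assumption, not the
    authors' method). `features` is a list over frames; each frame is a
    list over spatial positions; each position is a list of D floats.
    Returns a single D-dimensional video-level descriptor.
    """
    dim = len(features[0][0])
    total = [0.0] * dim
    count = 0
    for frame in features:
        for descriptor in frame:
            for i, value in enumerate(descriptor):
                total[i] += value
            count += 1
    return [t / count for t in total]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(appearance, motion, weights, bias):
    """Pool each stream, concatenate the results, apply a linear classifier."""
    video = aggregate(appearance) + aggregate(motion)  # list concatenation
    logits = [
        sum(w * x for w, x in zip(row, video)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


def cross_entropy(probs, label):
    """Classification loss for one video with ground-truth class `label`."""
    return -math.log(probs[label])


def random_features(rng, frames, positions, dim):
    """Hypothetical placeholder for features produced by the CNN backbone."""
    return [
        [[rng.random() for _ in range(dim)] for _ in range(positions)]
        for _ in range(frames)
    ]


rng = random.Random(0)
FRAMES, POSITIONS, DIM, CLASSES = 4, 9, 8, 3  # illustrative sizes only

appearance = random_features(rng, FRAMES, POSITIONS, DIM)
motion = random_features(rng, FRAMES, POSITIONS, DIM)
weights = [[rng.uniform(-0.1, 0.1) for _ in range(2 * DIM)] for _ in range(CLASSES)]
bias = [0.0] * CLASSES

probs = classify(appearance, motion, weights, bias)
loss = cross_entropy(probs, label=1)
print(probs, loss)
```

In an end-to-end setting, the gradient of this loss would flow back through the pooling step into the convolutional backbone; the sketch only shows the forward pass.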