|
| Image classification | Challenges include viewpoint variation, scale variation, intraclass variation, image deformation, occlusion, lighting conditions, and background clutter. The most popular architecture for image classification today is the convolutional neural network (CNN). |
| Object recognition and detection [18] | More specialized detection tasks, such as face detection, vehicle detection, and character recognition, derive from this task. Commonly used models include R-CNN and Fast R-CNN. |
| Semantic segmentation | Every pixel of the input image is assigned a class label, so the semantic content of the image can be described clearly at the pixel level. Commonly used models include the fully convolutional network (FCN) and SegNet. |
| Motion and tracking | Large-scale convolutional neural networks can be trained as classifiers and trackers. Representative tracking algorithms are the fully convolutional network tracker (FCNT) and the multi-domain convolutional neural network (MDNet). |
| Visual question answering | Users ask questions about an input image, and the algorithm automatically answers them based on the content of the image and the question. |
| Motion recognition | In practical applications, accurate motion recognition is helpful for public opinion monitoring, advertising, and many other tasks related to video understanding. |
| Three-dimensional reconstruction | In 3D vision, geometry-based methods remain the mainstream approach for tasks such as 3D reconstruction and visual SLAM. |
|
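
The per-pixel classification described in the semantic segmentation row can be illustrated with a minimal sketch. This is not the FCN or SegNet implementation: the `segment` function, the nested-list `scores` input, and the class names are hypothetical, and the score values are made up to stand in for the per-class output of a segmentation network.

```python
def segment(scores):
    """Return a label map: for each pixel, the index of the highest-scoring class.

    `scores` is a hypothetical H x W x C nested list of per-class scores,
    standing in for the output of a segmentation network.
    """
    return [
        [max(range(len(pixel)), key=pixel.__getitem__) for pixel in row]
        for row in scores
    ]


# Toy 2 x 2 image with 3 assumed classes (0 = background, 1 = road, 2 = car).
scores = [
    [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.2, 0.7]],
]
label_map = segment(scores)
print(label_map)  # [[0, 1], [1, 2]]
```

The result has the same height and width as the input, with one class index per pixel. This is what distinguishes segmentation from image classification, which produces a single label for the whole image.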