Research on Multitarget Recognition and Detection Based on Computer Vision
Table 6. Main models of the one-stage and two-stage algorithms.
| One-stage algorithm | Performance summary | Two-stage algorithm | Performance summary |
| --- | --- | --- | --- |
| YOLO series | YOLOv1 is very fast and supports real-time detection. It recognizes small targets poorly and requires fixed-size input images. | R-CNN | Proposed by Ross Girshick et al. in 2014. Selective search replaces the sliding window, which removes window redundancy and reduces the time complexity of the algorithm. A convolutional neural network replaces the traditional hand-crafted feature extraction, extracting image features more effectively and improving robustness to external interference. |
| SSD series | YOLOv2 eases the convergence difficulty and fine-tunes the network on high-resolution images; it uses anchor boxes and convolutional layers for prediction. | SPPNet | Proposed by Kaiming He et al. in 2015. The feature map is computed by running the convolutional layers only once over the whole image, which greatly reduces the time spent on feature extraction, reduces the loss of image information, and avoids repeated computation of convolutional features. The speedup is about 24 to 64 times. |
| M2Det | YOLOv3 uses Darknet-53 as the network backbone and adopts an FPN architecture. | Mask R-CNN | Proposed by He et al. in 2017. Mask R-CNN combines Faster R-CNN and FCN. It strengthens the multiscale feature extraction ability of the model and recognizes small targets more accurately. The detection speed is about 5 frames per second. |
| CentripetalNet | YOLOv4 uses CSPDarknet53 and many general-purpose techniques to achieve the best experimental results. | D2Det | Proposed by Cao et al. in 2020. It addresses precise localization and accurate classification at the same time. Dense local regression and discriminative RoI pooling (DRP) are introduced in the first stage and the second stage, respectively, to extract accurate target feature regions, thus improving performance. |
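The SPPNet entry in Table 6 rests on computing the feature map once and then reusing it for every candidate region. The following is a minimal sketch of that idea in plain Python, not the authors' implementation: the function name `spp_pool`, the single-channel toy feature map, and the pyramid levels `(1, 2)` are all illustrative assumptions. It shows how regions of different sizes, cut from one shared feature map, are max-pooled into vectors of the same length.

```python
import math


def spp_pool(feature_map, box, levels=(1, 2)):
    """Max-pool one region of a 2-D feature map into a fixed-length vector.

    feature_map: list of rows (a single-channel toy stand-in for conv features).
    box: (top, left, bottom, right), half-open, in feature-map coordinates.
    levels: hypothetical pyramid levels; level n splits the region into n x n bins.
    """
    top, left, bottom, right = box
    height, width = bottom - top, right - left
    pooled = []
    for n in levels:
        for i in range(n):
            # Floor the bin start and ceil the bin end so the bins cover the region.
            r0 = top + (i * height) // n
            r1 = top + math.ceil((i + 1) * height / n)
            for j in range(n):
                c0 = left + (j * width) // n
                c1 = left + math.ceil((j + 1) * width / n)
                pooled.append(
                    max(feature_map[r][c]
                        for r in range(r0, r1)
                        for c in range(c0, c1))
                )
    return pooled


# The "feature map" is built once (here just a 4 x 6 grid of toy values) ...
shared_map = [[r * 6 + c for c in range(6)] for r in range(4)]

# ... and reused for regions of different sizes without recomputing it.
whole_image = spp_pool(shared_map, (0, 0, 4, 6))
small_region = spp_pool(shared_map, (1, 2, 3, 5))

print(len(whole_image), len(small_region))  # both vectors have 1 + 4 = 5 entries
```

In a real detector the feature map has many channels and the pooling is applied per channel; the point of the sketch is only that the output length depends on the pyramid levels, not on the region size.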