Research Article

Research on Multitarget Recognition and Detection Based on Computer Vision

Table 6

Main models of the two types of detection algorithm (one-stage and two-stage).

| One-stage algorithm | Performance summary | Two-stage algorithm | Performance summary |
|---|---|---|---|
| YOLO series | YOLOv1 is very fast and can detect objects in real time, but it recognizes small targets poorly and requires input images of a fixed size. | R-CNN | Proposed by Ross Girshick in 2014. It uses the selective search algorithm instead of a sliding window, which removes redundant windows and reduces the time complexity of the algorithm. A convolutional neural network replaces traditional hand-crafted feature extraction, extracting image features more effectively and improving robustness to interference. |
| SSD series | YOLOv2 addresses the difficulty of convergence, fine-tunes the network on high-resolution images, and uses anchor boxes and convolutional layers for prediction. | SPPNet | Proposed by Kaiming He et al. in 2015. The convolutional layers are run only once over the whole image to obtain the feature map, which greatly reduces the time spent on feature extraction, reduces the loss of image information, and avoids repeatedly computing convolutional features. It is about 24 to 64 times faster than R-CNN. |
| M2Det | YOLOv3 uses Darknet-53 as the backbone network and adopts a feature pyramid network (FPN) architecture. | Mask R-CNN | Proposed by He et al. in 2017, Mask R-CNN combines Faster R-CNN with a fully convolutional network (FCN). It strengthens the model's multiscale feature extraction and recognizes small objects more accurately. Its detection speed is about 5 frames per second. |
| CentripetalNet | YOLOv4 uses the CSPDarknet53 backbone together with many widely used techniques to achieve the best experimental results. | D2Det | Proposed by Cao et al. in 2020, D2Det addresses precise localization and accurate classification at the same time. It introduces dense local regression and discriminative RoI pooling (DRP), which extract accurate target feature regions in the first and second stages, respectively, thereby improving performance. |
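The R-CNN entry in Table 6 credits selective search with removing the window redundancy of an exhaustive sliding-window search. The short Python sketch below, offered only as an illustration, counts how many candidate windows such a scan would have to classify. The helper name `count_sliding_windows`, the 640 x 480 image size, the three square window sizes and the stride of 8 are all hypothetical choices, not values taken from this article.

```python
from typing import Iterable, Tuple


def count_sliding_windows(img_w: int, img_h: int,
                          win_sizes: Iterable[Tuple[int, int]],
                          stride: int) -> int:
    """Count the windows an exhaustive sliding-window scan would classify.

    Hypothetical helper for illustration: every window size is slid over the
    image with the same stride, and each position is one candidate window.
    """
    total = 0
    for w, h in win_sizes:
        if w > img_w or h > img_h:
            continue  # this window size does not fit in the image
        positions_x = (img_w - w) // stride + 1
        positions_y = (img_h - h) // stride + 1
        total += positions_x * positions_y
    return total


# Hypothetical settings: a 640 x 480 image, three square windows, stride 8.
n = count_sliding_windows(640, 480, [(64, 64), (128, 128), (256, 256)], stride=8)
print(n)  # 8215
```

Under these assumed settings the scan already yields 8,215 windows from only three square window sizes, and every additional aspect ratio or scale adds more; R-CNN instead classifies roughly 2,000 selective-search proposals per image.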
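For the SPPNet row in Table 6, running the convolutional layers only once works because each candidate region is projected onto the shared feature map and then pooled by a spatial pyramid into a fixed-length vector. The following pure-Python sketch shows that pooling step for a single channel. It assumes the region is already expressed in feature-map cells and uses pyramid levels of 1, 2 and 4 bins per side purely for illustration; the function and variable names are hypothetical, and a practical implementation would work on multi-channel tensors inside a deep-learning framework.

```python
from typing import List, Sequence, Tuple

FeatureMap = List[List[float]]  # one channel, indexed as fmap[row][col]


def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def spatial_pyramid_pool(fmap: FeatureMap,
                         box: Tuple[int, int, int, int],
                         levels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Max-pool the region box = (x0, y0, x1, y1) of a shared feature map.

    The box is given in feature-map cells, with x1 and y1 exclusive. For each
    pyramid level n the region is split into n x n bins and the maximum of
    each bin is kept, so the output length is sum(n * n for n in levels)
    regardless of the region size.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w <= 0 or h <= 0:
        raise ValueError("region must be non-empty")
    pooled: List[float] = []
    for n in levels:
        for i in range(n):
            # floor for the start, ceil for the end: every bin is non-empty
            r0, r1 = y0 + (i * h) // n, y0 + _ceil_div((i + 1) * h, n)
            for j in range(n):
                c0, c1 = x0 + (j * w) // n, x0 + _ceil_div((j + 1) * w, n)
                pooled.append(max(fmap[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


# Toy 8 x 8 feature map whose value at (row, col) is row * 8 + col.
fmap = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
big = spatial_pyramid_pool(fmap, (0, 0, 8, 8))    # the whole map
small = spatial_pyramid_pool(fmap, (2, 1, 5, 4))  # a 3 x 3 region
print(len(big), len(small))  # 21 21
print(big[:5])               # [63.0, 27.0, 31.0, 59.0, 63.0]
```

Both regions yield 21 values (1 + 4 + 16), which is why the fully connected layers after this step can accept proposals of any size without cropping or warping the input image.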
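The YOLOv2 entry in Table 6 mentions predicting with anchor boxes. As formulated in the YOLOv2 paper, the network outputs offsets t_x, t_y, t_w, t_h for each anchor, and these are decoded as b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·e^(t_w) and b_h = p_h·e^(t_h), where (c_x, c_y) is the grid cell making the prediction and (p_w, p_h) is the anchor size. The sketch below decodes a single prediction in plain Python. The 416 x 416 input with a 13 x 13 grid (stride 32) and the anchor size of 1.5 x 2.0 cells are illustrative assumptions, and the function names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (centre x, centre y, width, height)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_yolov2_box(t: Box, cell: Tuple[int, int],
                      anchor: Tuple[float, float]) -> Box:
    """Turn raw outputs (tx, ty, tw, th) into a box in grid-cell units.

    cell = (cx, cy) is the column and row of the predicting grid cell;
    anchor = (pw, ph) is the prior box size in grid-cell units.
    """
    tx, ty, tw, th = t
    cx, cy = cell
    pw, ph = anchor
    bx = sigmoid(tx) + cx    # sigmoid keeps the centre inside its cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)   # width and height rescale the anchor prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh


def to_pixels(box: Box, stride: int) -> Box:
    """Scale a grid-unit box to pixels (stride = input size / grid size)."""
    bx, by, bw, bh = box
    return bx * stride, by * stride, bw * stride, bh * stride


# Hypothetical example: zero offsets in cell (6, 6) with a 1.5 x 2.0 anchor.
grid_box = decode_yolov2_box((0.0, 0.0, 0.0, 0.0), (6, 6), (1.5, 2.0))
print(grid_box)                 # (6.5, 6.5, 1.5, 2.0)
print(to_pixels(grid_box, 32))  # (208.0, 208.0, 48.0, 64.0)
```

Bounding the centre offset with a sigmoid keeps each prediction near its own cell, which stabilizes early training, while the exponential lets the width and height grow or shrink relative to the anchor.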