Abstract
The detection of moving objects by machine vision has been a popular research direction in recent years and is widely used in military, medical, transportation, and agricultural applications. With the rapid development of UAV technology, and thanks to the high mobility of UAVs and their wide high-altitude field of view, target detection based on UAV vision is applied to traffic management tasks such as vehicle tracking and detection of vehicle violations. The moving target detection technology in this study is based on the YOLOv3 algorithm, and moving vehicle tracking is implemented by means of Mean-Shift and Kalman filtering. In this paper, Gaussian background difference technology is used to analyze illegal vehicle behavior, color feature extraction is used to identify and locate the license plate, and the information of the offending vehicle is entered into a database. The experiment compares UAV vision-based moving target detection with traditional target detection in four aspects: recognition accuracy, recognition speed, manual time, and divergent results. The results show that the average accuracy rates of UAV vision-based moving target detection and traditional pattern recognition are 98.4% and 87.8%, respectively, and the recognition speeds are 24.9 vehicles/s and 10.6 vehicles/s, respectively. Moreover, the manual time and divergent results of UAV vision-based moving target detection are only about 1/3 of those of the traditional mode. Moving target detection based on UAV vision therefore has a better moving target detection ability.
1. Introduction
With the continuous development of technologies such as digital information and image recognition, research on UAVs has deepened in recent years, and UAVs have achieved remarkable results in many fields, including military reconnaissance, agricultural irrigation, field fire protection, and urban transportation. Facing the year-by-year increase in the number of vehicles and increasingly complex traffic problems, traditional vehicle photography technology can identify vehicle information and determine the location of a vehicle to a certain extent. However, complex traffic problems such as illegal parking, speeding, and occupation of emergency lanes occur frequently, and the immaturity of traditional vehicle detection technology leads to a series of defects, such as low vehicle recognition accuracy, slow calculation and recognition, and long manual inspection time. By taking advantage of the UAV's high-altitude field of view and superior stability and combining it with moving target detection technology, stable and broad vehicle pictures can be obtained through aerial photography of road scenes. Through the corresponding technical identification of these vehicle pictures, effective detection of road vehicles is realized.
2. Related Work
With the development of the times, there are more and more vehicles, and traffic accidents occur frequently, so the motion detection of vehicles is particularly important. Many researchers have studied moving target detection technology. Among them, Xu et al. studied moving vehicles through an adaptive filter and obtained the relevant factors of moving vehicle detection [1]. Pan et al. proposed a Fourier transform to compensate for motion loss and improve the accuracy of moving object detection [2]. Kalra et al. showed that ground-moving targets can be detected effectively; they used probability distribution functions to predict the movement of moving objects and formed an effective motion dataset [3]. Minaeian et al. proposed a new visual object detection technology; by using unmanned vehicles to inspect dangerous areas, they achieved good results [4]. Tah et al. acquired moving targets in the form of camera monitoring by deploying Kalman filtering and achieved effective detection and tracking of moving targets [5]. Although target detection technology can achieve the detection and tracking of moving targets to a certain extent, it cannot fully exploit the advantages of moving target detection due to the lack of an effective detection carrier.
Because UAVs are light and stable and have a high field of view, relevant researchers have combined moving target detection with UAVs. Among them, Yundong et al. used drones to detect railway information and proposed a technique of segmenting pictures to overcome the difficulty of small target detection [6]. Micheal et al. proposed a deep learning detector trained on pictures taken by drones, achieving a better motion recognition effect [7]. In the Escobar and Sandoval experiment, 8000 test images were taken by drones, and the detection accuracy based on image features reached 95% [8]. In order to improve the accuracy of moving target detection in UAV recognition, Li et al. proposed an extraction method based on cross-features [9]. Doukhi et al.'s research showed that the YOLO algorithm and deep learning can realize visual moving target detection on UAVs [10]. Taking the UAV as the carrier, moving target detection is effectively carried out through techniques such as image feature processing, but the choice of algorithm is not yet optimal.
This paper combines UAV vision with target detection technology and adopts a variety of vehicle detection and vehicle behavior recognition and analysis technologies. It effectively detects vehicle traffic problems, compares the approach with traditional object detection techniques, and analyzes the advantages and disadvantages of the two target detection methods. The innovations are the following: (1) through a variety of vehicle recognition technologies, this paper comprehensively introduces the application of UAV target detection in vehicle management and detection, and (2) this paper compares UAV-based vehicle detection with traditional target detection technology.
3. Moving Target Detection Method Based on UAV Vision
UAVs have the advantages of high maneuverability, small size, and sensitive operation and are widely used in transportation, military, scientific research, and other fields [11–13]. The development of computer technology and image recognition technology has made moving target detection technology more and more mature, and the use of UAVs can raise moving target detection technology to a higher level. This paper uses moving target detection technology under UAV vision to study the vehicle traffic problem [14, 15]. Five aspects are studied through algorithmic calculation: vehicle detection, vehicle tracking, vehicle violation behavior analysis, license plate positioning and recognition, and entry of illegal vehicle information into a database. The moving target detection architecture of UAV vision is shown in Figure 1.

3.1. Moving Target Detection Technology of UAV Vision
In order to adapt to the constantly changing scene of UAV aerial photography, this paper studies the target detection algorithm based on YOLOv3 and collects vehicle data through UAV aerial photography for training.
3.1.1. YOLO Algorithm
YOLO is a common detection algorithm that extracts image features through an artificial neural network and then uses regression to achieve image detection; it is widely used in vehicle detection [16]. Based on a convolutional neural network, YOLO divides the image input to the system into grid units. Each grid detects the corresponding part of the image, and each unit predicts the cell border and the confidence of the cell border. The predicted probability of the cell border is generally represented by $\Pr(\text{Object})$:
$$\Pr(\text{Object}) \in \{0, 1\}. \tag{1}$$
In formula (1), a cell prediction probability of 0 indicates that there is no object in the frame, and 1 indicates that there is an object.
The cell border confidence is expressed as
$$b = (x, y, w, h, c). \tag{2}$$
In formula (2), $x$ and $y$ are the cell center offsets, $w$ and $h$ are the border width and height, and $c$ is the border confidence.
3.1.2. YOLOv3 Algorithm
YOLOv3 is an improvement of the YOLO algorithm, which deepens the network structure through residual connections on top of the original convolutional network structure. YOLOv3 performs frame prediction in a clustering manner and uses 4 predicted values as the parameters that determine the frame prediction: the horizontal and vertical coordinates of the center point of the frame and the width and height of the frame [17]. The four parameters work together to predict the frame, the determination of the frame target is expressed by the confidence, and the value range of the confidence is $[0, 1]$. The larger the value, the greater the probability that the frame contains an object. When the confidence level is 1, the real object and the bounding box overlap completely, and when the confidence level is 0, the priority of the bounding box is lower [18]. The algorithm structure of YOLOv3 is shown in Figure 2.

In Figure 2, the algorithm structure of YOLOv3 takes the picture as input to the downsampling layer, passes it through 5 residual-processing stages, and finally performs detection and generates the detection result. The loss function of the YOLOv3 network consists of three parts: the frame prediction error, the error on the presence or absence of a target, and the classification error. The loss function of the YOLOv3 network is expressed as
$$\begin{aligned}
\text{Loss} = {} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{obj}} \left[ \left(x_i - \hat{x}_i\right)^2 + \left(y_i - \hat{y}_i\right)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{obj}} \left(c_i - \hat{c}_i\right)^2 + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} I_{ij}^{\text{noobj}} \left(c_i - \hat{c}_i\right)^2 \\
& + \sum_{i=0}^{S^2} I_i^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2. 
\end{aligned} \tag{3}$$
In formula (3), $S^2$ represents the number of grids, and the explanation of $x$, $y$, $w$, $h$, and $c$ is given with formula (2). $B$ represents the number of real boxes predicted by the grid, $I_{ij}^{\text{obj}}$ indicates whether the $j$th predicted box in the $i$th grid is responsible for a detection target, and $p_i(c)$ represents the probability of detecting type $c$. classes represents the set of data types.
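As a concrete illustration, the following is a minimal sketch of running a YOLOv3 detector with OpenCV's DNN module. The configuration and weights file names are assumptions (a network fine-tuned on UAV vehicle imagery as described above), and the 416 × 416 input size and thresholds are common defaults rather than values taken from this paper.

```python
import cv2
import numpy as np

# Hypothetical paths: a YOLOv3 config and weights fine-tuned on UAV vehicle imagery.
net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3_uav_vehicles.weights")

def detect_vehicles(frame, conf_thr=0.5, nms_thr=0.4):
    h, w = frame.shape[:2]
    # Normalize to [0, 1], resize to the 416x416 network input, swap BGR->RGB.
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, scores = [], []
    for out in outputs:          # one output array per detection scale
        for det in out:          # det = [cx, cy, bw, bh, objectness, class scores...]
            confidence = det[4] * det[5:].max()
            if confidence > conf_thr:
                cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
                boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
                scores.append(float(confidence))

    # Non-maximum suppression removes overlapping duplicate boxes.
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_thr, nms_thr)
    return [boxes[i] for i in np.array(keep).flatten()]
```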
3.2. Vehicle Tracking Technology Based on UAV Vision
The vehicle tracking technology based on UAV vision is mainly aimed at harsh conditions such as overspeed movement, ultralow speed movement, and occlusion of highway vehicles. This paper proposes an algorithm combining Kalman filter and Mean-Shift. It first uses the Kalman filtering algorithm to predict where the target may appear in the next frame. It then uses the Mean-Shift algorithm to search and match in the candidate area, so that it can adapt to different situations to achieve real-time, accurate, and effective tracking of vehicles.
3.2.1. Mean-Shift Algorithm
The Mean-Shift algorithm is a target tracking algorithm based on a color histogram, which initializes the tracking target in the way of human-computer interaction and tracks the target in a given area [19, 20]. Region locking is generally determined using region circles, which are scaled using length and width [21].
The core of the Mean-Shift algorithm is to accurately calculate the position information of the next target:
$$M_h(x) = \frac{\sum_{i=1}^{n}\left(x_i - x\right) K\!\left(\left(\frac{x_i - x}{h}\right)^{T}\left(\frac{x_i - x}{h}\right)\right)}{\sum_{i=1}^{n} K\!\left(\left(\frac{x_i - x}{h}\right)^{T}\left(\frac{x_i - x}{h}\right)\right)}. \tag{4}$$
In formula (4), $M_h(x)$ is the offset to the next target, $h$ is the coefficient (bandwidth), and $K(\cdot)$ is the kernel function. $T$ is a mathematical notation for transpose.
Then, the contour function is
$$k(x) = \begin{cases} c\,(1 - x), & 0 \le x \le 1, \\ 0, & x > 1. \end{cases} \tag{5}$$
In formula (5), $c$ is a constant.
The flow chart of the Mean-Shift algorithm is shown in Figure 3.

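To make the procedure of Figure 3 concrete, the following is a minimal sketch of histogram-based Mean-Shift tracking using OpenCV. The video path and initial window coordinates are hypothetical placeholders for the human-computer-interaction initialization described above.

```python
import cv2

cap = cv2.VideoCapture("uav_road.mp4")   # hypothetical aerial video
ok, frame = cap.read()

# Human-computer-interaction initialization: a hypothetical initial window.
x, y, w, h = 300, 200, 80, 40
roi = frame[y:y + h, x:x + w]

# Color histogram of the target region (hue channel of HSV).
hsv_roi = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
hist = cv2.calcHist([hsv_roi], [0], None, [180], [0, 180])
cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)

# Stop after 10 iterations or when the window moves less than 1 pixel.
criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    # Mean-Shift climbs the back-projection density to the new target window.
    _, (x, y, w, h) = cv2.meanShift(back_proj, (x, y, w, h), criteria)
```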
3.2.2. Kalman Filter Algorithm
In describing the position of moving objects, the Kalman filter can be used, which has a very good position prediction effect. It mainly judges the next motion state of the object according to the previous motion state of the object, such as motion speed, direction, and other information, so as to realize the tracking of the target [22, 23].
(1) Bayesian Estimation. In most scientific calculations, it is necessary to preestimate the dependent variable over time. Bayesian estimation is a good way to solve the dynamic state of discrete time. The definition of Bayesian estimation is
$$x_k = f_k\left(x_{k-1}, v_{k-1}\right). \tag{6}$$
In formula (6), $x_k$ represents a vector with dimension $n$ at time $k$, $f_k$ is a nonlinear state transition function, and $v_{k-1}$ is the process noise; $x_k$ represents the motion state of the predicted object at time $k$.
The association of the observation vector $z_k$ and the state $x_k$ is an observation model:
$$z_k = h_k\left(x_k, n_k\right). \tag{7}$$
In formula (7), $z_k$ is the observation vector of dimension $m$, $h_k$ is a nonlinear observation model, and $n_k$ is the observation noise.
The core of Bayesian estimation is to construct a probability density function from all the available information. Before transitioning the state, the prior probability must be predicted, and the transition state with the highest prior probability is selected by comparison. Then, the next state is predicted from the new observation and its corresponding probability function [24, 25]. In short, Bayesian estimation uses the observed values $z_{1:k}$ and the probability density $p(x_k \mid z_{1:k})$ to calculate the confidence of the system state $x_k$.
Bayesian prediction process:
$$p\left(x_k \mid z_{1:k-1}\right) = \int p\left(x_k \mid x_{k-1}\right) p\left(x_{k-1} \mid z_{1:k-1}\right) \mathrm{d}x_{k-1}. \tag{8}$$
In formula (8), the prediction does not yet include the observation at time $k$.
Bayesian update observation procedure:
$$p\left(x_k \mid z_{1:k}\right) = \frac{p\left(z_k \mid x_k\right) p\left(x_k \mid z_{1:k-1}\right)}{p\left(z_k \mid z_{1:k-1}\right)}. \tag{9}$$
If both the state transition model and the observation model are linear, and the states at all times follow a Gaussian distribution, then the Bayesian estimator reduces to a Kalman filter under this particular condition.
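Before specializing to the Kalman filter, formulas (8) and (9) can be sketched as a discrete predict-update recursion; the state cells, transition matrix, and likelihood below are illustrative placeholders, not quantities from the paper.

```python
import numpy as np

def bayes_step(belief, transition, likelihood_z):
    """One recursion of the Bayesian filter over a discrete state space.

    belief: p(x_{k-1} | z_{1:k-1}) as a length-N vector.
    transition: transition[i, j] = p(x_k = j | x_{k-1} = i) (assumed known).
    likelihood_z: p(z_k | x_k = j) evaluated for each cell j.
    """
    prior = transition.T @ belief          # formula (8): predict step
    posterior = likelihood_z * prior       # formula (9): numerator
    return posterior / posterior.sum()     # normalize by p(z_k | z_{1:k-1})
```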
(2) Kalman Filter. The Kalman filter is a recursive linear estimation process for dynamic states: each prediction is made on the basis of the previous state, and only the previous state is retained once the best prediction is determined. As a result, systems employing Kalman filters require little storage and can make fast, real-time predictions [26].
Since the Kalman filter is a linear prediction model, its state transition model and sensing model are both linear formulas. The expressions of these two models under the Kalman filter are
$$x_k = A x_{k-1} + B u_{k-1} + w_{k-1}. \tag{10}$$
In formula (10), $A$ represents the state transition matrix, $B$ represents the state input matrix, and $w_{k-1}$ is the process noise. $H$ represents the sensing model:
$$z_k = H x_k + v_k. \tag{11}$$
The core algorithm of the Kalman filter is the following.
Knowing the prior state estimate $\hat{x}_k^-$ and the posterior state estimate $\hat{x}_k$ at time $k$, with prediction errors $e_k^- = x_k - \hat{x}_k^-$ and $e_k = x_k - \hat{x}_k$, the covariances of the prior prediction error and the posterior prediction error are expressed as
$$P_k^- = E\!\left[e_k^- \left(e_k^-\right)^T\right], \tag{12}$$
$$P_k = E\!\left[e_k e_k^T\right]. \tag{13}$$
Then, the prior prediction error covariance is expressed as
$$P_k^- = A P_{k-1} A^T + Q, \tag{14}$$
where $Q$ is the process noise covariance. The posterior prediction error covariance is expressed as
$$P_k = \left(I - K_k H\right) P_k^-. \tag{15}$$
Then, the formula for the Kalman filter (the state update) is
$$\hat{x}_k = \hat{x}_k^- + K_k\left(z_k - H \hat{x}_k^-\right). \tag{16}$$
Deforming formula (16), the gain can be obtained:
$$K_k = P_k^- H^T \left(H P_k^- H^T + R\right)^{-1}. \tag{17}$$
It can be seen from formula (17) that $K_k$ represents the gain, and the smaller the observation noise covariance $R$, the greater the gain; likewise, the larger the prior error covariance $P_k^-$, the larger the gain value. $T$ stands for transpose.
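The following is a minimal NumPy sketch of this predict-update cycle for the constant-velocity vehicle state used in Section 3.2.3; the noise covariances Q and R are assumed values, not ones reported in the paper.

```python
import numpy as np

class Kalman2D:
    """Minimal sketch of a constant-velocity Kalman filter.

    State x = [px, py, vx, vy]; observation z = [px, py].
    Q and R are assumed noise covariances, not values from the paper.
    """
    def __init__(self, dt=1.0):
        self.A = np.array([[1, 0, dt, 0],   # state transition matrix A
                           [0, 1, 0, dt],
                           [0, 0, 1,  0],
                           [0, 0, 0,  1]], float)
        self.H = np.array([[1, 0, 0, 0],    # sensing model H (positions only)
                           [0, 1, 0, 0]], float)
        self.Q = np.eye(4) * 1e-2           # process noise covariance (assumed)
        self.R = np.eye(2) * 1.0            # observation noise covariance (assumed)
        self.x = np.zeros(4)                # state estimate
        self.P = np.eye(4)                  # error covariance

    def predict(self):
        self.x = self.A @ self.x                        # formula (10), no control input
        self.P = self.A @ self.P @ self.A.T + self.Q    # formula (14)
        return self.x[:2]                               # predicted position

    def update(self, z):
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)        # formula (17): gain
        self.x = self.x + K @ (z - self.H @ self.x)     # formula (16)
        self.P = (np.eye(4) - K @ self.H) @ self.P      # formula (15)
        return self.x[:2]
```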
3.2.3. Combination of Mean-Shift and Kalman Filtering Algorithms
In the tracking of the vehicle, it is necessary to determine each frame of the vehicle motion and establish a Kalman filter to predict the vehicle position. Let the state vector of the vehicle be $X = (x, y, v_x, v_y)^T$. The first two components are the position of the vehicle on the $x$-axis and $y$-axis, and the last two components are the speed of the vehicle along the $x$-axis and $y$-axis [27].
According to the kinematic theorem,
$$v_t = v_0 + at. \tag{18}$$
In formula (18), $a$ represents acceleration and $t$ represents time.
Then, the vehicle motion model is expressed as
$$X_k = \begin{bmatrix} 1 & 0 & t & 0 \\ 0 & 1 & 0 & t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} X_{k-1}. \tag{19}$$
The combined process of Mean-Shift and Kalman filtering algorithm is the following.
The position of the vehicle in the moving picture is marked, and the state at the initial moment is recorded with the Kalman filter. The position predicted by the Kalman filter is passed to the Mean-Shift method as the starting point of its iteration, and the position obtained by the iteration is fed back to update the Kalman filter; repeating this predict-search-update cycle realizes real-time tracking of the vehicle, as the sketch below shows.
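A sketch of the combined loop, assuming the `Kalman2D` class from the previous sketch, a target histogram `hist` built as in the Mean-Shift sketch, and a hypothetical video path and initial window:

```python
import cv2
import numpy as np

def mean_shift_search(frame, hist, window):
    """One Mean-Shift search step over the Kalman-predicted candidate area."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
    _, window = cv2.meanShift(back_proj, window, criteria)
    return window

cap = cv2.VideoCapture("uav_road.mp4")   # hypothetical aerial video
window = (300, 200, 80, 40)              # hypothetical initial vehicle window
# `hist` is assumed built from the initial window as in the Mean-Shift sketch.
kf = Kalman2D()                          # the class from the previous sketch
while True:
    ok, frame = cap.read()
    if not ok:
        break
    px, py = kf.predict()                              # 1. Kalman predicts position
    window = (int(px - window[2] / 2), int(py - window[3] / 2),
              window[2], window[3])                    # center the candidate window
    window = mean_shift_search(frame, hist, window)    # 2. Mean-Shift refines it
    cx = window[0] + window[2] / 2
    cy = window[1] + window[3] / 2
    kf.update(np.array([cx, cy]))                      # 3. result corrects the filter
```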
3.3. Vehicle Violation and Recognition Technology Based on Moving Image
Aiming at the characteristics of high traffic flow and high speed on the expressway, this paper does not further analyze normally driving vehicles. It only screens and identifies vehicles whose motion trajectory change rate exceeds a threshold (overspeed detection), whose change rate is below a threshold (ultralow speed and parking detection), whose movement track reverses (driving in reverse), or whose motion trajectory stays within the emergency lane for a long time (occupying the emergency lane).
It marks the trajectory of the offending vehicle and detects vehicle violations by means of background difference, comparing the images before and after to determine whether a violation has occurred [28].
Background difference is a method of comparing the current frame against an extracted background image (or the previous frame's background image). By analyzing the difference between the two images, it can be determined whether the vehicle has violated rules such as speeding or ultralow speed. The process of the background difference method is shown in Figure 4.

The calculation steps of the background difference are the following.
Let the picture of the video sequence at time $k$ be $f_k(x, y)$; the result of background analysis is
$$D_k(x, y) = \left| f_k(x, y) - B_k(x, y) \right|. \tag{20}$$
In formula (20), $B_k(x, y)$ represents the background model, and $D_k(x, y)$ represents the analysis result of the background difference.
Arranging gives
$$R_k(x, y) = \begin{cases} 1, & D_k(x, y) > T, \\ 0, & D_k(x, y) \le T. \end{cases} \tag{21}$$
In formula (21), $R_k(x, y)$ is the binarization of the difference result, and $T$ represents the threshold value of the difference method judgment, which is a constant.
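As an illustration, the following sketch realizes formulas (20) and (21) with OpenCV; the Gaussian-mixture subtractor stands in for the Gaussian background model mentioned in the abstract, and the video path and thresholds are assumptions.

```python
import cv2

cap = cv2.VideoCapture("uav_road.mp4")   # hypothetical aerial video
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)                                 # formula (20)
    _, binary = cv2.threshold(fg_mask, 127, 255, cv2.THRESH_BINARY)   # formula (21)
    # Connected foreground blobs correspond to moving (possibly offending) vehicles.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
```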
3.4. License Plate Location and Recognition Technology Based on License Plate Color and Texture Features
Aiming at the position of the marked illegal vehicle trajectory, the license plate area is quickly located, and the template matching method is used to identify the license plate characters. The location and recognition of the license plate are realized by the color and texture features of the license plate. Due to the different manufacturers of license plates or different production processes, the background colors of license plates are not the same, and the RGB color model covers all color representations.
By projecting the color of the license plate onto the three planes of $R$, $G$, and $B$, it can be observed that the color components obey a fixed depth relationship. Since the location of the license plate relies on color features, RGB can be converted to HSV for research. Among them, $H$ represents chromaticity (hue), $S$ represents purity (saturation), and $V$ represents lightness (value). The HSV model is shown in Figure 5.

The background colors of license plates are generally blue, yellow, black, and white. Through the collection of a large amount of license plate data, the color threshold ranges of the four license plate background colors in HSV space are obtained, as shown in Table 1.
When the license plate color is blue or yellow, a pixel is judged to belong to the plate if its components fall within the corresponding thresholds of Table 1:
$$H_{\min} \le H \le H_{\max}, \tag{22}$$
$$S_{\min} \le S \le S_{\max}, \tag{23}$$
$$V_{\min} \le V \le V_{\max}. \tag{24}$$
In formula (24), $S$ and $V$ are the components on the $S$ and $V$ axes. When the background color of the license plate is black or white, the calculation method is the same as formulas (22), (23), and (24).
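A minimal sketch of the HSV range check for a blue plate follows; the numeric thresholds are illustrative assumptions standing in for the ranges of Table 1.

```python
import cv2
import numpy as np

def plate_candidate(bgr_image):
    """Locate a blue-plate candidate region by HSV color masking."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    lower_blue = np.array([100, 80, 60])    # assumed (H, S, V) lower bounds
    upper_blue = np.array([130, 255, 255])  # assumed (H, S, V) upper bounds
    mask = cv2.inRange(hsv, lower_blue, upper_blue)   # formulas (22)-(24) as a range check
    # The largest connected region of plate-colored pixels is the candidate plate.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea) if contours else None
```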
In addition to color features, the location and recognition of license plates also use the texture features of the license plate. The human eye can only distinguish about 20 levels of gray; however, when the gray level is reduced to below 5 levels, characteristic texture information emerges, so noise reduction at the 5-level gray quantization can effectively locate and recognize the license plate. The flow chart of license plate texture recognition is shown in Figure 6.

Finally, for illegal vehicles, the key frame images are intercepted, and the license plate and violation information are recorded and entered into the database.
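A minimal sketch of this entry step using SQLite follows; the schema, field names, and example record are assumptions rather than the paper's actual database design.

```python
import sqlite3

conn = sqlite3.connect("violations.db")   # hypothetical database file
conn.execute("""CREATE TABLE IF NOT EXISTS violations (
                    plate TEXT, violation TEXT, timestamp TEXT, keyframe TEXT)""")

def record_violation(plate, violation, timestamp, keyframe_path):
    # `keyframe_path` points to the intercepted key frame image on disk.
    conn.execute("INSERT INTO violations VALUES (?, ?, ?, ?)",
                 (plate, violation, timestamp, keyframe_path))
    conn.commit()

# Hypothetical example record.
record_violation("ABC123", "occupying emergency lane",
                 "2022-05-01 10:32:07", "frames/0001.jpg")
```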
4. Experiment of Moving Target Detection Based on UAV Vision
Moving target detection technology plays an important role in actual production and life. The processing pipeline of the traditional moving target detection method is preprocessing the target image, selecting the target area, extracting the target feature vector, and classifying with the help of a classifier. Although this can achieve effective target detection to a certain extent, with the development of image recognition and related technologies, the traditional target detection approach shows many shortcomings. This paper takes the UAV as the carrier and compares the moving target detection technology under UAV vision with traditional target detection technology. The experimental object of this paper is the moving vehicle.
4.1. Effectiveness of Moving Target Detection
4.1.1. Sample Data
In order to make a comprehensive comparison between the moving target detection technology of UAV vision and traditional moving target detection technology, the experimental samples must be strictly screened, and the selection of samples must be representative and form a gradient. To ensure the validity of the experiment, 12 moving target detection systems are selected, 6 of which operate within the UAV field of view, while the other 6 are not equipped with UAV equipment. In this paper, the index data that most influence the detection of moving objects are counted; Table 2 is the index table of moving object detection.
From the data in Table 2, it can be seen that among the indicators of moving target detection, the accuracy of moving target detection and recognition, the speed of moving target detection, the labor time spent on moving target detection, and the divergence results generated by moving target detection have a great influence. The frequency of moving object detection and the image size of moving object detection have little effect.
4.1.2. Correlation Analysis of Samples
The selection of moving target detection samples will directly affect the experimental results, so when selecting moving target detection sample indicators, it is also necessary to analyze the degree of correlation between the sample indicators and moving target detection. The correlation analysis of samples is to amplify the characteristic information of the experimental data to better compare the experimental results. Through the data in Table 2, it can be found that the first four indicators have a great influence on the detection of moving objects, which is more than 90%. The impact of the latter two is not very large, so the first four indicators in Table 2 will be selected for correlation analysis of moving target detection. Table 3 is the correlation analysis table of moving target detection.
From the analysis of the data in Table 3, it can be seen that the accuracy of moving target detection and recognition has the highest correlation with moving target detection, at 0.284, and the lowest is 0.216; however, the overall differences are small, and the data dimension of this experiment is not high. Therefore, all the moving target detection indicators in Table 3 are used as factors to measure the quality of the moving target detection system.
4.1.3. Validity Analysis of Samples
In order to compare whether the experiments on UAV vision-based moving target detection technology and traditional moving target detection technology are valid, the experiment uses the $K$-fold cross-validation method for data verification. $K$-fold cross-validation is designed so that every data point is both trained on and tested. In this experiment, 7-fold cross-validation is selected; the test selects 3500 vehicles as the experimental data, of which 3000 vehicles form the training set and 500 vehicles form the test set. The experimental results of the validity analysis of the two different motion detection techniques are shown in Table 4.
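As an illustration, a 7-fold split of 3500 samples with scikit-learn holds out 500 vehicles per fold, matching the description above; the feature array here is a random placeholder.

```python
import numpy as np
from sklearn.model_selection import KFold

# Placeholder features and labels standing in for the 3500-vehicle sample.
X = np.random.rand(3500, 16)
y = np.random.randint(0, 2, 3500)

kf = KFold(n_splits=7, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    X_train, X_test = X[train_idx], X[test_idx]   # 3000 train / 500 test per fold
    y_train, y_test = y[train_idx], y[test_idx]
    # ...fit the detector on X_train and evaluate it on X_test...
```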
From the data analysis in Table 4, it can be seen that the average effectiveness of the above four moving target detection indicators for two different target detection systems is 88.2% and 86.8%, respectively. Therefore, the two target detection systems can be compared and analyzed through the above four indicators.
4.2. Comparison Experiment of Moving Target Detection under UAV Vision and Traditional Moving Target Detection
4.2.1. The Accuracy of Moving Target Detection and Recognition
The accuracy of moving target detection and recognition is the most basic indicator of the moving target detection system. In order to better compare the UAV-based moving target detection and recognition accuracy with the traditional detection and recognition accuracy, the experiment will select 6,000 vehicles for the test experiment. Among them, there are 2,000 small cars, 2,000 medium-sized cars, and 2,000 large cars. By continuously increasing the experimental data, this paper compares the target recognition accuracy of the two methods. Figure 7 shows the experimental results of the accuracy of moving target detection and recognition in two ways.
Figure 7: Recognition accuracy of the two methods for (a) small cars, (b) medium cars, and (c) large vehicles.
It can be seen from Figure 7 that as the vehicle size increases, the recognition accuracy improves. Overall, the moving target detection and recognition accuracy under UAV vision is about 10.6% higher than that of the traditional mode, and the highest recognition accuracy can reach 99%.
4.2.2. Speed of Moving Target Detection
The speed of moving target detection measures the ability of the detection system to process data. Because the detection speed of vehicles is affected by scene factors, the experiment selects two scenarios, crossroads and ordinary roads, to detect passing vehicles. The experimental results of the moving target detection speed of the two methods are shown in Figure 8.
Figure 8: Detection speed of the two methods at (a) crossroads and (b) ordinary roads.
From the analysis of Figure 8, it can be seen that the target detection speed at crossroads is slightly lower than that at ordinary roads, and the moving target detection speed under UAV vision is about 14 vehicles per second higher than that of traditional moving target detection.
4.2.3. Manual Time Spent on Moving Target Detection
Part of moving target detection is a mode of human-computer interaction, and image recognition is required when moving objects are detected; however, manual selection is sometimes needed when selecting an image target. If the manual selection time is short, or manual operation is not required at all, the performance of the moving target detection system is greatly improved. To this end, an experiment compares the manual time spent by the two methods of moving target detection. The experimental environments are set with two different vehicle ratios, 40% and 80%, respectively. By increasing the number of detected vehicles, this paper observes the manual time spent by the two methods, as shown in Figure 9.
Figure 9: Manual time spent at (a) a 40% vehicle ratio and (b) an 80% vehicle ratio.
From the data analysis in Figure 9, it can be seen that the higher the vehicle ratio, the longer manual detection takes. However, manual detection of moving objects based on UAV vision generally takes much less time than the traditional method. When the vehicle ratio is 40%, the average manual times of the two are 2.25 s and 12.25 s, respectively; when the vehicle ratio is 80%, they are 3.31 s and 16.12 s, respectively.
4.2.4. Divergent Results from Moving Target Detection
When moving target detection is not accurate enough, the results will diverge, which causes errors in the current detection results and can even affect future detections. Effectively reducing or even eliminating divergent results is the mark of a good moving target detection system. In this experiment, 50 crossroads and 50 ordinary intersections are selected to detect objects, and the proportion of divergent results of the two methods of moving object detection is observed, as shown in Figure 10.
Figure 10: Divergence results at (a) ordinary intersections and (b) crossroads.
From the data analysis in Figure 10, it can be seen that the proportion of divergent results of moving target detection based on UAV vision is much smaller than that of traditional methods. On both road types, the divergence results of moving target detection based on UAV vision are very low: the average divergence results at crossroads and ordinary intersections are 3.25% and 3.75%, respectively.
4.3. Experiment of Two Kinds of Moving Target Detection
The experiment compares the moving target detection based on UAV vision and the traditional moving target detection from four dimensions: the accuracy of moving target detection and recognition, the speed of recognition, the time spent manually, and the divergent results. The average data comparison of the four dimensions is shown in Table 5. It can be seen that the moving target detection based on UAV vision has better performance than the traditional moving target detection.
5. Discussion
The development of UAVs and their combination with multiple fields have greatly promoted the progress of society. In terms of traffic control, traditional moving target detection often fails to achieve the expected detection effect. UAV-based detection achieves effective detection of moving targets by utilizing the UAV's high-altitude supervision capability, improved image feature extraction, and powerful recognition and prediction technology.
6. Conclusion
Through experiments, the UAV vision-based moving target detection and the traditional moving target detection are compared in four aspects: recognition accuracy, recognition speed, labor time, and divergent results. This paper draws the following conclusions: (1) The average recognition accuracy of moving target detection based on UAV vision is 98.4%, which is 10.6% more than the traditional moving target detection. In terms of recognition speed, the average recognition speed of moving target detection based on UAV vision is 24.9 vehicles per second, which is 14.3 vehicles per second more than the traditional moving target detection. (2) The proportion of manual time spent and divergent results of moving target detection based on UAV vision is only about 1/3 of that of traditional moving target detection. Moving target detection based on UAV vision is far superior to traditional moving target detection in terms of detection and recognition. Detection and recognition algorithms and UAV vision are the core of UAV moving target detection. Therefore, finding better detection and recognition algorithms and improving UAV vision will be the direction of future research.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Acknowledgments
This work was financially supported by the Natural Science Foundation of Hainan Province, China, Item number: 621QN0899.