Abstract

Moving object detection, a branch of computer vision, has attracted increasing interest and grown in importance in recent years. Moving target detection technology is now widely employed in military defense, security monitoring, medical examination, intelligent transportation, and many other industries. A moving target detection system relies on rapid and accurate video image segmentation to identify, locate, and analyze the target. Capturing the trajectory of a basketball player’s shooting motion and extracting its features are important for increasing the accuracy of basketball players’ shots. This study proposes a region growth algorithm-based approach to tracing the shooting motion of basketball players. The trajectory of the shooting motion is captured using a video sensor image tracking approach. The corner points and edge contours of the trajectory image are extracted using the edge contour feature extraction method, feature extraction from the shooting motion trajectory image is merged with the regional linear growth approach, and the corner points are then marked. Feature extraction improves basketball players’ shooting accuracy and trajectory control ability. Simulation results show that the prediction accuracy of basketball players’ shooting trajectories using this method can reach up to 100%, which improves the accuracy of motion trajectory extraction and enhances basketball players’ shooting motion control.

1. Introduction

With the rapid development of digital network technology, video images have become an important carrier of information transmission. The massive and abundant motion information contained in video image sequences has aroused great interest [1–3]. Although human eyes can directly distinguish moving objects and extract motion information from video image sequences, relying only on natural human intelligence to obtain and process motion information can no longer meet the needs of social development. Using computer vision to replace human vision for extracting, analyzing, and understanding motion information from image sequences has become a popular direction in modern scientific research. As the basis and key link of computer vision motion analysis, the detection and tracking of moving objects in video image sequences are of great value in theoretical research and practical applications [4–6].

Since the development of video motion analysis, many algorithms for moving target detection and tracking have been produced, but most are proposed for specific scenarios and have limited generality. Sports professionals, athletes, and coaches use video motion analysis to acquire data by employing digital movie cameras to capture moving images and then using software to analyze the captured images frame by frame. Real scenes are complex: changes in weather and lighting conditions, shadows of moving targets, other objects moving in and out of view, mutual occlusion between targets, and real-time requirements for algorithms all make detecting and tracking moving targets difficult. Therefore, studying robust, accurate, and high-performance moving target detection and tracking algorithms remains a challenging subject. To apply moving target detection technology to basketball shot detection, acquiring the basketball’s position and characteristic information from the image becomes the key step in the testing system. The following subsections summarize and analyze several moving target detection methods in order to determine the shooting detection method [7–10].

1.1. Optical Flow Method

Crucial information about a moving object is contained in the optical flow of a picture [11–14]. As a result, a moving object in an image sequence may be detected using optical flow technology. The fundamental premise of optical flow is to assign a velocity vector to each pixel in the picture; the collection of velocity vectors from all pixels then forms a motion field. A projection model may be used to determine the relationship between pixels in the picture and points on the three-dimensional object at a given moment, so the picture can be dynamically evaluated and judged based on the velocity vector properties of its pixels. If there is no moving object in the picture, the optical flow vector changes continuously over the entire image region; when a moving target appears, it has a velocity vector distinct from the background velocity vector, which means the moving object can be identified and its precise location in the image computed. The optical flow approach remains applicable even when the camera is moving. L-K (Lucas and Kanade) and H-S (Horn and Schunck) are two of the most often used optical flow algorithms. Optical flow technology has become increasingly popular in recent years as computer processing capacity has increased, and new optical flow techniques have arisen.
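As an illustrative sketch (not the paper’s implementation), the following uses OpenCV’s pyramidal Lucas–Kanade routine to track corner points between two frames and flags points whose velocity deviates from the dominant background motion; the video file name and all thresholds are assumptions:

```python
# Illustrative L-K optical flow sketch with OpenCV; input file and
# thresholds are hypothetical, not taken from the paper's system.
import cv2
import numpy as np

cap = cv2.VideoCapture("game.mp4")               # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
# Corner points to track in the first frame
p0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                             qualityLevel=0.01, minDistance=7)

ok, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Pyramidal Lucas-Kanade: estimate where each corner moved in the next frame
p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, p0, None,
                                         winSize=(15, 15), maxLevel=2)
flow = (p1 - p0).reshape(-1, 2)[status.ravel() == 1]  # per-point velocities
background = np.median(flow, axis=0)             # dominant background motion
moving = np.linalg.norm(flow - background, axis=1) > 2.0  # deviating points
```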

1.2. Interframe Difference Method

The interframe difference method exploits the considerable amount of moving target information contained between adjacent frames: its essential principle is to use the motion information of adjacent frames to perform moving target recognition and segmentation on the video sequence. If the grayscale, texture, and other information of two adjacent frames are quite similar, the interframe difference image yields only the edge contours of the moving objects, and the complete moving targets in the image cannot be detected. Ghosts may appear in the separated foreground/background image, and when the moving target is relatively fast, the occluded background area changes considerably between two adjacent frames, so the occluded background is easily misjudged as the target. This greatly affects the extraction of the object’s feature parameters and the segmentation of moving objects. For this reason, many domestic scholars have improved the interframe difference method and applied it in practice. For example, Wang Xiaoyan et al. conducted in-depth research on using the three-frame difference method to extract targets. Yao Qian et al. combined the three-frame difference method [15–19] with mean-shift to realize detection and tracking of the human body. Wang Bin et al. made vehicle recognition possible by combining the background difference technique with the interframe difference method, whereas Wei et al. used the three-frame difference approach in conjunction with improved Gaussian modeling to detect moving objects, further increasing detection precision. Li Ling proposed background-adaptive moving target detection to address the lack of updating in the traditional Gaussian mixture background model [20–22].
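As a hedged sketch of the three-frame difference idea cited above (threshold value assumed, grayscale frames expected), two difference images from frames k-1, k, and k+1 are combined with a logical AND, which suppresses the ghosting left by a simple two-frame difference:

```python
# Three-frame difference sketch (grayscale uint8 frames assumed; the
# threshold is illustrative, not the cited authors' value).
import cv2

def three_frame_diff(f_prev, f_cur, f_next, thresh=25):
    d1 = cv2.absdiff(f_cur, f_prev)   # motion between frames k-1 and k
    d2 = cv2.absdiff(f_next, f_cur)   # motion between frames k and k+1
    _, b1 = cv2.threshold(d1, thresh, 255, cv2.THRESH_BINARY)
    _, b2 = cv2.threshold(d2, thresh, 255, cv2.THRESH_BINARY)
    return cv2.bitwise_and(b1, b2)    # pixel moving in both pairs -> target
```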

1.3. Background Difference Method

The background difference method and the frame difference method are based on the same detection principle, but the background difference method does not difference two adjacent image frames; instead, it establishes a background reference model and detects moving targets by computing the image difference between the model and the current picture frame. Essentially, this approach resembles the interframe difference method and is ideal for situations in which the camera remains stationary. When a good background reference model is available, the motion area can be completely segmented. Because the background difference method has better real-time performance when extracting moving targets from video image sequences, it has been widely used in practice. At present, many experts in computer vision-related fields at home and abroad have proposed various background difference algorithms, for example, the image difference threshold judgment method proposed by Lipton et al. A background difference method based on compressed sensing and dictionary learning proposed by Guo Houyun et al. can solve the problems of sudden and gradual background changes, redundancy of image data, and interference of false foregrounds with target detection. Chen Yusi proposed a new fast algorithm for light-robust moving target detection based on the idea of classification and blocking. Wu Jing improved the background difference method with an empty-background modeling algorithm. Lei Yu used a weighted background update strategy to model the background; after the background is saved, an adaptive background replacement algorithm is applied to solve the interference problem caused by foreground occlusion [23–26].

The main goal of this study is to design an algorithm for extracting the movement trajectory features of the basketball group using the background difference approach. The study is divided into five sections. Section 1 introduces the background, significance, and content of the research. Section 2 focuses on image denoising, calibration, segmentation, and mathematical morphology methods, which are all commonly used image processing techniques. Section 3 makes a more in-depth study of the background difference method from three aspects: basic principles, algorithm flow, and background modeling. In Section 4, the location of the camera is chosen by studying the characteristics of fixed-point shooting and the examination guidelines; the fixed-point shooting detection algorithm is then described in detail by combining the shooting characteristics with the background difference method. The summary and outlook are given in Section 5.

2. Commonly Used Image Processing Techniques

In the process of using the background difference method to detect basketball shots, the images acquired by the camera cannot be analyzed and processed directly. This is because the image data are easily contaminated while the image is being captured, and the captured image is also polluted by noise during conversion and transmission. If the captured image is processed and analyzed directly, it is very likely that no moving target can be extracted. Therefore, image processing technologies must be used to denoise the captured image or to perform other auxiliary processing. This section introduces these technologies.

2.1. Image Capture

Image acquisition refers to discretizing and digitizing a continuous picture signal and then transporting the resulting digital signal to a frame buffer or computer memory to create a digital image signal. Image collection can generally be classified into two categories: static image collection, which photographs a single instant, and dynamic image collection, which records video to provide a steady stream of images over a period of time. Depending on the camera’s capabilities, all images acquired by a camera can be saved as a digital signal or transferred to a computer for postprocessing. Dynamic image collection relies on the camera’s or the computer’s hard drive to store and process the captured images.

2.2. Image Resolution

The image resolution and image size together determine the file size and output quality: the larger they are, the more disk space the image file occupies and the more time it takes to process the image. In this work, online processing of the captured images requires high real-time performance, the target detected is the basketball, and there is only a single target, so recognition can be completed using low-resolution images. For shot detection, the basketball target is relatively large and its appearance is distinctive, so to further improve the real-time performance of the system, a minimum resolution of 320 × 240 meets the requirements when using the camera to capture images.

2.3. Image Denoise

Image denoising is the technique of eliminating or suppressing the noise that contaminates or accompanies an image. Because denoising is usually performed in the preprocessing stage, its quality has a direct impact on the subsequent processing of the image. In the process of acquiring images with a camera, the operating conditions of the camera components are affected by various objective factors, for example, the thermal noise of the camera, the jitter noise caused by the camera’s mechanical movement, and other internal noises. Other common noises include additive noise, salt-and-pepper noise, quantization noise, and multiplicative noise.

2.4. Denoise Methods

Many novel picture denoising algorithms have arisen as a result of the ongoing development of digital image processing technologies. Spatial domain filtering and frequency domain filtering are the two types of image denoising procedures that may be used to improve the quality of a picture. Whatever kind of denoising method is used, the ultimate purpose of the image denoising process is to suppress or eliminate noise from the picture in order to provide a clean image for the subsequent processing module [27–29].

The spatial domain image filtering method is the most direct filtering method: it operates directly on the original image to be processed. According to the operation mode, spatial domain denoising can be divided into point operations (performing operations directly on each pixel in the image) and local operations (performing operations on an area adjacent to the pixel to be processed). Many researchers have improved the spatial domain image denoising method and achieved good results.

Image denoising in the frequency domain differs from denoising in the spatial domain. This type of method transforms the image from the spatial domain to the frequency domain, performs special processing on the transform coefficients in the frequency domain, and then inversely transforms the processed result back to achieve the denoising effect. Among the transforms commonly used to convert between the spatial and frequency domains is the wavelet transform.
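As a hedged illustration of this transform, process, and invert pipeline (the paper does not specify a particular frequency-domain filter), the sketch below applies a crude ideal low-pass filter via the FFT; the fraction of the spectrum kept is an assumed parameter:

```python
# Frequency-domain denoising sketch: keep only a low-frequency block of the
# Fourier spectrum, then transform back. Purely illustrative.
import numpy as np

def fft_lowpass(img, keep=0.1):
    F = np.fft.fftshift(np.fft.fft2(img))          # spectrum, DC centered
    h, w = img.shape
    mask = np.zeros_like(F, dtype=bool)
    ch, cw = h // 2, w // 2
    rh, rw = int(h * keep / 2), int(w * keep / 2)
    mask[ch - rh:ch + rh, cw - rw:cw + rw] = True  # retain low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```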

Mean filtering is another name for neighborhood averaging. Using the neighborhood averaging method to reduce picture noise is common practice in image processing since it is the most obvious, straightforward, and easiest denoising method to apply. Let the image to be processed be $f(x, y)$, let $R$ be the kernel (neighborhood), and let $N$ be the total number of pixels in the kernel. The image $g(x, y)$ filtered by the neighborhood averaging method is then described by the formula

$$g(x, y) = \frac{1}{N} \sum_{(i, j) \in R} f(i, j).$$
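A one-call sketch of this neighborhood average using OpenCV (kernel size 3 × 3 assumed; the input file name is hypothetical):

```python
import cv2

f = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical noisy image
g = cv2.blur(f, (3, 3))  # mean over a 3x3 kernel R: g = (1/N) * sum of f
```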

The median filter, like the neighborhood averaging method, is a common method of suppressing image noise, but it performs better than neighborhood averaging at preserving the useful information of the image. Whereas neighborhood averaging replaces a pixel with the mean value of its region, median filtering sorts the pixels in the neighborhood and replaces the center pixel with the intermediate (median) value. If the neighborhood $R$ contains $n$ pixels that are sorted and their median taken, the filtered image is

$$g(x, y) = \operatorname{med}\{\, f(i, j) \mid (i, j) \in R \,\}.$$
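The corresponding median filter sketch, again with an assumed 3 × 3 neighborhood and a hypothetical input file:

```python
import cv2

f = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical noisy image
g_med = cv2.medianBlur(f, 3)  # median of each 3x3 neighborhood; edges preserved
```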

2.5. Mathematical Morphology

Erosion and dilation are the two most basic operations of mathematical morphology; all other morphological algorithms are built from composites of these two basic operations. First, consider the erosion of an image. For a given binary image $A$ and structural element $B$, eroding $A$ by $B$ is defined by the expression

$$A \ominus B = \{\, x \mid B_x \subseteq A \,\},$$

where $B_x$ denotes $B$ translated by $x$.

The dilation of image $A$ by structural element $B$ is defined as

$$A \oplus B = \{\, x \mid \hat{B}_x \cap A \neq \varnothing \,\},$$

where $\hat{B}$ is the reflection of $B$.

The closing and opening operations of morphology are dual operations, and the process of the closing operation is just the opposite of that of the opening operation: closing first dilates the binary image and then erodes it. The two operations are defined as

$$A \bullet B = (A \oplus B) \ominus B, \qquad A \circ B = (A \ominus B) \oplus B.$$
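A sketch of the four morphological operations defined above, using OpenCV; the input mask file and the 3 × 3 structural element size are assumptions:

```python
import cv2
import numpy as np

binary = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)  # hypothetical binary image
B = np.ones((3, 3), np.uint8)                          # structural element B
eroded = cv2.erode(binary, B)                          # erosion of A by B
dilated = cv2.dilate(binary, B)                        # dilation of A by B
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, B)   # erode, then dilate
closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, B)  # dilate, then erode
```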

2.6. Area Segmentation

The purpose of area segmentation is to reduce the detection area, increase the operating speed of the system, and further strengthen the anti-interference ability of the system.

3. Introduction to the Background Difference Method

3.1. Basic Idea

The background difference method compares the current frame image to a reference image from a known background model, calculates a similarity measure between each point in the current image and the corresponding point in the background model, and classifies each pixel as foreground or background according to

$$D_t(x, y) = \begin{cases} 1, & |f_t(x, y) - B(x, y)| > T \\ 0, & \text{otherwise,} \end{cases}$$

where $f_t$ is the current frame, $B$ is the background reference model, and $T$ is the segmentation threshold. Precision in position, rapid calculation, and the ability to segment entire moving objects are all advantages of the background difference approach. On the other hand, the method is more sensitive to scene changes than other methods, so the background model must be updated on a more frequent basis. Before the moving target can be detected, the current image must be preprocessed: the first step is to transform the image and reduce noise; the second is to establish a background reference model from the original image; and the third is to difference the current frame image against the background reference model and apply subsequent processing. For this method to work, the background model must be updated in real time to account for the scene’s changing environment. With a strong background model, a moving target can be detected in an image with high accuracy.
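A minimal sketch of this classification rule (threshold T assumed; the background model B would come from one of the modeling methods in Section 3.2):

```python
# Background difference classification: a pixel is foreground when it
# deviates from the background model by more than T (T is illustrative).
import cv2

def background_diff(frame_gray, background, T=30):
    diff = cv2.absdiff(frame_gray, background)      # |f_t - B|
    _, fg = cv2.threshold(diff, T, 255, cv2.THRESH_BINARY)
    return fg                                       # 255 = foreground, 0 = background
```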

3.2. Background Modeling

The majority of modern background modeling approaches are derived from the original algorithm’s improvement and optimization.

3.2.1. Gaussian Background Modeling

Gaussian background modeling is more appropriate when the scene is more complex, for example, when leaves in the scene surroundings are moving or the camera is shaking. The principle of this type of method is to model each pixel in the original image separately, under the assumption that each pixel is independent of the other pixels over a period of time and that the fluctuation of each background pixel’s eigenvalue satisfies a Gaussian distribution over that period; the background modeling is carried out based on this reasonable assumption. When a new image is captured and preprocessed, each pixel’s value in the new image is compared against the Gaussian distribution of the background model to determine whether the pixel belongs to the background or the foreground, and the background model is then updated as necessary according to these conditions.
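As a hedged sketch of this per-pixel idea, the following keeps a single Gaussian per pixel rather than a full mixture; the learning rate and the k-sigma decision rule are assumed values:

```python
# Simplified per-pixel single-Gaussian background model (illustrative
# stand-in for the Gaussian modeling described above).
import numpy as np

class GaussianBackground:
    def __init__(self, first_frame, alpha=0.01, k=2.5):
        self.mu = first_frame.astype(np.float64)    # per-pixel mean
        self.var = np.full_like(self.mu, 15.0 ** 2) # per-pixel variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        x = frame.astype(np.float64)
        d = x - self.mu
        fg = np.abs(d) > self.k * np.sqrt(self.var)  # outside k sigma -> foreground
        bg = ~fg                                     # update only where background
        self.mu[bg] += self.alpha * d[bg]
        self.var[bg] += self.alpha * (d[bg] ** 2 - self.var[bg])
        return fg
```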

3.2.2. Mean Value Method Background Modeling

If the video picture sequence is not too complicated, the mean value approach can be used for background modeling. The mean can be thought of as a statistical filter. To put it into action, a series of images captured by the camera over time is accumulated, the accumulated value is divided by the number of captured frames, and the resulting average is used as the background reference model. Expressed mathematically,

$$B(x, y) = \frac{1}{N} \sum_{k=1}^{N} f_k(x, y),$$

where $f_k$ denotes the $k$-th captured frame and $N$ is the number of frames.
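A minimal sketch of this accumulation-and-division procedure, assuming the frames are already loaded as NumPy arrays:

```python
import numpy as np

def mean_background(frames):
    acc = np.zeros_like(frames[0], dtype=np.float64)
    for f in frames:               # accumulate each captured frame
        acc += f
    return acc / len(frames)       # B(x, y) = (1/N) * sum of f_k(x, y)
```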

3.2.3. Kernel Density Estimation Method

The modeling methods introduced above are all parametric; the kernel density estimation method is the most commonly used of the nonparametric modeling methods. If the distribution of the relevant data in the sample is unknown, this method can be used to obtain the unknown density function directly from the sample. To estimate the probability density of the current pixel value $x_t$ at time $t$, given recent samples of the same characteristic pixel labeled $x_1, x_2, \ldots, x_N$, the kernel function $K$ is employed:

$$p(x_t) = \frac{1}{N} \sum_{i=1}^{N} K(x_t - x_i).$$

The standard deviation of each feature pixel is calculated independently according to the formula

$$\sigma = \frac{m}{0.68\sqrt{2}},$$

where $m$ is the median value of the absolute differences between the pixel feature values of two adjacent image frames, with the mathematical expression

$$m = \operatorname{med}\{\, |x_i - x_{i+1}| \mid i = 1, \ldots, N-1 \,\}.$$

Due to its adaptability to a wide range of contexts, the kernel density estimation approach eliminates both the need for parameter estimation and the assumption of a probability distribution for the background characteristics during background modeling. However, because it requires a great deal of memory and computation time to handle the many types of background distribution that arise from light changes, camera wobble, leaf sway, shadows, and other factors, this method is not fast enough for real-time video detection.

3.3. Threshold Selection
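A minimal sketch of the two formulas above for a single pixel, using a Gaussian kernel (the choice of kernel and the sample-buffer handling are assumptions):

```python
import numpy as np

def bandwidth(samples):
    m = np.median(np.abs(np.diff(samples)))   # median of |x_i - x_{i+1}|
    return m / (0.68 * np.sqrt(2))            # sigma = m / (0.68 * sqrt(2))

def kde_probability(x_t, samples, sigma):
    # samples: x_1 .. x_N for one pixel; Gaussian kernel assumed
    z = (x_t - samples) / sigma
    return np.mean(np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2 * np.pi)))
```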

In order to quickly and accurately extract the position and appearance information of the basketball from the video image sequence, the gray image obtained after image differencing must be binarized. To binarize the gray image, a segmentation threshold must first be given; the gray value of each point in the gray image is compared with the given segmentation threshold, and the grayscale image is converted into a binary image according to the result of the comparison. The most important step in binarization is therefore the selection of the segmentation threshold, since its appropriateness directly affects the subsequent target extraction and other tasks. Several commonly used methods follow.

Finding the best segmentation threshold in the image with the maximum ratio method actually means finding the threshold that maximizes the sum of the information of the two classes, the foreground and the background. This can be expressed mathematically as

$$T^{*} = \arg\max_{T}\left[ H_f(T) + H_b(T) \right],$$

where $H_f(T)$ and $H_b(T)$ denote the information of the foreground and background classes under threshold $T$.

When the average grayscale technique is used, the gray values of all pixels in the picture are first summed, the total is then divided by the total number of pixels in the image, and the result is the average gray value of the image, which serves as the threshold. For an $M \times N$ image $f$, this is expressed mathematically as

$$T = \frac{1}{MN} \sum_{x=1}^{M} \sum_{y=1}^{N} f(x, y).$$

In the process of binarizing grayscale images, the maximum between-class variance method selects the segmentation threshold that maximizes the difference between the average grayscale value of the foreground area and the average grayscale value of the background area. Usually, this difference between the classes is represented by the between-class variance. The basic principle of this algorithm is derived from the least-squares principle of decision analysis. It is a relatively stable, computationally simple, and widely used algorithm.
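A hedged sketch contrasting the average-grayscale threshold with the maximum between-class variance (Otsu) threshold, using OpenCV; the input file name is hypothetical:

```python
import cv2

f = cv2.imread("diff.png", cv2.IMREAD_GRAYSCALE)   # hypothetical difference image
T_mean = float(f.mean())                           # average-grayscale threshold
_, bin_mean = cv2.threshold(f, T_mean, 255, cv2.THRESH_BINARY)
# Otsu picks T automatically by maximizing the between-class variance
T_otsu, bin_otsu = cv2.threshold(f, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```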

4. Modeling of the Background Difference Method in Basketball

4.1. Shooting Characteristics

To detect the basketball during the shooting process, it is first necessary to understand some characteristics of the basketball during the shot and some situations that may arise. First, the rules of the test are introduced; by understanding these rules, the various postures of the basketball in the shot are basically determined. Finally, the shooting characteristics determine the installation position of the camera used to obtain the image, as shown in Figure 1.

The capture position of the image is very important, and the proper installation position of the camera directly affects the accuracy of the detection. According to several characteristics of basketball during the shooting detection process, it is determined that the installation position of the camera is set above the basket. We also need to introduce the basketball stand and its corresponding settings as shown in Figure 2.

4.2. Algorithm Completion

The algorithm for locating the basketball in the system is developed by analyzing data from a shooting session, including the camera position, the ball’s properties, and the background difference approach discussed previously. The diagram of the algorithm is shown in Figure 3.

For the background difference method to work, a reference model for the moving area must be created beforehand; once the background reference model has been established, the image can be differenced against it. The mean value approach is utilized in this work because the scene environment is quite simple and the influence of other objective factors is relatively modest. To create the background reference model, the IMG images are first obtained from the video and then accumulated and averaged, as illustrated in Figure 4.

Figure 5 illustrates how the reference model is used to separate the moving area from the stationary background in a video and shows the resulting difference images.

After the image is differenced, some parameters of the image can already be extracted, but extracting them directly is rather troublesome, so further processing is required to make the basketball information easier and faster to locate and use. When binarizing the picture, the setting of the threshold is critical for efficiently extracting the properties of the basketball. The selection of thresholds was introduced in Section 3. Because the maximum between-class variance method is an adaptive threshold selection method, it is used in the binarization to extract the threshold value from the grayscale difference image, as shown in Figure 6.

This section first conducts in-depth research on the basic idea, basic process, background modeling, and threshold selection of the background difference method. Then, according to the characteristics of the basketball movement during the fixed-point shooting, determine the installation position of the camera. Finally, the background difference method is combined with the characteristics of fixed-point shooting, and the shooting detection algorithm is designed and implemented.

Industrial cameras and computers are the most important pieces of equipment in the system’s hardware environment. Image frames are captured by the camera for use in the automatic identification system, and the computer serves as the platform for automated shot identification. The basketball moves extremely quickly in fixed-point shooting, and an instantaneous acceleration may occur near the basket. If a normal camera is used, the frame rate is too low and smear forms on the captured image. To prevent smear in the acquired images, a high-frame-rate industrial camera was selected. The specific parameters of the camera are shown in Table 1.

In recent years, with the continuous development and improvement of computer vision technology, software for processing images has become more and more powerful. Among such software, MATLAB has powerful matrix computing capabilities, which is very conducive to digital image processing, and it also provides many image processing functions. Therefore, this system is developed as an automatic identification system for basketball shots using MATLAB software. Since the system is developed in the MATLAB environment, an operating environment must be constructed so that the system can also run on computers without MATLAB. The operating environment of the computer is as follows: software environment Windows 7 (32-bit operating system); install the MCRInstaller.exe plug-in, or install MATLAB software and a video decoder. The results are shown in Figures 7 and 8.

4.3. Analysis of Results

Thirty pieces of data are used to verify and examine the system’s correctness; the results of the trials are shown in Table 2. The detection accuracy of the system is measured primarily through three quantities: the false detection rate, the missed detection rate, and the accuracy of the test system. Table 2 and Figure 9 show that the system’s accuracy can reach 90%, which is sufficient to satisfy the requirements. The most common causes of missed and erroneous detections are the same as in Test One and Test Two. The system’s judgment conditions and judgment procedure have been adjusted so that the system’s accuracy can reach 100% in the follow-up work.
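For concreteness, the three quantities can be computed as below; the counts used here are hypothetical but chosen to be consistent with the 90% accuracy reported in Table 2:

```python
# Hypothetical counts over the 30 test samples (not the paper's raw data).
detected_wrong = 2    # false detections
missed = 1            # missed detections
total = 30            # test samples

false_rate = detected_wrong / total      # false detection rate
miss_rate = missed / total               # missed detection rate
accuracy = 1 - false_rate - miss_rate    # = 0.9, i.e. 90% accuracy
```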

Real-time performance means that when an event occurs, it can be analyzed and processed correctly and in time, as in Figure 10, although the processing time may be long or short: the shorter the processing time, the better the real-time performance, and the longer the processing time or the slower the response, the worse the real-time performance. In video processing, the real-time nature of the system means that when an event in the video occurs, the time for the system to respond to the event should be as short as possible. Given the needs of video processing and analysis, a processing speed of 10–15 frames/sec is generally taken to mean that the system is close to a real-time system.

System stability refers to the system’s ability to overcome interference, deal with emergencies, and recover from errors. Generally, evaluating the stability of a system requires running it in different environments for a long time, so the stability of this system was tested by running the software for an extended period. Due to the influence of some objective factors, the system can currently work continuously for 8–10 hours. The three performance evaluations show that the system not only has a good processing effect but also fully meets the real-time requirements of video processing.

This section mainly introduces the realization of the automatic test system for fixed-point shooting. First, it introduces the construction of the system environment; then, through an analysis of the system’s structural framework, the system is divided into six modules (image capture and preprocessing, parameter configuration, detection, postprocessing, judgment, and the graphical interface), and the functions completed by each module are introduced in detail. Finally, the overall performance of the system is comprehensively evaluated through analysis and comparison of the recorded video detection results, and its performance is compared with existing detection equipment. The system is essentially consistent with the existing test system in real-time performance and accuracy, holds a great advantage in stability, and achieves the ultimate goal of this research.

5. Conclusion

In this thesis, the design and realization of a moving target identification and tracking algorithm for video picture sequences are presented in depth, employing digital image processing and recognition technology. Experimental data and results are provided for the algorithm described in this study, which was built using MATLAB 7.0. The key research findings are as follows. Based on an in-depth investigation of existing commonly used target recognition algorithms, the focus is a moving target extraction approach that combines the frame difference method with the background difference method under a fixed background. This method combines the benefits of the two methods, making the detection algorithm more resistant to environmental changes and allowing precise extraction of moving targets with complete contours. In the area of moving target tracking, comparing the matching tracking algorithm based on regional features with the target location prediction tracking algorithm shows that the tracking algorithm based on location prediction has a faster tracking speed. To realize real-time prediction and tracking of the moving target, the coordinate data points of the moving target are collected and a second-order polynomial is employed to fit the motion trajectory; a cubic spline function is then employed to fit the data points and derive the target’s motion trajectory. Simultaneously, a video image sequence sampling rule is suggested, which increases the accuracy of target motion speed extraction while making fair use of the large video stream data. The system in this study can extract moving targets properly and completely and also achieves quick target tracking; however, due to a lack of resources, the research on this topic is currently limited. Future research work is mainly focused on the following aspects. When the colors and textures of the background and the foreground moving target are quite close, the target detection algorithm in this study extracts moving targets with large empty internal areas and is sometimes unable to extract the moving target at all; algorithms for removing shadows based on the geometric characteristics of the scene and the target, as well as the influence of the target’s own deformation and occlusion during tracking, need to be studied further.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.