Abstract

Object tracking is an active research topic in computer vision and is used in scenarios such as intelligent monitoring, autonomous driving, and robot visual perception. With the rapid development of sports, tracking targets in complex sports scenes, represented by basketball and football, has gradually attracted attention. Target tracking algorithms based on machine learning (ML) have been proposed one after another. With the powerful feature extraction capability of convolutional neural networks (CNNs), they greatly improve accuracy and are more robust in complex sports scenes. However, deep learning-based tracking algorithms have many network layers and parameters, which slows down model training and updating. In this regard, taking basketball as an example, this paper designs a low-parameter deep learning-based tracking algorithm for complex sports targets, which greatly reduces the size of the model while preserving tracking accuracy. To address the large number of parameters and large model size of deep learning tracking algorithms, this work proposes a network structure built on an asymmetric convolution module. The asymmetric convolution module consists of two convolutional layers, a compression layer and an asymmetric layer. To improve accuracy, this work also designs a new triplet loss. Compared with the original logistic loss, the triplet loss function can fully exploit the latent relationships between the inputs, so that the network model achieves higher tracking accuracy. Finally, this paper proposes a low-parameter deep learning-based target tracking algorithm combining the asymmetric convolution module and the triplet loss function. Comprehensive and systematic experiments demonstrate the effectiveness of this work in tracking complex sports objects.

1. Introduction

The sports industry, which has broad market prospects, has gradually become an experimental field for artificial intelligence technology. With the increase in the quantity and quality of available sports data, artificial intelligence has brought many changes to the traditional sports industry. For example, training assistance software developed on top of vision technology has effectively improved the training level of athletes. Automatic sports data statistics systems based on visual technology save a great deal of manual labeling cost, and eagle-eye technology that assists the referee in judging whether the ball is out of bounds or whether a goal is scored improves the fairness of game judgments. Live broadcasts of sports events with intelligent display effects, together with the wearable and virtual reality sports products that have emerged in recent years, provide more and more convenience for people with different needs across the sports industry chain [1–5].

Among the many sports application fields based on artificial intelligence technology, sports video analysis is one of the more popular branches, mainly because sports videos have a wide audience and huge market potential. In addition to traditional TV users and sports enthusiasts, sports programs are also favored by emerging media users, such as mobile terminals and online live broadcasts, as well as by some sports professionals. With the rapid development of mobile devices and the Internet, the demand for sports videos has also shifted from simple viewing to diversified needs. From the perspective of academic research, realizing these diversified user needs requires the comprehensive use of many technologies from fields such as vision technology, multimedia, and ML, and research in sports video analysis will in turn promote the further development of these technologies. At present, many research institutes have carried out research on sports video analysis, and academic exchange on the topic is very active. The latest research results on sports video analysis are published every year in academic journals and conferences related to multimedia, artificial intelligence, and computer graphics [6–10].

An important basic research topic in sports video analysis is the tracking of moving objects. Sports objects mainly refer to objects with specific semantics in sports videos, such as balls and players. The main purpose of tracking balls and players is to determine their positions and finally obtain their motion trajectories. This work is not only helpful for applications such as tracking specific players, collecting players' running statistics, reconstructing 3D virtual scenes, and assisting referee judgments, but it is also the basic work for higher-level semantic tasks such as action recognition, event detection, highlight video editing, technical and tactical analysis, and player and team performance scoring. Therefore, the tracking of moving objects has become a hot issue in sports video analysis. Compared with ordinary surveillance videos, tracking moving objects in sports videos is more challenging. For example, the small size of the ball, severe occlusion, and background and lighting interference cause tracking drift of the ball; mutual occlusion among multiple players and similar-looking distractors cause tracking identity switches; and the high-speed movement of the ball and players causes motion blur, making the motion model difficult to establish [11–15].

This paper introduces methods for tracking targets in complex sports scenes. Against the background of the new era in which China is striving to grow from a major sports country into a sports power, sports video analysis has broad application backgrounds and market demand. In the proposed model, the asymmetric layer is used for feature fusion, which greatly compresses the size of the model while ensuring accuracy. At the same time, optimization strategies such as the momentum gradient descent algorithm and batch normalization are used, so that the model trains more effectively. The target tracking problem, as basic work in sports video analysis, faces many technical difficulties and has important academic significance and application value. The research on target tracking methods in complex sports scenes in this paper is expected not only to lay a foundation for the automation and intelligence of sports video analysis but also to promote the further development of target tracking methods and key technologies.

The remainder of this paper is organized as follows: Section 2 reviews related work. Section 3 explains the methods used in this paper. Section 4 describes the experiments and discusses the results. Section 5 concludes the paper.

2. Related Work

With the rapid development of neural networks, discriminative methods have set off a research upsurge in the field of target tracking. MDNet [16] trains a lightweight network by means of multidomain learning, which focuses only on domain-related information from a specific domain, enhancing the discriminative ability of the network. FCN [17] proposes a feature map selection mechanism to remove noisy and irrelevant feature maps, thereby reducing the computational burden and improving the running speed of the algorithm. Based on MDNet, RT-MDNet [18] improves the regional feature aggregation method, accelerates feature extraction, and optimizes the loss function by adding constraints to enhance the model's ability to distinguish similar objects. RTT [19] uses a multidirectional recurrent neural network to capture the target structure, obtaining the contextual cues required for target tracking from multiple directions and effectively improving tracking accuracy under occlusion. The above trackers follow a classification-and-update strategy. However, the online update of such trackers usually requires a large amount of computation.

This problem has prompted the emergence of another class of deep learning-based discriminative tracking strategies, which treat tracking as a similarity learning problem: a similarity matching function is learned by training on a large-scale dataset and is then used to compute similarity during tracking. Such trackers often do not need to update network parameters online, which alleviates the contradiction between tracking efficiency and tracking accuracy to a certain extent. SINT [20] was the first to use a deep Siamese neural network for target tracking, solving the tracking problem through image patch similarity learning. Siam FC [21] is a typical similarity learning-based tracker. The algorithm learns a similarity matching function through offline training of a deep Siamese neural network and calculates the similarity between the target and candidate objects through a fully convolutional operation to achieve tracking. It also allows the template image and the search image to have different sizes. SiamRPN [22] uses a region proposal network for visual tracking, using the RPN module [23] to generate multiscale candidates. This method converts the similarity calculation into regression and classification problems, which effectively improves the accuracy of tracking results. GradNet [24] uses the forward and backward passes of the network to learn gradient information and uses this information to guide template updates, effectively improving the accuracy of the tracker in scenarios such as background interference and severe deformation. DaSiamRPN [25] expands the positive samples and hard negative samples in the training phase on the basis of SiamRPN, which effectively improves the interference-awareness of the tracker. FF-Siam [26] proposes a feature fusion-based Siamese network tracking framework, which fuses different features according to a weight generation layer to improve the network's adaptability to target changes. CLNet [27] proposes a compact latent network that obtains target-specific information from decisive samples in the video sequence and quickly adjusts the tracking model to adapt to new scenarios based on this information. SCSiamRPN [28] proposes a strongly coupled Siamese region proposal network framework based on joint optimization, which is realized by improving the classification loss. PGNet [29] reduces the matching area by calculating the similarity between each pixel of the search image feature and the global template feature, thereby avoiding the introduction of additional background interference. These similarity learning-based trackers perform offline training of deep Siamese neural networks on large-scale datasets and usually do not require online updates during tracking, so real-time tracking speeds can often be achieved.

3. Method

This section explains the methods used in this paper. First, the tracking task and the goal of the research are set out. Then, the target tracking algorithm and the model design are described. Next, the improved triplet loss used for tracking is introduced, and finally the low-parameter network for basketball target tracking is presented.

3.1. Target Tracking Based on Asymmetric Convolution Module

Target tracking algorithms based on deep learning involve many parameters and a large amount of computation, which slows model training and makes deployment difficult on embedded devices with limited memory. This work designs a tracking model based on an asymmetric convolution module (ACM), using asymmetric convolution to compress the model and reduce the number of parameters.

3.1.1. Target Tracking Algorithm Design

To compress the model size, this work designs a target tracking algorithm with an asymmetric convolution module as illustrated in Figure 1.

The tracking process based on the asymmetric convolution module mainly includes two parts: model training and tracking prediction. The training part of the model starts with selecting the training dataset, followed by building the tracking model. The design of the tracking model is generally carried out around different CNNs or different convolutional modules. After the model is built, it is necessary to set the network parameters and initialize the weight values. The model is then trained through an optimization algorithm. When the loss has not converged, the model enters the backpropagation stage, and the parameters and weights are updated by backpropagating the error until the loss function converges. At this point, training can end and the trained tracking model is saved.

When using the trained tracking model for tracking, the model first initializes the position of the tracking target from the given candidate region. Many candidate regions are then generated, and feature extraction is performed on these candidate regions through a CNN. The extracted features are compared with the features of the target location, and the candidate region whose features have the highest similarity score is output as the predicted target location.
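As a concrete illustration of this prediction step, the following is a minimal Python sketch. The helper names (extract_features, similarity, generate_candidates) are hypothetical placeholders for the trained feature extractor, the learned similarity measure, and the candidate sampler; they are not the exact interfaces used in this paper.

```python
import numpy as np

def crop(frame, box):
    """Crop a region given as (x, y, w, h) from an image array."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def track_one_frame(next_frame, template_feat, prev_box,
                    extract_features, similarity, generate_candidates):
    """Predict the target position in the next frame (illustrative sketch)."""
    # Candidate regions, e.g. windows sampled around the previous position
    candidates = generate_candidates(prev_box)

    # Score each candidate by its feature similarity to the target template
    scores = [similarity(template_feat, extract_features(crop(next_frame, box)))
              for box in candidates]

    # The highest-scoring candidate is the predicted target location
    return candidates[int(np.argmax(scores))]
```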

3.1.2. Tracking Model Design

To compress the size of the model, this work designs an object tracking model based on an asymmetric convolution module. By using the asymmetric convolution module, the number of model parameters is reduced while the tracking accuracy is preserved. The structure diagram is illustrated in Figure 2.

The entire tracking model contains two CNN branches based on asymmetric convolutional modules, and the structures and parameters of these two CNN branches are the same. Each branch has its own input, one of which is the template image, which is the target image to be tracked. Another input is the search image, which is the candidate image for the next frame. The two input images go through the corresponding CNN branches, respectively, to perform the same feature extraction operation. Then, a convolution embedding function is used to compare the similarity of the output features of the two branches to obtain a feature score map.

The training process of the model is to adjust the parameter weights through continuous forward comparison and backpropagation, so as to obtain an optimal similarity comparison function for evaluating whether the template image and the search image are the same target. If the two images describe the same object, a high score is returned in the feature score map; otherwise a low score is returned. Therefore, the similarity comparison function can be defined as

$$ f(z, x) = g\big(\varphi(z), \varphi(x)\big), $$

where $z$ is the template image, $x$ is the search image, $g(\cdot,\cdot)$ is a distance measure or similarity measure, and $\varphi(\cdot)$ is the feature extractor, that is, a CNN based on asymmetric convolution modules.
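The two-branch comparison can be expressed compactly as a cross-correlation between the two feature maps, as in SiamFC-style trackers. The sketch below is a minimal PyTorch illustration with a placeholder backbone; the actual branch in this paper is the asymmetric-convolution network of Section 3.1.3, and the channel and stride choices here are assumptions made only for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseTracker(nn.Module):
    """Two weight-shared branches plus a cross-correlation similarity head."""

    def __init__(self):
        super().__init__()
        # Placeholder feature extractor phi (the paper's branch is the ACM network).
        self.phi = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
        )

    def forward(self, template, search):
        z = self.phi(template)   # features of the template image
        x = self.phi(search)     # features of the (larger) search image
        # Convolution embedding: slide the template features over the search
        # features; each output value is a similarity score for one location.
        return F.conv2d(x, z)    # feature score map

# The search image may be larger than the template, yielding a dense score map.
model = SiameseTracker()
score_map = model(torch.randn(1, 3, 127, 127), torch.randn(1, 3, 255, 255))
print(score_map.shape)  # e.g. torch.Size([1, 1, 33, 33])
```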

In the training process of the model, in order to accept a large search image and to compute the similarity of all candidate regions with the target image in a single pass, this paper chooses to use a fully convolutional network.

In the CNN structure, the fully connected layers follow multiple convolutional and pooling layers, and there can be one or more of them. A fully connected layer connects all the neurons in the previous layer with the neurons in the current layer, integrates the local information from the convolutional and pooling layers, and finally outputs a one-dimensional feature vector for classification. A fully convolutional network replaces the fully connected layers with convolutional layers, and its output is a labeled map.

The fully convolutional network converts the sixth, seventh, and eighth fully connected layers of the classic CNN into three convolutional layers, and its output is a labeled feature heat map. This paper uses a fully convolutional network for feature extraction, so the search image does not need to have the same size as the template image; the network can therefore take a larger search image as input and evaluate every possible position in it. The similarity of all possible positions in the input image to the target is calculated in one pass, yielding a feature score map. The location with the highest score corresponds to the location of the target to be tracked in the search area.

3.1.3. Asymmetric Convolution Module Design

At present, researchers focus on improving the accuracy and speed of deep learning target tracking algorithms and pay little attention to model size. However, at a given level of accuracy, a small, low-parameter tracking model has several advantages. First, small models require less communication and can be trained more quickly. Second, smaller models transmit less data and can be updated more frequently. In addition, small models are easier to deploy on memory-constrained hardware devices. Therefore, this paper designs an asymmetric convolution module to reduce the number of parameters of the model. The structure of the asymmetric convolution module is shown in Figure 3.

As shown in the figure, the asymmetric convolution module includes two convolutional layers, namely, the compression layer and the asymmetric layer. The compression layer contains only 1 × 1 convolution kernels, and the asymmetric layer contains three kinds of convolution kernels: 1 × 3, 3 × 1, and 3 × 3. Compared with AlexNet, the asymmetric convolution module in this paper uses 1 × 1 convolution kernels to replace the 3 × 3 convolution kernels in the compression layer, which can reduce the parameters of that layer by 9 times. At the same time, the number of convolution kernels in the compression layer is set to be smaller than the number of convolution kernels in the asymmetric layer. This reduces the number of channels input to the 3 × 3 convolution kernels, which further reduces the size of the model.

When the rank of a 2D convolution kernel is 1, the operation can be equivalently transformed into a series of 1D convolutions. Therefore, in the asymmetric layer, the asymmetric convolution kernel can approximate the feature extraction effect of the square convolution kernel for model acceleration and compression.
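This equivalence for rank-1 kernels can be checked numerically. The small NumPy/SciPy demonstration below builds a rank-1 3 × 3 kernel as the outer product of a 3 × 1 and a 1 × 3 kernel and verifies that a single 2D convolution matches two successive 1D convolutions; it illustrates the general property and is not code from the paper.

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

# A rank-1 3x3 kernel is the outer product of a column kernel and a row kernel.
col = rng.standard_normal((3, 1))   # 3x1 (vertical) kernel
row = rng.standard_normal((1, 3))   # 1x3 (horizontal) kernel
square = col @ row                  # 3x3 kernel of rank 1

# One 2D convolution with the square kernel ...
out_2d = convolve2d(image, square, mode="valid")
# ... equals two successive 1D convolutions with the asymmetric kernels.
out_sep = convolve2d(convolve2d(image, col, mode="valid"), row, mode="valid")

print(np.allclose(out_2d, out_sep))  # True (up to floating-point error)
```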

In the convolution operation, convolution is additive: when two kernels produce feature outputs of the same resolution, the features extracted by the asymmetric kernels can be fused channel-wise with the features extracted by the 3 × 3 convolution kernel to obtain an equivalent feature output. The formula is as follows:

$$ I \ast K^{(1)} + I \ast K^{(2)} = I \ast \left(K^{(1)} \oplus K^{(2)}\right), $$

where $I$ is the input feature matrix, $K^{(1)}$ and $K^{(2)}$ are two convolution kernels of compatible sizes (one can be embedded into the other), and $\oplus$ denotes adding the kernel weights at corresponding positions. Therefore, asymmetric convolution has little effect on the tracking accuracy of the model while reducing the number of model parameters.
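The following PyTorch sketch shows one way to realize the asymmetric convolution module described above: a 1 × 1 compression layer with fewer channels feeding an asymmetric layer of parallel 3 × 3, 1 × 3, and 3 × 1 convolutions whose outputs are fused by addition, as the additivity property suggests. The channel counts, padding, and the choice of summation (rather than concatenation) for fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AsymmetricConvModule(nn.Module):
    """Compression (1x1) layer followed by an asymmetric (3x3 / 1x3 / 3x1) layer."""

    def __init__(self, in_channels, squeeze_channels, out_channels):
        super().__init__()
        # Compression layer: 1x1 kernels with fewer channels, so the asymmetric
        # layer sees fewer input channels and needs fewer parameters.
        self.compress = nn.Conv2d(in_channels, squeeze_channels, kernel_size=1)
        # Asymmetric layer: square and asymmetric kernels computed in parallel.
        self.conv3x3 = nn.Conv2d(squeeze_channels, out_channels, kernel_size=3, padding=1)
        self.conv1x3 = nn.Conv2d(squeeze_channels, out_channels, kernel_size=(1, 3), padding=(0, 1))
        self.conv3x1 = nn.Conv2d(squeeze_channels, out_channels, kernel_size=(3, 1), padding=(1, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.compress(x))
        # Additive fusion: by the additivity of convolution, the three branches
        # are equivalent to a single 3x3 convolution whose kernel is the sum of
        # the three kernels (the 1x3 / 3x1 kernels embedded in a 3x3 grid).
        return self.relu(self.conv3x3(s) + self.conv1x3(s) + self.conv3x1(s))

# Example: 96 input channels compressed to 16 before the asymmetric layer.
block = AsymmetricConvModule(in_channels=96, squeeze_channels=16, out_channels=64)
print(block(torch.randn(1, 96, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```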

3.1.4. Model Training and Optimization

On the one hand, due to the large number of parameters in the CNN, gradient explosion or vanishing gradients may occur during training, causing training to fail, or problems such as overfitting and falling into local optima may prevent the model from achieving the expected training effect. On the other hand, training CNNs often requires a large amount of training data, and training time grows accordingly. Therefore, in order to train the model efficiently, achieve a good training effect, and avoid problems such as overfitting and vanishing gradients, this paper uses several optimization methods during training.

In order to enhance the learning ability of the CNN so that it can learn complex mapping relationships between input and output, this paper adds the nonlinear ReLU activation function to all convolutional layers:

$$ f(x) = \max(0, x). $$

Although the current mainstream activation functions include the Sigmoid function, the Tanh function, and the ReLU function, the ReLU function has several advantages over the Sigmoid and Tanh functions. First, when the input is positive, the ReLU function does not suffer from gradient saturation. In addition, the ReLU function involves only a simple linear threshold, so it is much faster than the Sigmoid and Tanh functions in both forward propagation and backpropagation. Moreover, when backpropagating, the ReLU gradient does not involve the exponential and division operations used by the Sigmoid and Tanh functions, which helps avoid the vanishing gradient problem. Therefore, the activation function chosen in this paper is the ReLU activation function.

In the field of ML, there is a very important assumption: the training data and the test data have the same distribution, which ensures that a model trained on the training data can also perform well on the test data. But for deep neural networks, the parameters are updated at every iteration, so during training the input distribution of each hidden layer changes, making training increasingly difficult and convergence increasingly slow; it is also prone to problems such as vanishing or exploding gradients. To solve this problem, this paper adds a batch normalization layer after the asymmetric convolutional network. By using a batch normalization layer to normalize the input of each layer, the mean and variance of each layer's input are fixed.
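A minimal sketch of this normalization step is shown below, assuming the common Conv → BatchNorm → ReLU ordering; the paper only states that a batch normalization layer follows the asymmetric convolutions, so the exact placement and channel counts here are assumptions.

```python
import torch.nn as nn

# Conv -> BatchNorm -> ReLU: the BN layer fixes the mean and variance of each
# layer's input, stabilizing the distribution seen by the next layer.
conv_bn_relu = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.BatchNorm2d(128),   # normalize each channel over the mini-batch
    nn.ReLU(inplace=True),
)
```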

In order to solve the problem of local oscillation in the stochastic gradient descent algorithm, this paper chooses to use the momentum gradient descent method. Different from the stochastic gradient descent method, the momentum gradient descent method introduces momentum, that is, the idea of inertia. The previous update direction affects the current update direction, and the final update direction is also adjusted by the current gradient. By introducing momentum, the stability can be increased to a certain extent, the convergence speed can be accelerated, and there is a certain ability to jump out of local optima. The iterative update formula of the momentum gradient descent algorithm is

$$ v_t = \gamma v_{t-1} + \eta \nabla_{\theta} J(\theta), \qquad \theta \leftarrow \theta - v_t, $$

where $\theta$ is the network parameter, $\gamma$ is the momentum hyperparameter, and $\eta$ is the learning rate.
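A plain Python sketch of this update rule follows, using the notation of the formula above (momentum coefficient gamma and learning rate eta); the values and the toy objective are arbitrary examples, and in practice the same behavior is typically obtained from a framework optimizer such as SGD with momentum.

```python
import numpy as np

def momentum_step(theta, grad, velocity, gamma=0.9, eta=0.01):
    """One momentum gradient descent update.

    velocity accumulates previous update directions (the "inertia"), so the
    current step depends on both the history and the current gradient.
    """
    velocity = gamma * velocity + eta * grad   # v_t = gamma * v_{t-1} + eta * grad
    theta = theta - velocity                   # theta <- theta - v_t
    return theta, velocity

theta = np.zeros(4)
velocity = np.zeros(4)
for _ in range(3):
    grad = 2 * theta - 1.0                     # gradient of a toy quadratic objective
    theta, velocity = momentum_step(theta, grad, velocity)
```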

3.2. Target Tracking with Improved Triplet Loss

In the process of tracking, factors such as object movement, occlusion of the target, and complex backgrounds affect the performance of the tracker; traditional target tracking algorithms in particular degrade greatly in these situations. Although current tracking algorithms based on correlation filtering and on deep learning achieve better tracking accuracy and improved robustness in complex scenarios, these interference factors have not been completely eliminated, and maintaining tracking accuracy in complex scenes remains a focus and difficulty of ongoing research. By making fuller use of the information in the input images, the tracking accuracy of the model in complex scenes can be improved. This work therefore uses an improved triplet loss (ITL) to train the model, exploiting the relationships between the inputs more fully and improving the accuracy of the tracking algorithm.

3.2.1. Triplet Loss

In the Siam FC algorithm, the logistic loss function is used for training, and its expression is

$$ L_{l}(Y, S, X) = \sum_{x_i \in X} w_i \log\left(1 + e^{-y_i s_i}\right), $$

where $Y$, $S$, and $X$ are the label set, similarity score set, and instance input set, respectively, $y_i$ and $s_i$ are the label and similarity score of instance $x_i$, and $w_i$ is a balance weight.

In the Siam FC algorithm, a balance weight is applied to the loss according to the numbers of positive and negative instances. The balance weights are defined as follows:

$$ w_i = \begin{cases} \dfrac{1}{2P}, & y_i = +1, \\[4pt] \dfrac{1}{2N}, & y_i = -1, \end{cases} $$

where $P$ is the number of positive samples and $N$ is the number of negative samples.

Although the Siam FC algorithm utilizes the powerful feature extraction capabilities of CNNs, it does not fully utilize the relationship between positive and negative instance pairs in the input samples. In the Siam FC algorithm, the task of object tracking is defined as similarity learning in the embedding space, and a similarity function is learned through a Siamese network. The input consists of a template image containing the object and a larger search image. The Siam FC algorithm regards the bounding box of each candidate object as an instance and then divides instances into positive and negative according to the distance between the real position of the object and the candidate position. When the distance is less than the threshold, the instance is marked as positive, and when the distance is greater than the threshold, it is marked as negative. The logistic loss is applied to maximize the similarity scores of positive instance pairs and minimize the similarity scores of negative instance pairs, but this can only exploit the pairwise relationship between samples. Compared to the logistic loss function, the input of the triplet loss is an anchor $a$, a positive instance $p$, and a negative instance $n$:

$$ L_{t}(a, p, n) = \max\left(0,\; d(a, p) - d(a, n) + m\right), $$

where $d(\cdot,\cdot)$ is the Euclidean distance and $m$ is the distance margin.

When using the triplet loss, positive instances of the same class are pulled closer to the anchor, while negative instances of different classes are pushed farther away from the anchor.
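For reference, this margin-based triplet loss can be written as the short function below (PyTorch also ships nn.TripletMarginLoss with the same behavior); the margin value and embedding sizes are arbitrary examples, not values taken from the paper.

```python
import torch
import torch.nn.functional as F

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    d_ap = F.pairwise_distance(anchor, positive)   # pull positives toward the anchor
    d_an = F.pairwise_distance(anchor, negative)   # push negatives away from the anchor
    return torch.clamp(d_ap - d_an + margin, min=0).mean()

a, p, n = torch.randn(8, 128), torch.randn(8, 128), torch.randn(8, 128)
print(triplet_margin_loss(a, p, n))
```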

3.2.2. Improved Triplet Loss for Target Tracking

To take full advantage of the potential connections between exemplars, positive instances, and negative instances, this chapter proposes a new triplet loss. Training with this triplet loss function exploits the intrinsic connections between the inputs as much as possible and, in most cases, provides more training elements, improving the tracking accuracy of the model.

The anchor image is unique, but the numbers of positive and negative instance images are usually greater than 1, because any candidate region whose distance from the target position is less than a certain threshold can be considered a positive instance image, and there are often many candidate region images that meet this condition.

The similarity score set $S$ of the anchor-instance pairs is split into a positive instance pair score set $S^{p}$ and a negative instance pair score set $S^{n}$. The triplet loss is then defined on $S^{p}$ and $S^{n}$. Positive instances are matched with the exemplar using the softmax function. The matching probability is defined as follows:

$$ \operatorname{prob}\left(s^{p}_{i}, s^{n}_{j}\right) = \frac{e^{s^{p}_{i}}}{e^{s^{p}_{i}} + e^{s^{n}_{j}}}. $$

By using the negative logarithm to maximize the probability between positive and negative instance pairs, the loss can be formulated as follows:

$$ L_{t}\left(S^{p}, S^{n}\right) = -\frac{1}{PN} \sum_{i=1}^{P} \sum_{j=1}^{N} \log \operatorname{prob}\left(s^{p}_{i}, s^{n}_{j}\right), $$

where $P$ and $N$ are the numbers of positive and negative instance pairs.
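A minimal PyTorch sketch of this improved triplet loss is given below: every positive score is paired with every negative score from the response map, and the negative log of the softmax matching probability is averaged over all P × N combinations. How the score map is split into positive and negative sets (by distance from the ground-truth center) follows the description above, but the example scores are arbitrary.

```python
import torch

def improved_triplet_loss(pos_scores, neg_scores):
    """ITL over all P x N anchor-positive-negative combinations.

    pos_scores: tensor of P similarity scores of positive instance pairs
    neg_scores: tensor of N similarity scores of negative instance pairs
    """
    sp = pos_scores.view(-1, 1)                 # shape (P, 1)
    sn = neg_scores.view(1, -1)                 # shape (1, N)
    # Softmax matching probability for every (positive, negative) pair:
    # prob = exp(sp) / (exp(sp) + exp(sn)); the log-prob is computed stably below.
    log_prob = sp - torch.logaddexp(sp, sn)     # shape (P, N)
    return -log_prob.mean()                     # average over P*N triplet elements

# Example: scores taken from a SiamFC-style response map (values are arbitrary).
pos = torch.tensor([2.1, 1.7, 1.9])             # P = 3 positive pair scores
neg = torch.tensor([0.3, -0.5, 0.1, 0.0])       # N = 4 negative pair scores
print(improved_triplet_loss(pos, neg))
```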

Compared with the logistic loss function, the triplet loss function proposed in this paper contains more training elements. In Siam FC, the logistic loss function can be trained with at most $P$ positive example pairs and $N$ negative example pairs, that is, $P + N$ training elements. Our triplet loss, however, can generate $P \times N$ triplet elements, one for each combination of a positive example pair and a negative example pair. When the numbers of positive and negative example pairs are both greater than 2, the triplet loss function has more training elements than the logistic loss function, and the larger the numbers of positive and negative example pairs, the larger the gap. Especially in complex scenarios, using more input elements can further improve the tracking accuracy of the algorithm.

3.3. Low-Parameter Network for Basketball Target Tracking

In order to reduce the number of model parameters and improve the tracking accuracy of the algorithm, this paper combines the asymmetric convolution module and the triplet loss function and proposes a low-parameter deep learning-based basketball target tracking algorithm.

3.3.1. Overall Framework

When the asymmetric convolution module is used, the number of model parameters can be reduced and the tracking speed improved, but the tracking accuracy itself is not improved. When the triplet loss function is used for training, the tracking accuracy of the algorithm can be effectively improved, but it has no effect on the model size. Therefore, this paper proposes a low-parameter deep learning-based target tracking algorithm: building on the tracking algorithm based on the asymmetric convolution module, the model is trained with the triplet loss function instead of the logistic loss function. While reducing the number of model parameters, the tracking accuracy of the algorithm is improved. This model is called ACM-ITL in this work. The overall framework is illustrated in Figure 4.

As shown in the figure, the algorithm in this paper adds the triplet loss function to the target tracking algorithm based on the asymmetric convolution module for training.

3.3.2. Detailed Network Structure

The network structure of the tracking model in this paper is shown in Figure 5. The entire network is built by stacking asymmetric convolution modules. The input image first passes through a convolutional layer and is then downsampled by a pooling layer. The pooling layer is followed by three stacked asymmetric convolution modules, which extract features while reducing the number of model parameters. Downsampling is then performed again through a pooling layer, and feature extraction continues through another three asymmetric convolution modules. Finally, the outputs of the two branches are convolved to obtain the final feature score map, and the triplet loss function is used for training.

The functions of the pooling layer mainly include reducing the feature dimension, compressing the amount of data and parameters, reducing overfitting, and improving the fault tolerance of the model. Pooling operations are mainly divided into two types: average pooling and maximum pooling. Average pooling averages the values in the corresponding sliding window of the convolution feature map and retains more background information of the image. Maximum pooling takes the maximum value in the corresponding sliding window of the convolution feature map and retains more texture information of the image. The pooling operation reduces features and parameters while maintaining invariance to rotation, translation, and scaling. The model in this paper uses max pooling.
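Putting the pieces together, one Siamese branch described above (conv → max pool → three ACMs → max pool → three ACMs) can be sketched as follows, reusing a compact version of the asymmetric convolution module from Section 3.1.3. All channel counts and kernel sizes are illustrative assumptions, since the paper specifies the structure only at the block level.

```python
import torch
import torch.nn as nn

class ACM(nn.Module):
    """Compact asymmetric convolution module (see the sketch in Section 3.1.3)."""
    def __init__(self, c_in, c_squeeze, c_out):
        super().__init__()
        self.compress = nn.Conv2d(c_in, c_squeeze, 1)
        self.k3x3 = nn.Conv2d(c_squeeze, c_out, 3, padding=1)
        self.k1x3 = nn.Conv2d(c_squeeze, c_out, (1, 3), padding=(0, 1))
        self.k3x1 = nn.Conv2d(c_squeeze, c_out, (3, 1), padding=(1, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.compress(x))
        return self.relu(self.k3x3(s) + self.k1x3(s) + self.k3x1(s))

class Branch(nn.Module):
    """One Siamese branch: conv -> max pool -> 3 ACMs -> max pool -> 3 ACMs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=7, stride=2), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2),                  # first downsampling
            ACM(64, 16, 128), ACM(128, 16, 128), ACM(128, 32, 256),
            nn.MaxPool2d(3, stride=2),                  # second downsampling
            ACM(256, 32, 256), ACM(256, 48, 384), ACM(384, 64, 256),
        )

    def forward(self, x):
        return self.features(x)

branch = Branch()
print(branch(torch.randn(1, 3, 127, 127)).shape)  # e.g. torch.Size([1, 256, 14, 14])
```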

4. Experiment and Discussion

This section describes the experimental procedure. First, the datasets and evaluation metrics are introduced. Then, network training is evaluated, the proposed method is compared with other tracking methods, and the optimization strategies and the ITL are evaluated on the basketball tracking task.

4.1. Dataset and Evaluation Metrics

This work uses two self-made datasets (BTA and BTB) for experiments. The datasets are collected from basketball game videos. Each dataset contains different training samples and test samples. The specific data distribution is shown in Table 1. The evaluation metrics include precision and success rate.
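The precision and success-rate metrics are not further defined in the paper; the sketch below assumes the standard OTB-style definitions (precision: fraction of frames whose center location error is within 20 pixels; success rate: fraction of frames whose bounding-box overlap exceeds an IoU threshold of 0.5). The thresholds are therefore assumptions.

```python
import numpy as np

def center_error(pred, gt):
    """Euclidean distance between predicted and ground-truth box centers (x, y, w, h)."""
    pc = pred[:, :2] + pred[:, 2:] / 2
    gc = gt[:, :2] + gt[:, 2:] / 2
    return np.linalg.norm(pc - gc, axis=1)

def iou(pred, gt):
    """Intersection-over-union of axis-aligned boxes given as (x, y, w, h)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / union

def precision(pred, gt, threshold=20.0):
    return float(np.mean(center_error(pred, gt) <= threshold))

def success_rate(pred, gt, threshold=0.5):
    return float(np.mean(iou(pred, gt) > threshold))
```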

4.2. Evaluation on Network Training

In ML, the convergence of deep networks is an important indicator. In order to verify that the basketball target tracking model designed in this work can converge, the loss during network training is analyzed. The experimental results are shown in Figure 6.

It is obvious that the loss gradually decreases as network training progresses. When the epoch reaches 30, the loss decreases very little, which means the network has reached a state of convergence. This verifies that the deep network designed in this work can be trained to convergence.

4.3. Comparison with Other Methods

To further verify the effectiveness of the network proposed in this work, the method in this paper is compared with other object tracking methods. The designed comparison algorithms include KCF [30], SAMF [31], DAT [32], and Siam FC [21], and the results are illustrated in Table 2.

Obviously, compared with the other target tracking algorithms, the method designed in this paper achieves the best performance. This further proves the validity and correctness of the method designed in this work.

4.4. Evaluation on Optimization Strategy

As mentioned earlier, this work uses different optimization strategies for network training, namely, the ReLU activation function, the batch normalization (BN) layer, and momentum gradient descent (MGD). To verify the effectiveness of these strategies, different comparative experiments are carried out in this work.

First, to verify the effectiveness of the ReLU activation function, the network performance when using ReLU is compared with the network performance when using Sigmoid. The experimental results are illustrated in Figure 7.

It is obvious that the basketball target tracking network achieves the best performance when using ReLU, thus demonstrating the effectiveness of using ReLU. To verify that the use of the BN layer can improve the target tracking performance of the network, the performance without BN is compared with the performance when BN is used. The experimental results are illustrated in Figure 8.

It is obvious that the basketball target tracking network achieves the best performance when using BN, thus demonstrating the effectiveness of using BN.

To verify that the use of the MGD can improve the target tracking performance of the network, the performance without MGD is compared with the performance when MGD is used. The experimental results are illustrated in Figure 9.

It is obvious that the basketball target tracking network achieves the best performance when using MGD, thus demonstrating the effectiveness of using MGD.

4.5. Evaluation on ITL

This work uses ITL to train the network, and to verify the effectiveness of this loss, a comparative experiment is conducted in this section. The network performance when using the original loss and the network performance when using ITL are compared, respectively, and the experimental results are illustrated in Table 3.

It is obvious that the basketball target tracking network achieves the best performance when using ITL, thus demonstrating the effectiveness of the improved triplet loss.

5. Conclusion

Due to the rapid development of sports, target tracking in complex sports scenes represented by basketball has gradually attracted attention. Target tracking algorithms based on ML are widely used and greatly improve tracking accuracy; however, such models have many network parameters, and training is slow. In this regard, this paper takes basketball as an example and designs a complex moving target tracking algorithm based on low-parameter deep learning, which reduces the network parameters while ensuring tracking accuracy. The innovations and research contributions of this paper are as follows: (1) Aiming at the problem that deep learning tracking models are large, this paper proposes a network structure based on asymmetric convolution modules. The asymmetric convolution module includes two convolutional layers, the compression layer and the asymmetric layer; the compression layer reduces the number of channels fed to the following convolution kernels and thereby reduces the parameters of the model. (2) To further improve the tracking accuracy of the model, this paper proposes a triplet loss function to train the model. Compared with the original logistic loss function, the triplet loss function can fully exploit the latent relationships between the inputs, so that the network model obtains higher tracking accuracy. (3) Combining the asymmetric convolution module and the triplet loss function, a low-parameter deep learning-based target tracking algorithm is proposed. While reducing the number of model parameters, the tracking accuracy and success rate of the model are improved.

Data Availability

The datasets used during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work has been supported by the Department of Education of Anhui Province (Teaching and research project of Anhui Provincial Department of Education in 2018, Project no. 2018jyxm0118).