Abstract
To better meet the needs of sports training and improve its standardization, an OpenPose-based sports posture estimation method and assisted training system are proposed, building on the basic structure and principle of the OpenPose network. First, a human posture estimation algorithm is constructed on the OpenPose network; second, the overall framework, operation process, image acquisition, posture estimation, and other modules of the sports assistance system are designed in detail; finally, the posture estimation method is validated. The results show that the value of the loss function gradually stabilizes after 250 iterations. With the COCO dataset as the training base and comparison against standard postures, the algorithm can correctly identify different badminton action postures, with a recognition rate of up to 94%. This shows that the algorithm is feasible and can be used for posture estimation and training of badminton movements.
1. Related Work
With rising living standards, people pay more and more attention to personal health, and public discussion of physical health and sports is increasingly active. However, most people have not mastered standard movement postures, so they cannot obtain the best exercise effect and may even suffer unnecessary injuries during exercise. Therefore, human movement recognition is needed. In the past, people relied on auxiliary equipment to recognize human posture and judge whether a movement was standard. As machine learning and deep learning algorithms have matured, researchers have proposed diverse human movement recognition algorithms, including SVM classifiers, image processing, and deep neural networks. Researchers have also pioneered related techniques such as human body posture estimation and human motion recognition.
Human posture estimation has long been a popular research topic. For example, Amir Nadeem created the A-HPE method, which detects human body parts using four elements, namely, significant profile detection, an entropy Markov model, multidimensional cues from the whole-body profile, and a robust body part model. Its detection accuracy is significantly higher than that of traditional algorithms [1], and it can provide technical support for human-computer interaction. Poojitha Sing obtained data on the components of each part of the human body by measuring the point cloud data of human posture in RGB images, which avoids feature ambiguity and thus gives better detection performance [2]. Xinwei Li estimated the human joint moment by analyzing the dynamic human-computer interaction between the human elbow torque and the exoskeleton output [3]. Wei Quan et al. created an unsupervised learning algorithm based on a forward kinematics model of the human skeleton, but the algorithm has not been tested. After the human posture estimation algorithm is established, integrated particle swarm optimization (PSO) is introduced to optimize it. The advantage of the optimized algorithm is that no pretraining data are required and the posture estimation of the human body is more concise [4]. After that, this method is tested in a series of experiments. Many scholars have also studied human motion recognition. For example, Xiaojun Zhang created a human motion recognition technology based on deep learning, in which the LSTM algorithm is used to optimize the deep learning algorithm and which requires advanced smart wearable devices [5]. Bi Zhuo created a multimodal deep neural network model based on a joint cost function and used the MSR Action3D data set to identify human motion processes, with excellent overall performance [6].
Liu Shuqin proposed a human posture estimation method based on a discrete point 3D reconstruction algorithm. In this method, data features are extracted by principal component analysis, and human posture is then estimated by two-dimensional posture prediction [7]. Jalal Ahmad et al. proposed a 3D Cartesian approach to feature extraction that yields features rich in information [8]. Licciardo Gian Domenico et al. proposed an FCN-based posture estimation method, which obtained an average accuracy of 96.77% in recognizing 17 postures [9]. Building on the above research, this study aims to build an auxiliary system for badminton training that estimates posture by matching the key bone points of the human body, so as to better assist the training of badminton enthusiasts. The contribution of this study is to extract sports posture through deep learning and then, through similarity comparison, to construct a standard for judging sports actions, which provides a more accurate reference for sports training.
2. Estimation of Human Key Bone Points Based on the Openpose Model
2.1. Basic Structure
The OpenPose model uses a multistage convolutional neural network for training and testing. The first 10 layers of VGGNet-19 are used to initialize feature extraction from the human body image and are then fine-tuned, producing a set of feature maps F for the input image. The predictions of subsequent stages depend on these image features. In the PAF branch, each 7×7 convolution kernel of the earlier design is replaced by three consecutive 3×3 kernels, which preserves the receptive field while greatly reducing the amount of computation and thus improves the efficiency of the network model. Following the DenseNet approach, the outputs of the three convolution kernels are concatenated, so the network retains both high-level and low-level features.
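The claim that three stacked 3×3 kernels keep the receptive field of a 7×7 kernel at lower cost can be checked with simple arithmetic. The sketch below is illustrative only: the channel width of 128 is a hypothetical value, bias terms are ignored, and the extra channels introduced by concatenating the three outputs are not counted.

```python
def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of num_layers stride-1 convolutions applied in sequence."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in one convolution layer (bias terms ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels


channels = 128  # hypothetical channel width, chosen only for illustration

single_7x7 = conv_weights(7, channels, channels)
three_3x3 = 3 * conv_weights(3, channels, channels)

# Three stacked 3x3 layers cover the same 7x7 window as a single 7x7 layer.
print(stacked_receptive_field(3, 3), stacked_receptive_field(7, 1))
# Weight counts: the stacked version needs 27/49 of the weights.
print(single_7x7, three_3x3)
```

With equal input and output widths, the ratio of weights is 27 to 49 regardless of the channel count chosen.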
The network structure of the openpose model is shown in Figure 1.

In the first stage, the convolutional neural network generates a set of part affinity fields (PAFs), $L^1 = \phi^1(F)$. In each subsequent stage, the prediction of the previous stage is concatenated with the original image features F, so that a more accurate prediction can be made [10]:

$$L^t = \phi^t\left(F, L^{t-1}\right), \quad 2 \le t \le T_P,$$

where $\phi^t$ represents the convolutional neural network at stage $t$, and $T_P$ represents the total number of PAF prediction stages.
After $T_P$ iterations, the process is repeated to predict the confidence map, starting from the latest PAF prediction [11]:

$$S^{T_P} = \rho^t\left(F, L^{T_P}\right), \quad t = T_P,$$

$$S^t = \rho^t\left(F, L^{T_P}, S^{t-1}\right), \quad T_P < t \le T_P + T_C.$$

Here, $\rho^t$ represents the convolutional neural network at stage $t$, and $T_C$ represents the total number of confidence map prediction stages.
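The data flow of the two branches can be sketched as follows. This is a minimal illustration of the stage-wise refinement described above, not the network itself: each stage CNN is replaced by a fixed random 1×1 convolution (the hypothetical helper `make_stage`), and all channel counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_stage(in_channels: int, out_channels: int):
    """Stand-in for one stage CNN: a fixed random 1x1 convolution (illustration only)."""
    weights = rng.normal(scale=0.1, size=(out_channels, in_channels))

    def stage(x: np.ndarray) -> np.ndarray:
        # x has shape (channels, height, width); mix channels at every pixel.
        return np.tensordot(weights, x, axes=([1], [0]))

    return stage


def run_stages(features, num_paf_stages, num_conf_stages, paf_channels, conf_channels):
    f_channels = features.shape[0]

    # PAF branch: stage 1 sees only F; later stages see F and the previous PAFs.
    pafs = make_stage(f_channels, paf_channels)(features)
    for _ in range(2, num_paf_stages + 1):
        stage = make_stage(f_channels + paf_channels, paf_channels)
        pafs = stage(np.concatenate([features, pafs], axis=0))

    # Confidence branch: the first stage sees F and the final PAFs;
    # later stages also see the previous confidence maps.
    first = make_stage(f_channels + paf_channels, conf_channels)
    conf = first(np.concatenate([features, pafs], axis=0))
    for _ in range(2, num_conf_stages + 1):
        stage = make_stage(f_channels + paf_channels + conf_channels, conf_channels)
        conf = stage(np.concatenate([features, pafs, conf], axis=0))

    return pafs, conf


features = np.ones((8, 4, 5))  # hypothetical feature maps F
pafs, conf = run_stages(features, num_paf_stages=3, num_conf_stages=2,
                        paf_channels=6, conf_channels=4)
print(pafs.shape, conf.shape)
```

The point of the sketch is only the wiring: every refinement stage receives the unchanged features F concatenated with the most recent predictions.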
2.2. Estimation of Human Key Bone Points Based on Optimization Model Structure
The OpenPose model was created to recognize and estimate the postures of multiple people. Its innovation is reflected in three aspects. First, part affinity fields (PAFs) are established to estimate the confidence map of human limb features, which encodes the constraints between bone points in the human pose model. Combined with the hot spot map of human key points, this strengthens the constraint relationship and simplifies the grouping of key points in multiperson posture estimation. Second, six stage layers are created. Each stage layer re-estimates the key point hot spot map and the limb confidence map output by the previous stage layer, which further improves estimation accuracy. Third, during training, the loss function of each stage is monitored to ensure that the overall loss is minimized. According to the test results, the OpenPose model has the advantage of high estimation accuracy but the disadvantage of long estimation time.
For badminton, athletes' postures change quickly, and the action frequency is higher than in everyday movement. To track posture changes in real time, the estimation module and evaluation module of the human body posture evaluation system must run efficiently. The OpenPose model estimates the head region from five bone points in the head, which have little influence on the body posture of badminton players but lengthen the estimation time [12]. Therefore, taking the modified human posture model as reference, this paper creates a new deep neural network model to estimate the two-dimensional coordinates of the skeleton points of badminton players in a single frame image. Its architecture is shown in Figure 2 [13].

First, a VGG neural network is introduced into the improved human posture evaluation model, and posture estimation is then realized through two-stage OpenPose processing. The VGG network in Figure 2 is a convolutional neural network (CNN), which is used to extract image features. A CNN comprises convolutional layers, pooling layers, fully connected layers, and an output layer.
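The convolution operation is

$$x_j^l = f\Big(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big), \tag{1}$$

where $k_{ij}^l$ represents the convolution kernel; $l$ represents the layer index; $x_j^l$ represents the $j$th feature map of layer $l$; $M_j$ represents the set of input feature maps combined for output $j$; and $b_j^l$ represents the bias term.
The calculation of the pooling layer is

$$x_j^l = f\big(\beta_j^l \, \mathrm{down}(x_j^{l-1}) + b_j^l\big), \tag{2}$$

where $\mathrm{down}(\cdot)$ represents the downsampling function, and $\beta_j^l$ and $b_j^l$ represent the multiplicative and additive biases of each output feature map, respectively.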
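The forward computation of the two layer types can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the sliding-window product is written as a cross-correlation (as most deep learning libraries do), the downsampling function is taken to be block averaging, tanh is an arbitrary choice of activation, and all function names are hypothetical.

```python
import numpy as np


def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def conv_layer(inputs, kernels, bias, activation=np.tanh):
    """One output map: sum over the input maps, add the bias, apply the activation."""
    total = sum(correlate_valid(x, k) for x, k in zip(inputs, kernels))
    return activation(total + bias)


def mean_pool(feature_map, factor):
    """Downsample by averaging non-overlapping factor x factor blocks (assumed form)."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % factor, :w - w % factor]
    blocks = cropped.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def pooling_layer(feature_map, beta, bias, factor=2, activation=np.tanh):
    """Scale the downsampled map, add the bias, apply the activation."""
    return activation(beta * mean_pool(feature_map, factor) + bias)


x = np.arange(16.0).reshape(4, 4)
k = np.ones((2, 2))
print(conv_layer([x], [k], bias=0.0).shape)
print(pooling_layer(x, beta=1.0, bias=0.0).shape)
```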
The training process of CNN includes forward propagation and back propagation.
In forward propagation, the input data are fed to the CNN and transformed layer by layer to produce the actual output:

$$x^l = f(u^l), \quad u^l = W^l x^{l-1} + b^l. \tag{3}$$

Back propagation computes the error between the actual output $y$ and the target output $t$, propagates this error backward according to the principle of error minimization, and adjusts the weights accordingly.
The training process of CNN is as follows:
2.2.1. Back-Propagation Algorithm
In forward propagation, the squared error cost function is used to measure the error. If there are $c$ classes and $N$ training samples, the error $E^N$ can be expressed as

$$E^N = \frac{1}{2} \sum_{n=1}^{N} \sum_{k=1}^{c} \big(t_k^n - y_k^n\big)^2. \tag{4}$$

In formula (4), $t_k^n$ and $y_k^n$ represent the $k$th dimension of the target output and of the actual output for the $n$th sample, respectively.
In back propagation, the sensitivity $\delta$ of the bias $b$ represents the back-propagated error, that is, the rate of change of the error with respect to the bias $b$:

$$\delta = \frac{\partial E}{\partial b} = \frac{\partial E}{\partial u} \frac{\partial u}{\partial b}. \tag{5}$$

In formula (5), since $\partial u / \partial b = 1$, we have $\partial E / \partial b = \partial E / \partial u = \delta$, which means that the sensitivity of a neuron's bias equals the derivative of the error with respect to its total input $u$.
2.2.2. Weight Update
Weight update process of the convolutional layer: the forward computation of this layer follows formula (1) of the convolutional layer. The input feature maps are convolved with trainable convolution kernels, a bias term is added, and the output feature map is obtained through an activation function.
$M_j$ represents the combination of input feature maps. Each output feature map has its own convolution kernels: even if output feature map $j$ and output feature map $k$ are both obtained by convolution from input feature map $i$, their corresponding convolution kernels are still different.
If a downsampling layer $l+1$ follows each convolutional layer $l$, one pixel of the sensitivity map $\delta^{l+1}$ of the downsampling layer corresponds to a block of pixels in the output feature map of the convolutional layer $l$. To calculate the sensitivity of the convolutional layer $l$, the sensitivity map of the downsampling layer is upsampled so that its size matches the feature map size of the convolutional layer $l$. The sensitivity of the convolutional layer is then obtained by multiplying the upsampled map elementwise by the derivative of the activation function and scaling by the parameter $\beta$. The calculation formula is

$$\delta_j^l = \beta_j^{l+1} \big( f'(u_j^l) \circ \mathrm{up}(\delta_j^{l+1}) \big). \tag{6}$$

Here, $\mathrm{up}(\cdot)$ represents the upsampling operation, and $\circ$ represents elementwise multiplication. If the sampling factor during downsampling is $n$, upsampling replicates each pixel $n$ times in the horizontal and vertical directions to restore the original size. Upsampling can be realized by the Kronecker product:

$$\mathrm{up}(x) = x \otimes 1_{n \times n}. \tag{7}$$
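The upsampling step maps directly onto NumPy's Kronecker product. The sketch below assumes a tanh activation for the derivative term, and the function names are hypothetical.

```python
import numpy as np


def upsample(delta, factor):
    """Replicate every element factor times along both axes (Kronecker product with ones)."""
    return np.kron(delta, np.ones((factor, factor)))


def tanh_grad(u):
    """Derivative of tanh, used here as an assumed activation function."""
    return 1.0 - np.tanh(u) ** 2


def conv_layer_sensitivity(next_delta, pre_activation, beta, factor):
    """Sensitivity of a convolutional layer that is followed by a downsampling layer."""
    return beta * (tanh_grad(pre_activation) * upsample(next_delta, factor))


next_delta = np.array([[1.0, 2.0],
                       [3.0, 4.0]])
print(upsample(next_delta, 2))
```

Each entry of the 2×2 sensitivity map becomes a 2×2 block, so the result has the 4×4 size of the convolutional layer's feature map.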
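On this basis, the gradients can be obtained from the sensitivity of a given feature map in the convolutional layer. First, the gradient of the bias $b$ is calculated by summing all elements of the sensitivity map $\delta_j^l$:

$$\frac{\partial E}{\partial b_j} = \sum_{u,v} \big(\delta_j^l\big)_{uv}. \tag{8}$$

Because the weights are shared, the gradient of a kernel weight is obtained by computing the gradient at every connection that uses that weight and summing the results:

$$\frac{\partial E}{\partial k_{ij}^l} = \sum_{u,v} \big(\delta_j^l\big)_{uv} \big(p_i^{l-1}\big)_{uv}. \tag{9}$$

Here, $(p_i^{l-1})_{uv}$ represents the patch of $x_i^{l-1}$ that is multiplied elementwise by $k_{ij}^l$ during convolution, namely, the input of convolutional layer $l$ that produces element $(u, v)$ of the output feature map $x_j^l$.
Formula (9) can be calculated with a two-dimensional convolution function, written $\mathrm{conv2}$, which gives

$$\frac{\partial E}{\partial k_{ij}^l} = \mathrm{rot180}\Big(\mathrm{conv2}\big(x_i^{l-1}, \mathrm{rot180}(\delta_j^l), \text{'valid'}\big)\Big). \tag{10}$$

Here, $\mathrm{rot180}(\cdot)$ rotates its argument by 180°. Rotating the sensitivity map turns the convolution into a cross-correlation, and the result is then rotated back.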
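The equivalence between the patch-sum form and the rotated-convolution form of the kernel gradient can be checked numerically. The sketch below is an illustration under the assumption that the forward pass is a true two-dimensional convolution restricted to the valid region; `conv2_valid` is a small NumPy stand-in written for this example, not a library call.

```python
import numpy as np


def rot180(a):
    """Rotate a 2-D array by 180 degrees."""
    return a[::-1, ::-1]


def correlate_valid(image, kernel):
    """Cross-correlation over the valid region (no padding)."""
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def conv2_valid(image, kernel):
    """True 2-D convolution, valid region: correlation with the flipped kernel."""
    return correlate_valid(image, rot180(kernel))


def bias_grad(delta):
    """Bias gradient: sum of all elements of the sensitivity map."""
    return delta.sum()


def kernel_grad_by_patches(x, delta, kernel_shape):
    """Sum, over all output positions, of sensitivity times the matching input patch."""
    kh, kw = kernel_shape
    grad = np.zeros(kernel_shape)
    for u in range(delta.shape[0]):
        for v in range(delta.shape[1]):
            patch = x[u:u + kh, v:v + kw]
            # The patch is flipped because the forward pass flips the kernel.
            grad += delta[u, v] * rot180(patch)
    return grad


def kernel_grad_by_conv(x, delta):
    """Same gradient written as one rotated valid convolution."""
    return rot180(conv2_valid(x, rot180(delta)))


rng = np.random.default_rng(1)
x = rng.normal(size=(6, 5))       # input feature map
k = rng.normal(size=(3, 2))       # kernel
delta = rng.normal(size=(4, 4))   # sensitivity of the 4 x 4 output map

print(np.allclose(kernel_grad_by_patches(x, delta, k.shape),
                  kernel_grad_by_conv(x, delta)))
```

Both routes give the same kernel-sized array; the single convolution avoids the explicit loop over output positions.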
Weight updating process of the downsampling layer: the forward computation of the downsampling layer follows formula (2) of the pooling layer. Once the sensitivity of the downsampling layer has been calculated, the updates of the parameters $\beta$ and $b$ can be obtained in the same way as in formula (8).
If the current downsampling layer is fully connected with the subsequent convolutional layer, the sensitivity of the downsampling layer can be calculated by back propagation:

$$\delta_j^l = f'(u_j^l) \circ \mathrm{conv2}\big(\delta_j^{l+1}, \mathrm{rot180}(k_j^{l+1}), \text{'full'}\big). \tag{11}$$

Here, $\delta_j^{l+1}$ represents the sensitivity propagated back by the convolutional layer that follows the current downsampling layer, and $\mathrm{rot180}(k_j^{l+1})$ represents the rotated convolution kernel.
Then, the gradients of the biases $b$ and $\beta$ are computed. The gradient of the bias $b$ is the sum of all elements of the sensitivity map $\delta_j^l$, and the calculation formula is the same as formula (8) of the convolutional layer.
For the gradient of the bias $\beta$, the downsampled map computed during forward propagation is needed:

$$d_j^l = \mathrm{down}(x_j^{l-1}). \tag{12}$$

Thus, the gradient of $\beta$ can be calculated as

$$\frac{\partial E}{\partial \beta_j} = \sum_{u,v} \big(\delta_j^l \circ d_j^l\big)_{uv}. \tag{13}$$
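The two parameter gradients of the downsampling layer reduce to two sums. The sketch below assumes that downsampling is block averaging and uses hypothetical function names.

```python
import numpy as np


def mean_pool(feature_map, factor):
    """Downsample by averaging non-overlapping factor x factor blocks (assumed form)."""
    h, w = feature_map.shape
    cropped = feature_map[:h - h % factor, :w - w % factor]
    blocks = cropped.reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))


def pooling_grads(prev_map, delta, factor):
    """Gradients of the multiplicative and additive biases of a downsampling layer."""
    d = mean_pool(prev_map, factor)   # downsampled map from the forward pass
    grad_beta = np.sum(delta * d)     # sum of the elementwise product
    grad_bias = np.sum(delta)         # sum of the sensitivity map
    return grad_beta, grad_bias


prev_map = np.arange(16.0).reshape(4, 4)
delta = np.ones((2, 2))
print(pooling_grads(prev_map, delta, factor=2))
```

In practice the downsampled map would be cached during the forward pass rather than recomputed.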
Through the above construction, the VGG network structure used in this study is obtained, as shown in Figure 3.

Through the above image processing and then combined with the two-stage pose estimation in Figure 2, the pose of human motion is obtained.
3. Human Posture Evaluation
3.1. Similarity Calculation
The improvement above addresses how to estimate human posture accurately and in real time. Once a reliable set of skeletal points referenced to the modified human pose model has been estimated, identifying the body posture from these skeletal points becomes the key task of the human posture evaluation system. Because badminton is mainly an upper limb sport, the standard postures of the various badminton strokes are concentrated in the upper limb area. Therefore, on the basis of the estimated skeletal points, a similarity measure is used to compare the posture of badminton enthusiasts with the standard badminton action library, so as to evaluate badminton actions objectively.
The human posture evaluation process consists of three steps: (1) convert the coordinates of the input bone points; (2) match with the standard posture library; (3) process and output the matching results.
A set of human bone point coordinates of a single frame image in the camera coordinate system is input into the human posture evaluation system, and the modified human posture model is used as the reference. In the human posture coordinate set index_T, each coordinate group has 13 points, as shown in Figure 4.

In the evaluation stage, the input coordinate system (image pixel coordinate system) should be transformed first, so as to prepare for the subsequent posture evaluation. In the coordinate conversion process, the camera's internal parameter matrix and external parameter matrix will be involved. Here, the former is fixed, and the latter depends on the location and angle of the camera lens. Therefore, in this regard, the camera's external matrix needs to be precalibrated to ensure the validity of the external matrix. Although the above process is feasible, it is difficult to operate in practice. So, a new coordinate transformation method is proposed in this paper, that is, (a) convert from an image pixel coordinate system to a rectangular coordinate system with the neck point as the origin in the human bone point; (b) transform from the rectangular coordinate system with the neck point as the origin to the polar coordinate system with the neck point as the origin and determine the angle between the other 12 points in the polar coordinate system and the positive x axis.
The coordinate transformation step (a) solves the matching problem of human posture and standard posture caused by different positions, and step (b) solves the uncertainty of human posture evaluation caused by individual body size difference.
After the coordinate transformation is completed, 12 included angles between the positive X-axis and the vectors numbered 0 to 11 are obtained, as shown in Figure 5 [14].

The calculation process of coordinate transformation is as follows:
(i) Input the human bone point coordinates in the image pixel coordinate system and establish the rectangular coordinate system with the neck point $P_{\mathrm{neck}}$ as the origin
(ii) Establish the vector set $\{v_i\}$, $i = 0, \ldots, 11$, by subtracting $P_{\mathrm{neck}}$ from the coordinates of the remaining 12 human bone points
(iii) Apply formula (14) to solve the included angle between each vector in the vector set and the positive X-axis and establish the included angle set [15]:

$$\cos\theta_i = \frac{v_i \cdot e_x}{\lVert v_i \rVert}, \tag{14}$$

where $e_x$ is the unit vector along the positive X-axis and $\cos\theta_i$ represents the cosine of the included angle.
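The three steps above can be sketched as a short function. This is a minimal sketch under stated assumptions: the position of the neck point in the input list (`neck_index`) is hypothetical, and the angle is recovered from its cosine, so it lies between 0° and 180° and does not distinguish points above the neck from points below it.

```python
import math


def bone_angles(points, neck_index=0):
    """Angles (degrees) between the positive X-axis and each neck-to-bone vector.

    points: list of (x, y) pixel coordinates; the neck is excluded from the result.
    """
    neck_x, neck_y = points[neck_index]
    angles = []
    for i, (x, y) in enumerate(points):
        if i == neck_index:
            continue
        vx, vy = x - neck_x, y - neck_y        # step (ii): move the origin to the neck
        norm = math.hypot(vx, vy)
        if norm == 0.0:
            angles.append(None)                # bone point coincides with the neck
            continue
        cos_theta = max(-1.0, min(1.0, vx / norm))   # step (iii): cosine of the angle
        angles.append(math.degrees(math.acos(cos_theta)))
    return angles


pose = [(100, 100), (110, 100), (100, 110), (90, 100), (110, 110)]
print(bone_angles(pose))
```

Because only angles are kept, the result is unchanged when the same pose is translated in the image or scaled about the neck, which is the property the two transformation steps are meant to provide.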
3.1.1. Matching with the Standard Posture
The human posture evaluation algorithm divides the human body into four regions, as shown in Figure 6 [16].

Coordinate transformation is performed for the other points in the human posture model relative to the neck points to adjust the corresponding serial number. The right region of the upper limb is composed of the right elbow, right shoulder, and right wrist. The coordinate serial number after adjustment is 0, 1, and 2. The left region of the upper limb is composed of left elbow, left shoulder, and left wrist. The coordinate serial number after adjustment is 3, 4, and 5. The right region of lower limbs is composed of right knee, right hip, and right ankle, and the coordinate serial number after adjustment is 6, 7, and 8. The left region of the lower limbs is composed of the left knee, left hip, and left ankle. The coordinates after adjustment are 9, 10, and 11.
Within each region, the posture to be evaluated is compared with the candidate standard postures retained from the previous stage, and the cumulative error is calculated. If the cumulative error does not exceed the allowable error of the stage, the standard posture remains in the candidate standard posture set.
3.2. Human Posture Assessment Process
Combined with the above analysis, the evaluation of human posture is mainly divided into the following steps:
(1) For the right upper limb region (right shoulder, right elbow, and right wrist), the three vectors from the neck to the bone points and their angles with the positive x axis are solved to establish the similarity set of the right upper limb region. Then, the absolute differences from the corresponding standard postures are solved. Finally, similar standard postures are filtered using the predetermined error value.
(2) For the right lower limb region (right hip, right knee, and right ankle), the three vectors from the neck to the bone points and their angles with the positive x axis are solved to establish the similarity set of the right lower limb region. Then, the absolute differences from the corresponding standard postures are solved. Finally, similar standard postures are filtered using the predetermined error value.
(3) For the left lower limb region (left hip, left knee, and left ankle), the three vectors from the neck to the bone points and their angles with the positive x axis are solved to establish the similarity set of the left lower limb region. Then, the absolute differences from the corresponding standard postures are solved. Finally, similar standard postures are filtered using the predetermined error value.
After the above screening process is completed, the standard posture that remains is the evaluation result, and the similarity accumulated during the process is the evaluation value.
The above process is shown in Figure 7 [17, 18].

The similarity differences in the above matching process could be weighted according to the influence of each region on human posture. However, this may cause a coupling problem: after weighting, the representative values of two different postures may coincide. Therefore, this paper directly outputs the bone point, the evaluation value, and the serial number of the matched standard posture at the end of matching [19].
When badminton players hold the racquet with the right hand, the left hand is mainly used for coordination and balance. Therefore, the algorithm in this paper omits the matching of the left upper limb region and optimizes the weights of the other three regions: the right upper limb region has the largest weight, followed by the right lower limb and left lower limb regions.
4. Human Posture Evaluator and Evaluation
Figure 8 shows the design of the human posture evaluation algorithm [20].
(1) Convert the human bone point coordinates into the rectangular coordinate system. For each bone point, solve the vector from the origin to the bone point and the angle between this vector and the positive x axis.
(2) Determine the priority of each bone point. In order of priority, compute the difference between each selected bone point and the corresponding bone point of every posture model in the candidate posture set, and output the cumulative difference. After all bone points have been input, step (3) can start.
(3) Solve the accumulative error. The posture model with the minimum accumulated error is found in the candidate posture set. The output results include the posture model serial number and the accumulative error.
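The staged matching can be sketched as a loop over bone points in priority order. Everything specific here is an assumption made for illustration: the dictionary layout of the standard library, the per-bone allowable errors, and the convention that the returned ending stage is the serial number of the last bone point processed successfully (−1 if matching fails on the first one).

```python
def match_posture(angles, standard_library, priority, allowed_error):
    """Match one posture against a library of standard postures.

    angles:           bone index -> angle (degrees) of the posture to evaluate
    standard_library: serial number -> (bone index -> angle) for each standard posture
    priority:         bone indices ordered from highest to lowest priority
    allowed_error:    bone index -> largest cumulative error allowed after that bone
    Returns (ending_stage, serial, loss); serial and loss are None if matching fails.
    """
    candidates = {serial: 0.0 for serial in standard_library}
    ending_stage = -1
    for bone in priority:
        survivors = {}
        for serial, error in candidates.items():
            error += abs(angles[bone] - standard_library[serial][bone])
            if error <= allowed_error[bone]:
                survivors[serial] = error      # still a candidate standard posture
        if not survivors:
            return ending_stage, None, None    # no standard posture is close enough
        candidates = survivors
        ending_stage = bone
    best = min(candidates, key=candidates.get)  # smallest accumulated error wins
    return ending_stage, best, candidates[best]


library = {0: {0: 10.0, 1: 20.0, 2: 30.0},
           1: {0: 50.0, 1: 60.0, 2: 70.0}}
allowed = {0: 15.0, 1: 30.0, 2: 45.0}
pose = {0: 12.0, 1: 25.0, 2: 28.0}
print(match_posture(pose, library, priority=[0, 1, 2], allowed_error=allowed))
```

Discarding candidates as soon as their cumulative error exceeds the stage limit keeps the per-frame cost low, since later bone points are compared against fewer standard postures.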

4.1. Matching Result Processing
Through the human posture evaluation, the matched standard human posture serial number and the similarity with the matched standard human posture can be obtained.
The processing procedure of matching results is as follows:
If the output is “−1,” “0,” or “1,” the right upper limb region, which is the key area for matching a badminton player's posture, has failed to match. Therefore, it can be judged that the athlete’s posture in this frame image does not conform to any posture in the standard library, which means that the athlete’s posture in this frame image is not standard.
If other information is output, a matching result has been obtained, and the higher the output standard human posture serial number, the better the matching result.
5. Method and System Verification
To verify the above method and system, a badminton posture evaluation system is built in this study.
5.1. Method Verification
5.1.1. Data Sources and Training
Part of the video image data is selected as the basic data set for verification. The image data are obtained from badminton videos; images collected by the camera and human images from the COCO data set are used as the training data set. The settings of the training parameters are listed in Table 1.
Images in the COCO data set come with human limb grayscale images and human bone point grayscale images. The images collected by the camera become a suitable training data set only after the following processing:
(1) Normalize the collected image so that its pixel values lie in the range [−0.5, 0.5]
(2) Mark the pixel value of each human limb as 0.5 and save the result as the human limb grayscale image
(3) Mark the pixel value of each human bone point as 0.5 and save the result as the human bone point grayscale image
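The normalization and labeling steps can be sketched in NumPy. The sketch assumes 8-bit input images and a background value of −0.5 in the label map; the text specifies only that marked pixels are set to 0.5, and the function names are hypothetical.

```python
import numpy as np


def normalize_image(image_uint8):
    """Map 8-bit pixel values from [0, 255] to [-0.5, 0.5]."""
    return image_uint8.astype(np.float32) / 255.0 - 0.5


def keypoint_label_map(shape, keypoints, background=-0.5):
    """Grayscale label map with every annotated bone point marked as 0.5.

    shape:      (height, width) of the image
    keypoints:  iterable of (x, y) pixel coordinates
    background: value of unmarked pixels (an assumption, not given in the text)
    """
    label = np.full(shape, background, dtype=np.float32)
    for x, y in keypoints:
        label[y, x] = 0.5      # row index is y, column index is x
    return label


image = np.array([[0, 255], [128, 64]], dtype=np.uint8)
print(normalize_image(image))
print(keypoint_label_map((3, 4), [(1, 2)]))
```

A limb label map could be produced the same way by marking every pixel along the segment between two bone points instead of a single pixel.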
In the first training round, the model is trained on the COCO data set, which ensures that the optimized model can accurately estimate general human posture. Subsequent training does not start from the initial weights; it loads the weight parameters of the first training round and continues training on the collected images to further improve the evaluation accuracy.
A properly chosen base learning rate prevents the problems caused by an excessively large learning rate. Therefore, the base learning rate in this paper is set to 5e-5.
5.1.2. Loss Function Curve
After the first training based on COCO data set is completed, the collected images are used for subsequent training. The loss situation after training is shown in Figure 9.

It can be seen that, over the course of training, the loss decreases overall, and the descent gradually flattens as the model approaches the optimal solution.
5.1.3. Accuracy and Timeliness of Skeletal Keypoint Estimation
The estimation accuracy of traditional openpose model and structure-optimized openpose model for each skeletal point is statistically analyzed, and the specific data are shown in Figure 10.

It can be seen from the figure that the estimation accuracy of optimization model is slightly lower than that of the openpose model, and the estimation accuracy of each skeletal point in the left limb is lower than that in the right limb.
5.2. Application Verification
To further verify the feasibility of the above algorithm, an experimental system is set up for verification.
5.2.1. Overall Architecture
The human posture evaluation system consists of a camera acquisition module, a human posture evaluation module, and a prediction model module. For the current frame image, the system outputs the result of matching the human posture against the standard posture library together with the matching loss. The matching result is the standard posture that best matches the human posture in the current frame image, and the matching loss indicates how similar the human posture is to that standard posture. The overall framework of the human posture evaluation system is shown in Figure 11 [21].

5.2.2. System Operation Process
The operation mechanism of human posture evaluation system is shown in Figure 12 [22].

5.2.3. Camera Acquisition Module
Camera acquisition module includes two parts, namely, hardware parameter and software interface. Among them, the key of hardware parameters is to correctly set the placement angle of the camera and reasonably determine the camera parameters. Combined with the above analysis results, the camera should be placed on the left side of the badminton net and on the right side of the badminton player. In addition, the best height is 1.2 m. The relevant parameters of the camera are listed in Table 2.
The key of the software interface is to make use of the camera interface layer to make the driver compatible, as shown in Figure 13 [23, 24].

ICmera, the base class of this module, stores one worker function and four detection functions.
First, the number and IDs of the cameras used in the human posture assessment system are determined, and the initial deployment is completed. Then, the camera resolution, frame rate, and other parameters are adjusted according to the site environment and the evaluation requirements; for this purpose, the module provides two function interfaces, namely, showParam and setParam. Finally, the function work strips invalid information, such as resolution, width, and height, from the collected image information and converts the collected image into the unified cv::Mat format.
The base class ICmera makes the driver modules of other cameras compatible, and the subsequent evaluation process works only with the ICmera interface. The human posture evaluation system is therefore not sensitive to the camera model. As long as the camera parameters meet the setting requirements, a driver can be added by inheriting from the base class. If the evaluation is not effective, the functions of ICmera can be called to debug the current camera parameters.
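The role of the base class can be illustrated with a Python analogue. Only the names ICmera, showParam, setParam, and work come from the text; the method bodies, the default parameters, the hypothetical `read_raw` hook, and the `FakeCamera` driver are assumptions for illustration. A NumPy array stands in for cv::Mat, which is how OpenCV's Python bindings represent images.

```python
from abc import ABC, abstractmethod

import numpy as np


class ICmera(ABC):
    """Analogue of the camera base class; drivers subclass it and supply read_raw."""

    def __init__(self, camera_id):
        self.camera_id = camera_id
        # Hypothetical default parameters, not taken from the paper.
        self.params = {"width": 640, "height": 480, "fps": 30}

    def showParam(self):
        """Return a copy of the current camera parameters."""
        return dict(self.params)

    def setParam(self, **updates):
        """Change known parameters; reject names the camera does not support."""
        unknown = set(updates) - set(self.params)
        if unknown:
            raise KeyError(f"unsupported parameters: {sorted(unknown)}")
        self.params.update(updates)

    @abstractmethod
    def read_raw(self):
        """Driver-specific frame grab (hypothetical hook)."""

    def work(self):
        """Return the next frame as a height x width x 3 uint8 array."""
        frame = np.asarray(self.read_raw(), dtype=np.uint8)
        if frame.ndim == 2:                      # grayscale: replicate to three channels
            frame = np.stack([frame] * 3, axis=-1)
        return frame


class FakeCamera(ICmera):
    """Stand-in driver that produces black grayscale frames."""

    def read_raw(self):
        return np.zeros((self.params["height"], self.params["width"]), dtype=np.uint8)


cam = FakeCamera(camera_id=0)
cam.setParam(width=320, height=240)
print(cam.showParam(), cam.work().shape)
```

Because the evaluation code would depend only on the base class, supporting another camera model would amount to writing one more subclass.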
5.2.4. Effect Display
Bone point hot spot map: the bone point hot spot map output is shown in Figure 14.

The evaluation effect achieved by the human posture evaluation system in the test stage is shown in Figure 15.

Effect display: the human posture evaluation algorithm proposed in this paper is used to match successive single frame images. The frames 138 to 139 are successfully matched to the standard posture. The evaluation effect of these 8 frames is shown in Figure 16.

The output in Figure 15 takes two forms. First, “Frame i: matching failure” means that the image in frame i failed to match any standard posture. Second, “Frame i: ending stage A, matching standard posture serial number B, matching loss X” indicates that matching of frame i exited at bone point serial number A, that the frame was successfully matched with standard posture serial number B, and that the loss between the two is X.
5.2.5. Detection Rate
The human posture evaluation system is used to evaluate the posture of 6 videos. The number of the frames to be tested and the measured frames in each video are shown in Table 3.
6. Conclusion
In summary, the above design applies the OpenPose neural network to actual sports and provides a new reference method for accurate sports training. The innovation of this paper is the improved accuracy of posture estimation. In addition, by collecting badminton movements, real-time estimation of badminton postures is realized, which provides a reference for applying this method.
Data Availability
The experimental data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest regarding this work.