Abstract

How to effectively identify errors in athletes’ movements, and then correct or eliminate them, is a problem that the sports world is currently trying to solve. Sport is a field in which sensor-based human movement recognition can play a role: obtaining human body data through sensors provides a data basis for research on the characteristics of human movement, the improvement of sports performance, and the technical analysis of sports movements. The main research direction of this paper is the joint AI+IoT recognition of human movements. The target human movements include standing, walking, running, shooting, jump shot, jumping, dribbling, walking dribbling, and running dribbling. Computer vision, as a branch of artificial intelligence research, uses computers to simulate human visual cognition, extracts useful information from digitized images or videos without manual intervention, and strives to achieve a human-like understanding of visual signals, realizing the conversion from low-level data input to high-level knowledge output. The system scoring module is the core functional module of the system. It extracts the angle features of the trainer’s skeleton data, finds the corresponding frames, and indicates how standard the trainer’s current movement is in the form of a score. In terms of judgment accuracy, the accuracy of all subjects after watching the complete video is higher than their accuracy at the temporal occlusion point, an extremely significant difference (t = -10.80, df = 52). This research improves the universality of target detection and extraction in the real natural environment for the machine vision system and improves the robustness of human action recognition in complex conditions.

1. Introduction

Human-computer interaction has gradually developed from the early manual operation stage to the graphical user interface stage and then to today’s multi-channel, multimedia intelligent human-computer interaction stage. The widely used human-computer interaction technology of today no longer requires complex interactive interfaces. With the integration of the IoT and artificial intelligence, increasingly simple and natural ways of inputting information capture human needs intuitively and comprehensively and, at the same time, assist users accordingly.

The conversion of attribute-assignment errors that are difficult to recognize with the naked eye into a form that can be easily recognized provides this article with great inspiration and reference. Overcoming the blind spots of manual recognition, improving manual recognition methods, and improving recognition accuracy are the focus of pattern recognition and many other fields of artificial intelligence, and they are also the goals that this article hopes to achieve in the research of a motion recognition system. Therefore, it is of great research significance to conduct in-depth research on action recognition and to propose more effective algorithms.

In recent years, the recognition of sports movements has attracted attention. Donald suggested that specific guidelines for the surgical and nonsurgical management of ACL injuries do not yet exist. He used measurements of preinjury sports participation and knee laxity to classify patients as high, medium, or low risk. Early reconstruction or conservative treatment is recommended for patients at moderate risk, depending on the time of presentation. He evaluated subjective outcomes and activities; Tegner scores increased in patients with both early and late reconstruction [1]. Rey-Lopez JP assessed male mortality, which was tracked up to December 2008 [2]. Dandanell, S assessed fat oxidation in 16 individuals at a range of exercise intensities on a bicycle ergometer. Graded exercise regimens were validated against short-term sustained exercise (SCE) regimens, where FatMax was determined based on resting fat oxidation and 10 minutes of continuous exercise at 35%, 50%, and 65% of VO2max [3]. Kirchhoff noted that, due to osteopenic or osteoporotic bone changes, the suture anchor may pull out in elderly patients [4]. Maletis believes there is debate about the best graft for ACL reconstruction; the Tegner score before injury was slightly lower in the bone-patellar tendon-bone group [5]. Mccarthy P J argues that, historically, the literature on adolescents’ emotional responses in sport has focused on stress and enjoyment. Although the study of these emotional responses is important, they have not been systematically examined from a developmental perspective, so their developmental trajectory and implications for youth competitive sport are largely unknown. To begin to address this issue, he investigated the development of sources of enjoyment for youth sports participants [6]. In the implementation of injury-prevention intervention training, one or two professionals always provide real-time feedback on the athletes’ movement techniques. From beginning to end, the correct movement pattern should be reinforced in the athlete’s consciousness. If any incorrect technique is found, it should be pointed out and corrected in time so that the athlete can master the correct landing technique as soon as possible.

Human action recognition based on image processing mainly relies on various high-frame-rate video capture devices to monitor the video image of a certain area within a certain viewing-angle range and uses graphics processing algorithms to determine whether there are people in the current area, how many people there are, and what they are currently doing, including their movements, facial expressions, and gestures. Compared with image-based human motion recognition, the biggest advantage of sensor-based human motion recognition is that it uses various sensors, such as acceleration sensors, gyroscopes, and pressure gauges, and collects only the necessary quantities related to the state of motion, such as speed, acceleration, and pressure.

2. Research Plan for Intelligent Recognition of Wrong Actions

2.1. Kinect Core Module

The architecture of Kinect consists of data acquisition, data processing and feature extraction, and human gesture recognition. In the data acquisition module, Kinect can collect color data, depth data, and bone information. The purpose of the data processing and feature extraction module is to perform data normalization and other processing as needed and to calculate the relevant features of the gesture representation [7].

2.2. Image Sensor Data Collection

The traditional skeleton extraction scheme is affected by the limited amount of information in 2D images and by environmental factors, and its limitations become apparent when the background is even slightly complicated: the human body frame is difficult to extract, which can even distort the skeleton diagram after subsequent refinement. Compared with the two-dimensional model, the three-dimensional human body model has the additional dimension of depth data. The color image and depth image data stream formats are shown in Table 1.

Wireless channels are usually modeled by the Channel Impulse Response (CIR). The CIR can be expressed as [8]:

$$h(\tau) = \sum_{i=1}^{N} a_i e^{-j\theta_i}\,\delta(\tau - \tau_i),$$

where $a_i$, $\theta_i$, and $\tau_i$ are the amplitude attenuation, phase shift, and time delay of the $i$-th path, respectively, and $N$ is the number of paths. The wireless network card can collect CFR samples on thirty OFDM subcarriers within the bandwidth, and the CSI corresponding to each subcarrier is [9]:

$$H(f_k) = \|H(f_k)\|\, e^{j\angle H(f_k)}, \qquad f_k = f_0 + k\,\Delta f.$$

Among them, $f_0$ is the center frequency, $\Delta f$ is the subcarrier spacing, and $f_k$ is the frequency of the $k$-th subcarrier. The CSI formed by all data streams can be expressed as [10]:

$$H = \left[H(f_1), H(f_2), \ldots, H(f_{30})\right].$$

The characteristic of the Butterworth filter can be reflected by its gain [11]:

$$|H(j\omega)|^2 = \frac{G_0^2}{1 + \left(\omega/\omega_c\right)^{2n}}.$$

Among them, $G_0$ is the filter gain, $\omega_c$ is the cutoff frequency, and $n$ is the order of the filter. Compared with traditional motion recognition using contact sensors (such as three-dimensional acceleration sensors and electromyographic signal sensors), human motion recognition based on computer vision does not require the analyzed subject to cooperate under laboratory conditions; a non-contact image acquisition device alone can complete the recognition task, which greatly improves the versatility and practicability of action recognition.
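To make the filtering step concrete, here is a minimal Python sketch (using SciPy) of low-pass Butterworth filtering applied to a CSI amplitude stream; the sampling rate, cutoff frequency, and filter order are illustrative assumptions rather than values given in this paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_csi(amplitude, fs=100.0, cutoff=10.0, order=4):
    """Low-pass filter one CSI amplitude stream with a Butterworth filter.

    fs, cutoff, and order are illustrative assumptions; the paper does not
    specify the parameters it uses.
    """
    nyquist = 0.5 * fs
    b, a = butter(order, cutoff / nyquist, btype="low")
    return filtfilt(b, a, amplitude)  # zero-phase filtering

# Example: denoise a synthetic noisy amplitude trace.
t = np.linspace(0, 2, 200)
raw = np.sin(2 * np.pi * 1.5 * t) + 0.3 * np.random.randn(t.size)
smooth = lowpass_csi(raw)
```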

In the process of establishing the training set, the probability that a given sample will not be collected in a single draw is [12]:

$$P = 1 - \frac{1}{N},$$

so over $N$ draws with replacement the probability that the sample is never collected is $\left(1 - \frac{1}{N}\right)^{N} \approx \frac{1}{e} \approx 0.368$, where $N$ is the number of samples.
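As a small worked check of this standard sampling-with-replacement result, the snippet below shows the never-collected probability converging to 1/e as the number of samples N grows:

```python
import math

for N in (10, 100, 1000, 10000):
    p_not_drawn = (1 - 1 / N) ** N   # probability a given sample is never selected in N draws
    print(N, round(p_not_drawn, 4))

print("1/e =", round(1 / math.e, 4))  # limiting value, roughly 0.3679
```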

2.3. Feature Extraction Module

In general, feature extraction is a process in which a larger set of original features is mapped to a smaller set of new features that describe the sample. Consider upper-limb motion signals of the same category, for example, the upper-limb motion signal produced when a person makes a forehand stroke in table tennis. If we only look at the physical meaning of the signal in the time domain or the frequency domain, such as the extreme values of the signal in the time domain or the energy of the signal in the frequency domain, the results for the two types of actions are almost indistinguishable. Therefore, the raw data cannot be used directly as features to train and run the classifier; it is necessary to combine time-domain and frequency-domain features when extracting signal features. Here, wavelet analysis is one of the good tools: it combines information from the time domain and the frequency domain and is a method recognized by many researchers.
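The paper does not specify a wavelet basis or decomposition depth, so the following sketch is only illustrative: assuming PyWavelets with a db4 wavelet and three levels, it turns a one-dimensional motion signal into band-energy features that combine time-domain and frequency-domain information.

```python
import numpy as np
import pywt

def wavelet_energy_features(signal, wavelet="db4", level=3):
    """Decompose a 1-D motion signal and return the energy of each band.

    The wavelet name and level are illustrative choices, not taken from the paper.
    """
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [cA3, cD3, cD2, cD1]
    return np.array([np.sum(c ** 2) for c in coeffs])

# Example: features for a synthetic forehand-stroke acceleration trace.
t = np.linspace(0, 1, 256)
accel = np.sin(2 * np.pi * 3 * t) + 0.2 * np.random.randn(t.size)
features = wavelet_energy_features(accel)
```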

2.4. Action Recognition Module

Here, we must first build a file reading module, the purpose of which is to make it easy to call each feature extraction method and finally generate a table file for training the integrated learner. The specific operation is a two-layer file reading loop over the video data folder: traverse each action video in the data set and call the various feature extraction methods on it in turn. For each sample, the various extracted features are merged into a one-dimensional array, and the corresponding category label (the folder name) is appended to this array, completing the information integration of one sample. The (fused features + label information) of all samples are appended in turn to an initially empty array. After the information of all samples has been integrated, the array is written out as a table file containing the fused features and label information of all samples.
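A minimal sketch of the two-layer reading loop described above, assuming a hypothetical dataset layout of dataset/&lt;action_label&gt;/&lt;video&gt;.avi and a placeholder extract_features function; in the real system, the various feature extraction methods would be called at that point.

```python
import csv
from pathlib import Path

import numpy as np

def extract_features(video_path):
    """Placeholder for the system's feature extraction methods.

    In the real system, several extractors are called and their outputs are
    concatenated; here a dummy vector is returned for illustration only.
    """
    return np.zeros(8)

rows = []
dataset_dir = Path("dataset")                     # hypothetical layout: dataset/<label>/<video>
for action_dir in sorted(dataset_dir.iterdir()):  # outer loop: one folder per action category
    if not action_dir.is_dir():
        continue
    for video in sorted(action_dir.glob("*.avi")):            # inner loop: each action video
        fused = np.asarray(extract_features(video)).ravel()   # fuse features into a 1-D array
        rows.append(list(fused) + [action_dir.name])          # append the category label

with open("features.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)                 # table file: fused features + label per sample
```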

Action recognition is handled in the BP neural network training module. Since the weights and thresholds are random each time the network is initialized, the errors at the end of training are not exactly the same, and the weights and thresholds obtained by training are therefore not exactly the same; that is, the results of the BP neural network differ slightly after each training run. After an ideal result is found, the network is saved. Then, the saved network is called from VS through environment configuration, finally realizing the recognition of athletes’ wrong actions in the VS2010 programming environment. Action picture recognition is shown in Figure 1.
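The paper’s BP network is trained and then called from the VS2010 environment; as a language-neutral illustration of the same idea (train a backpropagation network on the fused-feature table, keep the best of several random initializations, and save it for later recognition), here is a hedged Python sketch using scikit-learn’s MLPClassifier, which is trained by backpropagation. The file names and network size are assumptions.

```python
import numpy as np
import joblib
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Hypothetical fused-feature table: last column is the action label.
data = np.loadtxt("features.csv", delimiter=",", dtype=str)
X, y = data[:, :-1].astype(float), data[:, -1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

best_net, best_acc = None, 0.0
for seed in range(5):  # weights are random at each initialization, so results differ slightly
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=seed)
    net.fit(X_train, y_train)
    acc = net.score(X_test, y_test)
    if acc > best_acc:
        best_net, best_acc = net, acc

joblib.dump(best_net, "bp_action_net.joblib")  # save the ideal result for later recognition calls
```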

2.5. Action Analysis and Feedback Module

The movement analysis and feedback module mainly compares the trainer’s movement data with standard sports movement data, including the angles and orientations of body parts. The threshold angles of the preset gymnastic movement data are used as the standard, and body parts whose movement exceeds the threshold range are displayed on the screen in order to correct the trainer’s movements. The main work and processing steps of this module are as follows [13, 14] (a sketch of the angle comparison follows the list):
(1) Use the human body’s skeletal angles and their vector products to represent the trainer’s movement data and the standard sports movement data captured by the previous module.
(2) Compare the trainer’s exercise data with the standard sports action data. If the trainer’s exercise data does not fall within the threshold range (the speed at which 60% to 80% of the maximum oxygen uptake intensity is reached), the wrong action is displayed on the screen as an image to remind the trainer to correct the error [15, 16].
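A minimal sketch of steps (1) and (2), assuming three skeleton joints per angle (for example hip-knee-ankle) and an illustrative tolerance; the actual thresholds and joint combinations would come from the preset gymnastic movement data.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by joints a-b-c, from 3-D coordinates."""
    v1, v2 = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def within_threshold(trainer_angle, standard_angle, tolerance=15.0):
    """Return True if the trainer's angle lies within the preset threshold range (assumed tolerance)."""
    return abs(trainer_angle - standard_angle) <= tolerance

# Example with made-up right-leg coordinates (hip, knee, ankle).
hip, knee, ankle = (0.0, 1.0, 0.0), (0.0, 0.5, 0.05), (0.0, 0.0, 0.0)
angle = joint_angle(hip, knee, ankle)
if not within_threshold(angle, standard_angle=175.0):
    print(f"Knee angle {angle:.1f} deg is outside the threshold range - show correction on screen")
```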

2.6. System Scoring Module

The system scoring module is the core functional module of the system. It extracts the angle features of the trainer’s skeleton data, finds the corresponding frames, and indicates how standard the trainer’s current movement is in the form of a score. The module process is as follows (a minimal scoring sketch follows the list):
(1) The scoring module obtains the difference between the two corresponding skeletal threshold angles by comparing the trainer’s exercise data with the standard gymnastics movement.
(2) The actions and difficulty threshold angles of different gymnastics items are preset in the system, and different threshold-angle calculation methods are adopted according to the selected comparison standard.
(3) The trainer’s exercise data and the standard gymnastics exercise data are weighted on the basis of the threshold angle, and the calculation results are displayed on the screen in the form of scores. The scores represent the degree of match between the trainer’s movements and the standard exercises.
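As a hedged illustration of step (3), the sketch below turns weighted threshold-angle differences into a 0–100 score; the joint names and weights are hypothetical, not the system’s actual presets.

```python
import numpy as np

def movement_score(trainer_angles, standard_angles, weights, max_dev=45.0):
    """Weighted match score (0-100) between trainer and standard skeleton angles.

    trainer_angles / standard_angles: dict of angle name -> degrees.
    weights: dict of angle name -> importance weight (hypothetical values).
    """
    total, weight_sum = 0.0, 0.0
    for name, w in weights.items():
        deviation = abs(trainer_angles[name] - standard_angles[name])
        total += w * min(deviation, max_dev) / max_dev   # 0 = perfect match, 1 = worst
        weight_sum += w
    return 100.0 * (1.0 - total / weight_sum)

standard = {"right_knee": 175.0, "right_elbow": 90.0}
trainer = {"right_knee": 160.0, "right_elbow": 95.0}
weights = {"right_knee": 2.0, "right_elbow": 1.0}        # assumed importance weights
print(f"score = {movement_score(trainer, standard, weights):.1f}")
```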

2.7. Kinect Data Fusion Compensation

In view of the defects in image resolution of the Kinect sensor found during testing, the image information is compensated at the sensor acquisition terminal. The embedded-motion-driver-5.1.3 provided by InvenSense is used to obtain quaternion data, and the MPU6050 chip hardware performs the computation, which greatly reduces the computational pressure on the microprocessor. It can be used for signal processing and real-time image rendering and can also be used in other areas such as video processing and machine vision.
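The DMP firmware returns orientation as a quaternion; a common follow-up step, not detailed in the paper, is converting it to Euler angles before fusing it with the Kinect skeleton. The sketch below uses the standard conversion and is only illustrative.

```python
import math

def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to roll, pitch, yaw in degrees."""
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = math.asin(max(-1.0, min(1.0, 2 * (w * y - z * x))))  # clamp for numerical safety
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return tuple(math.degrees(a) for a in (roll, pitch, yaw))

# Example with an arbitrary quaternion as it might be read from the MPU6050 DMP.
print(quaternion_to_euler(0.966, 0.0, 0.259, 0.0))  # roughly 30-degree pitch
```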

Suppose the specific logic of the object in the real world is expressed as a matrix; this matrix is called the error object matrix.

At present, the integration of artificial intelligence with the new generation of human-computer interaction technology is mainly embodied in auditory-based speech recognition technology and vision-based image recognition technology. At one moment during training, the crossed legs are not straight; at a later moment, they are still not straight. An error function is then created to quantify this deviation from the standard posture.

3. Intelligent Recognition Results of Athletes’ Wrong Actions

This paper trains on data obtained during exercise. The average accuracy of the experimental results is shown in Table 2. The accuracy for the “sitting” and “jumping” actions is moderate.

Table 3 shows the comparison between the non-static and static k-means algorithms on the test set. For the jump action, the non-static k-means measured 85, while the static k-means measured 97.

Firstly, motion and stationary behaviors are distinguished from the characteristics of the signals themselves. In the time domain, the collected CSI signals for static and motion behaviors show obvious differences in their level of volatility: motion behavior clearly produces more pronounced fluctuations and a larger signal amplitude than static behavior. In addition to csi0819.dat, the data of actions 2 and 3 were measured and compared with the movement templates of the experimental system. Figure 2 shows the amplitude of the CSI signal on the first subcarrier for action 1, action 2, and action 3 in an experimental environment without interference.

CSI is a channel attribute of the communication link. It describes the attenuation factor of the signal on each transmission path, that is, the value of each element in the channel gain matrix. Compared with action 3, action 1 and action 2 have larger channel state information (CSI) amplitude values and show more obvious signal fluctuations. The same conclusion can be obtained by processing other data files multiple times. Therefore, it is natural to use the variance of the signal fluctuation to measure the stability of the CSI amplitude distribution when distinguishing these two different motion states. However, since the start and end phases of sports behaviors tend to be static, relying only on the time-domain variance of the CSI amplitude to distinguish the two behaviors may be too simplistic, and wrong judgments may occur. Therefore, the variance of the CSI amplitude distribution across subcarriers is also introduced to further distinguish motion from stationary behavior.
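A minimal sketch of this two-stage check, assuming a CSI amplitude matrix of shape (time samples, subcarriers) and illustrative thresholds; in practice, the thresholds would have to be calibrated on the collected data.

```python
import numpy as np

def is_motion(csi_amplitude, time_var_thresh=0.5, subcarrier_var_thresh=0.2):
    """Classify a CSI window as motion or static.

    csi_amplitude: array of shape (n_samples, n_subcarriers), e.g. (T, 30).
    Both thresholds are illustrative and would need calibration in practice.
    """
    # Stage 1: fluctuation of each subcarrier over time, averaged across subcarriers.
    time_domain_var = np.mean(np.var(csi_amplitude, axis=0))
    # Stage 2: how the mean amplitude is distributed across subcarriers.
    subcarrier_var = np.var(np.mean(csi_amplitude, axis=0))
    return time_domain_var > time_var_thresh or subcarrier_var > subcarrier_var_thresh

window = np.random.randn(200, 30) * 0.1        # mostly static-looking synthetic window
print("motion detected:", is_motion(window))
```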

During sports, there will be signal blocking. The coordinates of the right knee and right ankle are obtained as shown in Figure 3.

Table 4 shows the reaction time, accuracy, score difference, and self-confidence of gymnasts of different levels at the two detection points: the temporal occlusion point and after watching the complete video. It can be seen from Table 4 that, in terms of reaction time, all subjects’ reaction times at the temporal occlusion point are longer than their reaction times after watching the complete video, an extremely significant difference (t = 6.35, df = 52). In terms of judgment accuracy, the accuracy of all subjects after watching the complete video is higher than their accuracy at the temporal occlusion point, also an extremely significant difference (t = -10.80, df = 52). In terms of the difference between the two scorings of the action and its actual score, the score difference after watching the complete video is higher than the score difference at the temporal occlusion point, and this difference approaches significance (t = -1.88, df = 52).

In order to compare whether athletes of different levels differ in their anticipatory responses when watching vaulting movements with high and low difficulty coefficients, the movements were grouped by difficulty coefficient, and the anticipated reaction times of the four groups of subjects were compared with the reaction times judged after watching the complete video; the comparison result is shown in Figure 4. There is no significant difference between the anticipated reaction times of male gymnasts of different levels when watching the high-difficulty vaulting movements and their reaction times after the complete video. Repeated-measures analysis of variance shows that the interaction between detection point and exercise level is not significant, F(3, 48) = 0.79.

According to the side decomposition diagrams of walking and running, the difference between these two actions can be clearly seen. First, when walking, the arm is naturally bent with a large bending angle, so the arm is closer to a straight line segment and the swing amplitude is small; when running, the arm bending angle is small, and the angle between the upper arm and the forearm is close to 90 degrees. The difference in the degree of bending of the legs between the two actions is similar to that of the arms. At the same time, the stride frequency during running is higher and there are moments when the body is airborne, whereas walking has no such characteristic. Finally, the center of gravity of the torso moves up and down more obviously during running. The similarity between the two actions is that the center of gravity has a horizontal displacement while the angle between the torso and the ground remains basically unchanged. The breakdown of the running action is shown in Figure 5.
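As a simplified illustration (not the paper’s actual classifier) of how these observations could become features, the sketch below computes the elbow angle and the vertical oscillation of the hip from a skeleton sequence; the 120-degree elbow threshold and 0.06 m oscillation threshold are assumptions.

```python
import numpy as np

def elbow_angle(shoulder, elbow, wrist):
    """Angle between upper arm and forearm, in degrees."""
    v1, v2 = np.asarray(shoulder) - np.asarray(elbow), np.asarray(wrist) - np.asarray(elbow)
    cos_a = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))

def walk_or_run(elbow_angles, hip_heights, angle_thresh=120.0, bounce_thresh=0.06):
    """Crude walking/running separation from mean elbow angle and vertical hip bounce."""
    bent_arm = np.mean(elbow_angles) < angle_thresh       # running: elbow closer to 90 degrees
    big_bounce = np.ptp(hip_heights) > bounce_thresh      # running: larger vertical movement
    return "running" if (bent_arm and big_bounce) else "walking"

# Example with synthetic per-frame measurements.
angles = np.random.normal(95, 5, 60)                      # degrees, per frame
heights = 1.0 + 0.05 * np.sin(np.linspace(0, 6 * np.pi, 60))
print(walk_or_run(angles, heights))
```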

The accuracy rates obtained by the three integration methods are all quite high. The experiments on these three integration methods show that the classification model constructed by the BP integration method is weaker than that of the AdaBoost integration method (AdaBoost: adaptive boosting, in which each sample in the training data is given a weight, forming a weight vector D), but the BP integration method can also achieve very considerable accuracy. In terms of time complexity, whether for small or large noisy data samples, the BP integration method shows a powerful model construction speed; especially on large data, its efficiency is much higher than that of the other methods, and a fast model construction speed can save a lot of time when the system is modified and debugged. It is an excellent integration method and can be used as a preferred model construction tool in the field of video recognition. The performance comparison of the BP integration methods is shown in Figure 6.
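The exact experimental setup behind Figure 6 is not reproduced here; the following sketch only shows how one could compare an AdaBoost ensemble and a backpropagation (MLP) network on the same feature table in terms of accuracy and training time, using scikit-learn and synthetic data in place of the paper’s video features.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the fused-feature table (not the paper's data).
X, y = make_classification(n_samples=2000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("AdaBoost", AdaBoostClassifier(n_estimators=100, random_state=0)),
                  ("BP (MLP)", MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                             random_state=0))]:
    start = time.perf_counter()
    clf.fit(X_tr, y_tr)
    elapsed = time.perf_counter() - start
    print(f"{name}: accuracy={clf.score(X_te, y_te):.3f}, train time={elapsed:.2f}s")
```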

After the intervention experiment ended, the subjects’ Landing Error Scoring System (LESS) scores improved, and the biggest improvement was in the “asymmetric touchdown” error. In the pre-test, 6 subjects were found to land on one foot before the other, or to land normally with one foot while the heel of the other foot touched down first. Such an asymmetric landing causes the athlete’s weight to fall on the leg that touches the ground first, thereby increasing the load and pressure on that leg. This movement pattern is very similar to the mechanism of many non-contact anterior cruciate ligament (ACL) injuries. After the experimental intervention, many subjects who landed asymmetrically in the pre-test landed normally in the post-test. The individual error scores are shown in Figure 7.

4. Discussion

Machine vision technology is closely related to computer technology, communication technology, artificial intelligence, mobile computing, and mass storage technology. Popular computing paradigms such as pervasive computing and virtual reality, together with the development of new technologies, have further promoted the application of machine vision technology in more areas of daily life. The rule-based error-driven learning method relies mainly on repeated learning: with reference to the corpus, BP is applied after corpus recognition, and the corresponding rule set and statistical rule set are obtained again in order to improve the recognition effect and obtain the contribution of each rule. Each rule is used to divide the corpus so that the rule with the highest contribution can be selected by ranking the contributions of the candidate rules. After that, the process is repeated until no new rule can be found. This method has been used in many fields, such as grammatical analysis of words and part-of-speech tagging, and good results have been obtained. In rule learning, the rule template set is first defined according to the candidate rule space to be searched, and each rule template must specify a specific feature set used as context elements. At the same time, the selected feature category and the number of features can also be determined according to the existing error results.
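As a hedged, toy-scale sketch of this repeat-until-no-gain loop (not the paper’s exact procedure), the code below scores each candidate rule by how many corpus errors it fixes, applies the best one, and repeats until no rule improves the result.

```python
def error_driven_learning(corpus, gold, candidate_rules):
    """Greedy error-driven rule learning.

    corpus: list of current labels; gold: list of correct labels.
    candidate_rules: list of (name, function) pairs, each function mapping labels -> labels.
    Returns the ordered list of selected rule names.
    """
    selected = []
    while True:
        errors = sum(c != g for c, g in zip(corpus, gold))
        best = None
        for name, rule in candidate_rules:
            new = rule(corpus)
            gain = errors - sum(n != g for n, g in zip(new, gold))  # contribution of this rule
            if best is None or gain > best[0]:
                best = (gain, name, new)
        if best is None or best[0] <= 0:       # stop when no rule reduces the error
            return selected
        _, name, corpus = best
        selected.append(name)

# Toy example with a hypothetical relabeling rule.
gold = ["run", "walk", "run", "jump"]
corpus = ["walk", "walk", "walk", "jump"]
rules = [("walk->run", lambda c: ["run" if x == "walk" else x for x in c])]
print(error_driven_learning(corpus, gold, rules))
```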

In many high-level competitive sports, athletes’ anticipatory ability is key to achieving excellent results. There are two kinds of anticipation in a sports situation: one anticipates what the follow-up action will be, and the other anticipates the quality of that follow-up action. The anticipation studied here is defined in the latter sense. In view of its huge application value, more and more researchers are devoted to research on human motion recognition in video. However, camera instability, occlusion by obstacles, blurring of the video image, background noise, and differences in how people understand action behavior all strongly affect motion recognition, so this research is very challenging. With the continuous development and maturity of artificial intelligence research and applications, human-computer interaction technology, an important research branch, has gradually developed from contact-based to non-contact interaction and from interface-based to natural interaction.

In recent years, researchers have tried to better understand the psychological factors that distinguish elite athletes from other athletes. Many research results show that expert athletes have a wealth of practical and procedural knowledge, which enables them to retrieve important information from the environment and to anticipate and predict what will happen. Expert athletes have more effective decision-making abilities and unparalleled abilities to predict actions and outcomes; they are also more effective at distributing attention and using cues. Video recognition is a research hotspot in the field of computer vision. Because video is one of the most widespread forms of information in today’s society, video recognition has important practical application value in many areas, such as monitoring, human-computer interaction, and video data management. Establishing a robust video recognition system has long been the goal of scholars in the field of computer vision. Based on research into video recognition methods in recent years, this paper proposes a video action recognition system based on multi-feature fusion and an integrated learner. The goal is to establish a new video recognition system for detecting human behavior in videos.

If an athlete’s landing movement pattern is unreasonable, it will place excessive pressure and shear force on the joints of the lower limbs and may eventually lead to sports injuries. Therefore, research on athletes’ landing movement patterns helps the prevention and rehabilitation of sports injuries in this event. In recent years, more and more high-level sports teams have paid attention to assessing athletes’ risk of sports injury. Athletes are the reserve force for the development of sport, but sports injuries have become one of the important factors hindering athletes’ development. Therefore, researching and analyzing athletes’ movement patterns and developing targeted intervention training will help basketball players prevent injuries.

5. Conclusion

With the continuous advancement of sports work, judging sports actions and the correctness of sports training have become issues of public concern. In many sports, referees need to identify athletes’ wrong actions and make judgments, for example, recognizing and penalizing fouls in football, basketball, and volleyball, or deducting points for wrong actions in gymnastics, martial arts, and other events, so the referees in these events must do a great deal of error-identification work. Therefore, this article combines the development of computer vision to study the application of error recognition theory and methods in the referee’s decision support system. Machine vision technology was called computer vision technology in its early days and is an important research field of artificial intelligence. In the modern information age, machine vision plays an increasingly important role in information acquisition and analysis, and many of these technologies have been commercialized and put into practice. The action recognition system for video samples proposed in this paper extracts only a small number of features for each sample. Because of the limitations of the experimental hardware environment, fewer data samples were prepared, so the final recognition effect needs to be enhanced, the training data needs to be increased, and the system’s capabilities need to be further optimized. Sensors made of other materials could be developed in future work to make the sensors even more sensitive.

Data Availability

No data were used to support this study.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The work was sponsored by a major project in the Humanities and Social Sciences Category of the Educational Commission of Guangdong Province of China in 2019.