[Retracted] Deep Learning-Based Posture Recognition for Motion-Assisted Evaluation

Yuan, Yi; Zheng, DongXia

DOI: https://doi.org/10.1155/2022/7581079

Mobile Information Systems


This article has been retracted.

Special Issue: Online Processing and Analyzing of IoT Data Streams in Intelligent Mobile Edge Computing

Research Article | Open Access

Volume 2022 | Article ID 7581079 | https://doi.org/10.1155/2022/7581079


Academic Editor: Le Sun

Received 22 Apr 2022

Accepted 03 Jun 2022

Published 17 Aug 2022

Abstract

With the development of computer vision technology, human action and pose recognition has gradually become a popular research direction, but applying pose recognition to the assisted evaluation of sports movements still faces several problems. In this paper, deep learning-based human motion pose recognition is introduced into this field to make sports-assisted training more intelligent. First, we analyze the advantages and limitations of state-of-the-art human motion pose recognition algorithms in computer vision for specific fields. On this basis, a human motion posture recognition method based on a convolutional neural network (CNN) is proposed. Classical radar signal processing is first used to preprocess the echo signals of human postures and to generate time-frequency images of the motions. A CNN is then constructed, and the time-frequency images are used as its input data to train the network parameters. Finally, the method is tested on a publicly available online dataset. The experimental results show that the designed CNN accurately identifies four types of human motion, with a recognition accuracy of at least 97%.

1. Introduction

The recognition of human motion poses has attracted considerable attention and is widely used in computer vision [1]. Video-based pose recognition usually takes video data as input and extracts and analyzes video features through various image processing and recognition methods [2]. Human action recognition in video has a wide range of applications [3]. The key to video-based posture recognition is to extract appropriate video features and to analyze and recognize them accurately [4]. Human motion recognition technology is widely used in many fields. Applied to sports, it can accurately identify movements, compare them with reference movements, and detect and correct irregular ones [5].

Artificial intelligence technology began to emerge in the 1990s, and after 20 years of development, machine vision has been widely used in video surveillance, virtual imaging, film and television production, and other industries [6]. In particular, generating two-dimensional animation from character models is one of the current hot spots of scientific and technological research [7, 8]. As machine learning has matured in image processing, combining deep learning with computer 2D animation imaging has become possible. The work in [9] proposed modeling with the general wireframe model from the University of Toronto; this modeling method is efficient and makes image extraction simple, but the data noise is large, and the excessive noise degrades the accuracy of the recognized actions [10]. Other domestic research groups are still at the stage of academic exploration, and their proposed algorithms often place high demands on the computing power of the hardware [11]. Domestic and international research shows that the key to character pose recognition and 2D animation generation lies in extracting the pose of each of the character's actions, in image compression, and in subsequent refinement; convolutional neural networks have also shown strong advantages in processing medical images [12]. In this paper, we focus on effectively combining deep neural networks with pose recognition and propose an improved convolutional neural network architecture that outputs character poses in real time in complex multi-person motion scenes.

Human motion gesture detection and recognition can be applied not only in smart homes but also in the military field, where it could greatly promote the development of intelligent weapons, so it has important application prospects. At present, human posture recognition mainly relies on either vision-based (visible-light) sensing or microwave sensing. Radar microwave-based human motion gesture recognition is not affected by lighting conditions, protects user privacy, and can recognize targets through certain obstacles [13]. Therefore, radar microwave-based human motion gesture recognition plays an irreplaceable role in smart homes, remote control, and intelligent weapons.

The key to radar microwave-based human motion pose recognition is extracting and identifying the micro-Doppler features of the echoes. In the literature, micro-Doppler features of human action postures have been extracted and classified with traditional algorithms such as the support vector machine (SVM), orthogonal matching pursuit (OMP), and dynamic time warping (DTW). Although these traditional algorithms can achieve high accuracy, they rely on conventional supervised learning, which requires features to be extracted from the micro-Doppler information by hand, and because the extracted features are tied to the recognition target, they are difficult to transfer to other applications; deep learning algorithms can overcome this limitation. In [14, 15], deep learning algorithms such as CNNs and two-stream fusion neural networks (TS-FNNs) were used to extract and recognize features from range-Doppler (R-D) maps of gestures generated by frequency-modulated continuous-wave (FMCW) radar, and the accuracy improved significantly over traditional algorithms. This shows that deep learning can greatly improve the accuracy of radar gesture recognition. However, deep learning algorithms require large amounts of data and, on small datasets, are prone to overfitting and error transmission, which leads to poor recognition results [16].

This paper proposes a CNN-based microwave recognition method for human action postures. A CNN automatically extracts deep features from the action echoes without manual feature extraction, and the model generalizes well [17]. Compared with a traditional backpropagation (BP) neural network, a CNN uses convolution kernels with local connectivity and weight sharing, which reduces the number of parameters, improves the learning efficiency of the network, and better copes with the overfitting and error transmission caused by small datasets [18]. In this paper, a linear frequency-modulated continuous-wave (LFMCW) radar is used to acquire the echo signals of human action postures; time-frequency maps of the postures are generated, and a CNN recognizes the radar echo images of four types of human action posture: walking, sitting, standing, and falling [19, 20]. The final recognition accuracy for these four movements exceeds 97%.
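To make the parameter saving from local connectivity and weight sharing concrete, the short Python sketch below counts the parameters of a convolutional layer shaped like C1 in Section 3.3 (16 kernels of size 5 × 5) and of a hypothetical fully connected layer that maps the same input to an output of the same size. The 220 × 220 single-channel input is an assumption made for illustration; the paper does not state the input size.

```python
def conv2d_param_count(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Weights plus one bias per output channel for a square-kernel convolution."""
    # Each kernel is shared across all spatial positions, so the count does not
    # depend on the height or width of the input.
    return kernel_size * kernel_size * in_channels * out_channels + out_channels


def dense_param_count(n_in: int, n_out: int) -> int:
    """Weights plus one bias per output unit for a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical single-channel 220 x 220 input; an unpadded 5 x 5 convolution
# with 16 kernels produces a 216 x 216 x 16 output.
conv = conv2d_param_count(5, 1, 16)
dense = dense_param_count(220 * 220, 216 * 216 * 16)
print(f"conv layer:  {conv:,} parameters")
print(f"dense layer: {dense:,} parameters")
```

Because the kernels are shared across positions, the convolutional count (416) is independent of the input size, whereas a dense layer needs tens of billions of weights for the same input-output mapping.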

3. Methodology

The algorithm follows a bottom-up human pose recognition approach: it first identifies the key points of human movement in a complex environment with multiple people and then links the key points to form a skeletal map of each person's movement. When a convolutional neural network processes the input image, a single pass of the network is sufficient to complete the analysis. First, a feature map of directed links between human joints is built from the joint coordinates, levels, and types; this representation makes the image amenable to numerical processing, after which the convolution operation is performed, as shown in Figure 1.

In the human feature map shown in Figure 1, the coordinates, levels, and types of the key joint points are encoded as feature vectors. For a feature point $i$, the corresponding feature vector is

$$\mathbf{v}_i = \left(p_i,\ \Delta x_i,\ \Delta y_i\right), \tag{1}$$

where $p_i$ is the probability that the feature point belongs to a given joint type and $(\Delta x_i, \Delta y_i)$ is the offset of the coordinates of the feature point's parent node relative to the coordinates of the feature point itself.
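The following Python sketch shows one way parent-node offsets of this kind could be used to assemble skeletons bottom-up: each key point predicts where its parent joint should be and is linked to the nearest detected key point of the parent type. The `Keypoint` fields, the joint names, the `PARENT` table, the confidence threshold, and nearest-neighbour matching are all illustrative assumptions, not the paper's specification.

```python
import math
from dataclasses import dataclass


@dataclass
class Keypoint:
    joint: str   # joint type label (hypothetical naming)
    x: float     # image coordinates of the feature point
    y: float
    prob: float  # probability that the point belongs to `joint`
    dx: float    # offset from this point to its parent joint
    dy: float


# Hypothetical skeleton: child joint type -> parent joint type.
PARENT = {"wrist": "elbow", "elbow": "shoulder", "shoulder": "neck"}


def link_keypoints(points, min_prob=0.5):
    """Link each key point to the parent-type key point closest to the
    location predicted by its offset. Returns (child, parent) index pairs."""
    kept = [i for i, p in enumerate(points) if p.prob >= min_prob]
    links = []
    for i in kept:
        child = points[i]
        parent_type = PARENT.get(child.joint)
        if parent_type is None:  # root joint such as "neck"
            continue
        # Predicted parent position = own coordinates + offset.
        px, py = child.x + child.dx, child.y + child.dy
        candidates = [j for j in kept if points[j].joint == parent_type]
        if candidates:
            best = min(candidates,
                       key=lambda j: math.hypot(points[j].x - px, points[j].y - py))
            links.append((i, best))
    return links


# Two people: each shoulder's offset points at its own neck.
people = [
    Keypoint("neck", 10, 10, 0.9, 0, 0),
    Keypoint("neck", 50, 10, 0.9, 0, 0),
    Keypoint("shoulder", 12, 20, 0.8, -2, -10),
    Keypoint("shoulder", 52, 20, 0.8, -2, -10),
    Keypoint("shoulder", 30, 40, 0.1, 0, 0),  # low confidence, ignored
]
print(link_keypoints(people))  # [(2, 0), (3, 1)]
```

Because each point carries its own offset, the two shoulders are assigned to the correct necks even though both people share the same joint types.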

3.1. Acquisition of Differential Beat Signal

Figure 2 shows the time-frequency relationship between the LFMCW radar transmit signal, the echo signal, and the differential beat signal.

In Figure 2, $f_0$ is the starting frequency of the signal, $\tau_{\max}$ is the maximum time delay, $T$ is the period of the signal, and $B$ is the bandwidth of the signal. The effective time of the signal is the interval $[\tau_{\max}, T]$, i.e., $T_e = T - \tau_{\max}$, so the effective bandwidth $B_e = B\,T_e/T$ is smaller than $B$.

Consider the multiperiod LFMCW radar echo signal. To simplify the analysis, the initial phase is ignored. In the $m$-th sweep period, the complex form of the transmitted sawtooth LFMCW signal is

$$s_T(t, m) = A_m \exp\left[j2\pi\left(f_0 t + \tfrac{1}{2}\mu t^2\right)\right], \qquad 0 \le t < T, \tag{2}$$

where $A_m$ is the random amplitude of the transmit signal in the $m$-th period, $f_0 + \mu t$ is the instantaneous frequency of the transmit signal at time $t$, and $\mu = B/T$ is the FM slope ($B$ is the FM bandwidth and $T$ is the sweep period). At the initial time ($m = 0$, $t = 0$), assume that a point target is at distance $R_0$ from the radar and moves with radial velocity $v$ (velocity away from the radar is positive and velocity toward the radar is negative). Within the effective time of the $m$-th sweep period, the echo of the moving target is

$$s_R(t, m) = K A_m \exp\left\{j2\pi\left[f_0\left(t - \tau_m(t)\right) + \tfrac{1}{2}\mu\left(t - \tau_m(t)\right)^2\right]\right\}, \qquad \tau_{\max} \le t < T, \tag{3}$$

where $K$ is the attenuation constant, which reflects the influence of the environment on the electromagnetic wave and the ability of the target to scatter it; $\tau_m(t)$ is the instantaneous delay of the target echo in the $m$-th period; and $\tau_m(t) = 2\left[R_0 + v(mT + t)\right]/c$, in which $c$ is the speed of light. Mixing the transmitted signal with the target echo within the effective band of the $m$-th period gives the differential beat signal

$$s_b(t, m) = s_T(t, m)\, s_R^{*}(t, m) = K A_m^2 \exp\left\{j2\pi\left[f_0 \tau_m(t) + \mu t\, \tau_m(t) - \tfrac{1}{2}\mu \tau_m^2(t)\right]\right\}. \tag{4}$$

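Let $R_m = R_0 + vmT$ be the target distance at the start of the $m$-th period, so that $\tau_m(t) = 2(R_m + vt)/c$. Substituting this expression into (4) and ignoring the small terms $2\mu v t^2/c$ and $\tfrac{1}{2}\mu\tau_m^2(t)$, we get

$$s_b(t, m) \approx K A_m^2 \exp\left\{j2\pi\left[\left(\frac{2\mu R_m}{c} + \frac{2 f_0 v}{c}\right)t + \frac{2 f_0 R_m}{c}\right]\right\}. \tag{5}$$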
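As a numerical sanity check of the approximation step, the Python sketch below compares, over one sweep, the exact phase of the mixed signal, f0·τ + μ·t·τ − μ·τ²/2 with τ = 2[R0 + v(mT + t)]/c, against the linearized phase (2μ·R_m/c + 2f0·v/c)·t + 2f0·R_m/c with R_m = R0 + v·m·T, which drops the second-order terms. The carrier frequency, bandwidth, sweep period, and target values are hypothetical and are not taken from the paper.

```python
C = 299_792_458.0  # speed of light (m/s)


def exact_beat_phase(t, m, R0, v, f0, mu, T):
    """Exact beat phase in cycles, with tau = 2[R0 + v(mT + t)] / c."""
    tau = 2.0 * (R0 + v * (m * T + t)) / C
    return f0 * tau + mu * t * tau - 0.5 * mu * tau ** 2


def approx_beat_phase(t, m, R0, v, f0, mu, T):
    """Linearized beat phase in cycles (second-order terms dropped)."""
    Rm = R0 + v * m * T  # distance at the start of period m
    fb = 2.0 * mu * Rm / C + 2.0 * f0 * v / C
    return fb * t + 2.0 * f0 * Rm / C


# Hypothetical radar and target parameters (not taken from the paper).
f0, B, T = 5.8e9, 400e6, 1e-3   # start frequency, bandwidth, sweep period
mu = B / T                      # FM slope
R0, v, m = 3.0, -1.0, 10        # 3 m away, approaching at 1 m/s, 10th period

max_err = max(
    abs(exact_beat_phase(k * T / 100, m, R0, v, f0, mu, T)
        - approx_beat_phase(k * T / 100, m, R0, v, f0, mu, T))
    for k in range(101)
)
print(f"max phase error over one sweep: {max_err:.2e} cycles")
```

For these values the discarded terms contribute only a few thousandths of a cycle over a whole sweep, which is why they can be neglected.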

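Taking the Fourier transform (FT) of (5) over the effective interval $[\tau_{\max}, T]$ yields the spectrum of the differential beat signal, whose magnitude is

$$|S_b(f, m)| = K A_m^2 T_e \left|\frac{\sin\left[\pi (f - f_{b,m}) T_e\right]}{\pi (f - f_{b,m}) T_e}\right|, \qquad f_{b,m} = \frac{2\mu R_m}{c} + \frac{2 f_0 v}{c}. \tag{6}$$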
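The Fourier-transform step can be illustrated with a short Python sketch: it synthesizes the beat signal of a single stationary point target, takes a DFT over one sweep, and converts the peak bin back to range with R = f_b·c/(2μ). All radar parameters are hypothetical, τ_max is neglected so that the effective sweep time equals the sweep period, and a naive DFT stands in for an FFT to keep the example dependency-free.

```python
import cmath
import math

C = 299_792_458.0  # speed of light (m/s)


def dft_magnitude(x):
    """Magnitude of the N-point DFT (naive O(N^2) loop, for illustration)."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * f * k / n) for k in range(n)))
            for f in range(n)]


# Hypothetical parameters; tau_max is neglected so that T_e equals T.
B, T_e, N = 400e6, 1e-3, 128   # bandwidth, effective sweep time, samples per sweep
mu = B / T_e                   # FM slope
fs = N / T_e                   # fast-time sampling rate
R_true = 3.0                   # stationary target (v = 0)

fb = 2.0 * mu * R_true / C     # beat frequency of a stationary target
beat = [cmath.exp(2j * math.pi * fb * k / fs) for k in range(N)]

spectrum = dft_magnitude(beat)
peak_bin = max(range(N // 2), key=lambda f: spectrum[f])  # positive frequencies
R_est = peak_bin * (fs / N) * C / (2.0 * mu)              # invert f_b = 2*mu*R/c
print(peak_bin, round(R_est, 3))
```

The estimate is quantized to bins of c/(2B) ≈ 0.37 m, so the recovered range (about 3.00 m) agrees with the 3 m target to within one bin.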

The differential signal spectrum contains background clutter, which must be suppressed by moving target indication (MTI). The background clutter consists mainly of echoes from stationary targets and slowly moving clutter. In this paper, a high-pass Butterworth filter is chosen as the MTI filter to suppress it.
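A minimal Python sketch of an MTI stage built from a second-order Butterworth high-pass filter, designed with the bilinear transform and applied along slow time (one sample per sweep) for a single range bin. The filter order, the 10 Hz cutoff, and the 1 kHz sweep rate are assumptions; the paper does not report its filter settings.

```python
import math


def butter2_highpass(fc, fs):
    """Second-order Butterworth high-pass coefficients via the bilinear transform."""
    K = math.tan(math.pi * fc / fs)  # pre-warped cutoff
    norm = 1.0 / (1.0 + math.sqrt(2.0) * K + K * K)
    b = (norm, -2.0 * norm, norm)
    a = (2.0 * (K * K - 1.0) * norm, (1.0 - math.sqrt(2.0) * K + K * K) * norm)
    return b, a


def mti_filter(x, fc, fs):
    """Filter one range bin along slow time (works for real or complex samples)."""
    (b0, b1, b2), (a1, a2) = butter2_highpass(fc, fs)
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y.append(yn)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


# Hypothetical: 1 kHz sweep repetition rate, 10 Hz cutoff.
fs, fc = 1000.0, 10.0
clutter = [1.0] * 2000  # stationary target: constant in slow time
mover = [math.cos(2 * math.pi * 200.0 * n / fs) for n in range(2000)]  # 200 Hz Doppler

y_clutter = mti_filter(clutter, fc, fs)
y_mover = mti_filter(mover, fc, fs)
rms = math.sqrt(sum(v * v for v in y_mover[-500:]) / 500)
print(f"residual clutter: {abs(y_clutter[-1]):.1e}, moving-target RMS: {rms:.3f}")
```

A stationary target produces a constant slow-time sequence, which the filter drives to zero, while a 200 Hz Doppler component passes almost unattenuated (its RMS stays close to 1/√2).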

3.2. STFT Transformation to Generate Echo Time-Frequency Map

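When $f = f_{b,m}$, the magnitude $|S_b(f, m)|$ in (6) reaches its maximum value $K A_m^2 T_e$; that is, the spectral peak lies at

$$f = f_{b,m} = \frac{2\mu R_m}{c} + \frac{2 f_0 v}{c}. \tag{7}$$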

It can be seen that the frequency of the peak in a single-period spectrum contains both distance and velocity information. To obtain the Doppler shift information of the differential beat signal, the spectral components at the same frequency point across all repeated periods are analyzed with the short-time Fourier transform (STFT), which converts them into two-dimensional information that is then rendered as a time-frequency map [21–23]:

$$\mathrm{STFT}(t, f) = \int_{-\infty}^{\infty} x(s)\, w(s - t)\, e^{-j2\pi f s}\, ds, \tag{8}$$

where $x(s)$ is the spectral component of all repeated periodic signals at the same frequency point, $w(\cdot)$ is the Hanning window, and $t$ is the shift of the window function.

To facilitate computer processing, the signal is discretized, and the discrete form of (8) is

$$\mathrm{STFT}(k, \omega) = \sum_{n=-\infty}^{\infty} x(n)\, w(n - kD)\, e^{-j\omega n}, \tag{9}$$

where $x(n)$ is the discrete spectral component of all repeated periodic signals at the same frequency point, $w(n)$ is the Hanning window, $D$ is the step by which the window moves each time, $k$ is the number of steps, and $\omega$ is the digital frequency.
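A dependency-free Python sketch of the discrete STFT with a Hanning window, applied to a slow-time sequence from one frequency point. The frame length, hop, and test tone are arbitrary illustration values, and `stft` is a hypothetical helper, not the authors' code.

```python
import cmath
import math


def hann(n_win):
    """Symmetric Hanning window of length n_win."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (n_win - 1)) for n in range(n_win)]


def stft(x, n_win, hop):
    """Discrete STFT: frame k covers x[k*hop : k*hop + n_win].

    The DFT of each windowed frame uses a local time index, which differs from
    a sum over the absolute index n only by the phase factor exp(-j*omega*k*hop);
    the magnitudes, and hence the time-frequency map, are identical.
    """
    w = hann(n_win)
    frames = []
    for start in range(0, len(x) - n_win + 1, hop):
        seg = [x[start + n] * w[n] for n in range(n_win)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * f * n / n_win)
                           for n in range(n_win)) for f in range(n_win)])
    return frames


# Slow-time signal at one range bin: a Doppler tone at 1/8 cycle per sample.
x = [cmath.exp(2j * math.pi * 0.125 * n) for n in range(256)]
tf_map = [[abs(v) for v in frame] for frame in stft(x, n_win=32, hop=8)]
peaks = [max(range(32), key=lambda f: col[f]) for col in tf_map]
print(len(tf_map), set(peaks))
```

Every column of the resulting map peaks at bin 4 (0.125 × 32), as expected for a constant Doppler tone; a real human motion makes the peak move from column to column, producing the micro-Doppler pattern that is fed to the CNN.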

3.3. Recognition Using CNN

The time-frequency maps are used as the input data to train the network parameters. Because the dataset is small, a CNN with few layers is constructed to reduce overfitting and error transmission, as shown in Figure 3 and Tables 1 and 2.

The network consists of two convolutional layers (C1 and C2) with 5 × 5 kernels, 16 and 32 kernels respectively, both with a stride of 1; two pooling layers (P1 and P2) with 3 × 3 and 2 × 2 pooling windows and strides of 3 and 2, respectively; and three fully connected layers (D1, D2, and D3) with weight matrices of 36 992 × 64, 64 × 32, and 32 × 4, respectively. D1 and D2 use the ReLU activation function, and D3 uses softmax.
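The 36 992-element input of D1 can be checked with simple shape arithmetic. The Python sketch below assumes a square input map, unpadded ("valid") convolutions, and unpadded pooling; none of these is stated in the paper, so it only shows which input sizes are consistent with the layer list above.

```python
def out_len(n, kernel, stride):
    """Output length of an unpadded ("valid") convolution or pooling window."""
    return (n - kernel) // stride + 1


def flattened_size(side):
    """Length of the vector fed to D1 for a square side x side input map."""
    side = out_len(side, 5, 1)  # C1: 5 x 5 kernels, stride 1 (16 channels)
    side = out_len(side, 3, 3)  # P1: 3 x 3 window, stride 3
    side = out_len(side, 5, 1)  # C2: 5 x 5 kernels, stride 1 (32 channels)
    side = out_len(side, 2, 2)  # P2: 2 x 2 window, stride 2
    return side * side * 32     # 32 channels after C2


print(flattened_size(220))  # 36992 = 34 * 34 * 32
print([n for n in range(100, 400) if flattened_size(n) == 36992])
```

Under these assumptions, any square input from 220 × 220 to 225 × 225 yields a 34 × 34 × 32 feature map, that is, 36 992 values, so the layer list alone does not pin down the exact input size.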

The convolutional layers (C1 and C2) use multiple convolution kernels to extract deep features from the image. Let the input image be $X$, the convolution kernel be $W$ with dimension $k \times k$, and the stride of the kernel be $s$. The convolution output $Y$ is passed through the ReLU activation function, which sets negative values to zero, i.e.,

$$Y(i, j) = \max\left(0,\ \sum_{p=0}^{k-1}\sum_{q=0}^{k-1} X(is + p,\ js + q)\, W(p, q)\right). \tag{10}$$
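A direct Python transcription of a single-channel convolution followed by ReLU, using the cross-correlation form that CNN frameworks implement. `conv2d_relu` is an illustrative helper, and the input and kernel values are arbitrary.

```python
def conv2d_relu(X, W, stride=1):
    """Valid 2-D convolution (cross-correlation, as in CNN libraries) + ReLU."""
    k = len(W)
    out_h = (len(X) - k) // stride + 1
    out_w = (len(X[0]) - k) // stride + 1
    Y = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = sum(X[i * stride + p][j * stride + q] * W[p][q]
                      for p in range(k) for q in range(k))
            row.append(max(0.0, acc))  # ReLU: negative responses become 0
        Y.append(row)
    return Y


X = [[1, -2, 3],
     [-4, 5, -6],
     [7, -8, 9]]
W = [[1, 0],
     [0, 1]]
print(conv2d_relu(X, W))            # [[6, 0.0], [0.0, 14]]
print(conv2d_relu(X, W, stride=2))  # [[6]]
```

The two negative responses (−8 and −12) are clipped to zero by ReLU, and a stride of 2 leaves room for only one kernel position on this 3 × 3 input.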

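In the pooling layers (P1 and P2), a pooling window slides over each channel of the preceding layer's output and keeps the local maximum, so that each channel is downsampled to the dimension set by the window size and stride. Let $\mathbf{x}$ be the input vector of D1 with dimension 1 × 36 992; let $W_1$, $W_2$, and $W_3$ be the weight matrices of the fully connected layers D1, D2, and D3 with dimensions 36 992 × 64, 64 × 32, and 32 × 4, respectively; let $\mathbf{h}_1$, $\mathbf{h}_2$, and $\mathbf{h}_{\mathrm{BN}}$ be the output vectors of D1, D2, and the batch normalization (BN) layer with dimensions 1 × 64, 1 × 32, and 1 × 32, respectively; let $\mathbf{out}$ be the predicted value of the network with dimension 1 × 4; and let $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ be the biases of D1, D2, and D3, respectively. With the output vector of D2 denoted $\mathbf{h}_2$, the BN layer can be expressed as

$$\hat{\mathbf{h}}_2 = \frac{\mathbf{h}_2 - \boldsymbol{\mu}_B}{\sqrt{\boldsymbol{\sigma}_B^2 + \epsilon}}, \qquad \mathbf{h}_{\mathrm{BN}} = \boldsymbol{\gamma} \odot \hat{\mathbf{h}}_2 + \boldsymbol{\beta}, \tag{11}$$

where $\boldsymbol{\mu}_B$ and $\boldsymbol{\sigma}_B^2$ are the per-feature mean and variance over the mini-batch, $\epsilon$ is a small constant for numerical stability, and $\boldsymbol{\gamma}$ and $\boldsymbol{\beta}$ are learnable scale and shift parameters.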
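Plain-Python sketches of max pooling and of training-mode batch normalization (per-feature mean and variance over the mini-batch, followed by a learnable scale γ and shift β). At inference time, BN layers normally use running statistics instead, which this sketch omits; the function names and values are illustrative.

```python
import math


def max_pool2d(X, size, stride):
    """Keep the maximum of each size x size window (no padding)."""
    out_h = (len(X) - size) // stride + 1
    out_w = (len(X[0]) - size) // stride + 1
    return [[max(X[i * stride + p][j * stride + q]
                 for p in range(size) for q in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


def batch_norm(batch, gamma, beta, eps=1e-5):
    """Training-mode BN: normalize each feature over the batch, then scale and shift."""
    n, d = len(batch), len(batch[0])
    out = [[0.0] * d for _ in range(n)]
    for f in range(d):
        col = [row[f] for row in batch]
        mean = sum(col) / n
        var = sum((v - mean) ** 2 for v in col) / n  # biased batch variance
        for r in range(n):
            out[r][f] = gamma[f] * (batch[r][f] - mean) / math.sqrt(var + eps) + beta[f]
    return out


X = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
print(max_pool2d(X, size=2, stride=2))  # [[6, 8], [14, 16]]
print(batch_norm([[1.0, 2.0], [3.0, 4.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0]))
```

Each feature column of the BN output has (approximately) zero mean and unit variance before γ and β are applied, which is what stabilizes the input to D3.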

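Let the input vector of softmax be $\mathbf{z} = \mathbf{h}_{\mathrm{BN}} W_3 + \mathbf{b}_3 = (z_1, \ldots, z_4)$; then,

$$\mathrm{out}_i = \frac{e^{z_i}}{\sum_{j=1}^{4} e^{z_j}}, \qquad i = 1, \ldots, 4. \tag{12}$$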
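A numerically stable softmax in Python: subtracting the largest input before exponentiating leaves the result unchanged but prevents overflow for large inputs.

```python
import math


def softmax(z):
    """Softmax; subtracting max(z) avoids overflow without changing the result."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, 0.0, 0.0, 0.0]))     # [0.25, 0.25, 0.25, 0.25]
print(softmax([1000.0, 0.0, 0.0, 0.0]))  # no overflow; first entry is ~1.0
```

A naive `math.exp(1000.0)` would raise `OverflowError`, whereas the shifted form returns a valid probability vector.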

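The network model is

$$\mathbf{out} = \mathrm{softmax}\left(\mathbf{h}_{\mathrm{BN}} W_3 + \mathbf{b}_3\right), \quad \mathbf{h}_{\mathrm{BN}} = \mathrm{BN}(\mathbf{h}_2), \quad \mathbf{h}_2 = \mathrm{ReLU}\left(\mathbf{h}_1 W_2 + \mathbf{b}_2\right), \quad \mathbf{h}_1 = \mathrm{ReLU}\left(\mathbf{x} W_1 + \mathbf{b}_1\right). \tag{13}$$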

In this paper, the network parameters are updated by gradient descent, and the loss function is the cross-entropy

$$L = -\sum_{i=1}^{4} y_i \log \mathrm{out}_i, \tag{14}$$

where $y_i$ is the one-hot label of the true posture class.
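A per-sample Python sketch of the cross-entropy loss with a one-hot label, together with its gradient with respect to the softmax inputs, which simplifies to the predicted probabilities minus the one-hot label. The helper names are illustrative.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Cross-entropy with a one-hot target reduces to -log(probs[label])."""
    return -math.log(probs[label])


def logits_gradient(z, label):
    """dL/dz for softmax + cross-entropy: predicted probabilities minus one-hot label."""
    g = softmax(z)
    g[label] -= 1.0
    return g


z = [0.0, 0.0, 0.0, 0.0]             # uninformative logits
print(cross_entropy(softmax(z), 2))  # log(4), about 1.386
print(logits_gradient(z, 2))         # [0.25, 0.25, -0.75, 0.25]
```

The gradient is negative only for the true class, so a gradient-descent step raises that logit and lowers the others.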

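The process of updating the parameters can be expressed as

$$\theta \leftarrow \theta - \eta \frac{\partial L}{\partial \theta}, \qquad \theta \in \left\{W_1, W_2, W_3, \mathbf{b}_1, \mathbf{b}_2, \mathbf{b}_3, \boldsymbol{\gamma}, \boldsymbol{\beta}\right\}, \tag{15}$$

where $\eta$ is the learning rate.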
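To illustrate the gradient-descent update (each parameter moves against its gradient, scaled by the learning rate η), the Python sketch below trains a linear softmax classifier, not the paper's CNN, on four hypothetical 2-D points with plain full-batch gradient descent.

```python
import math


def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_grads(W, b, X, Y):
    """Mean cross-entropy of a linear softmax classifier and its gradients."""
    n, d, k = len(X), len(W), len(b)
    gW = [[0.0] * k for _ in range(d)]
    gb = [0.0] * k
    loss = 0.0
    for x, y in zip(X, Y):
        z = [sum(x[i] * W[i][c] for i in range(d)) + b[c] for c in range(k)]
        p = softmax(z)
        loss -= math.log(p[y]) / n
        for c in range(k):
            delta = p[c] - (1.0 if c == y else 0.0)  # dL/dz_c for this sample
            gb[c] += delta / n
            for i in range(d):
                gW[i][c] += x[i] * delta / n
    return loss, gW, gb


def gradient_step(W, b, gW, gb, lr):
    """theta <- theta - lr * dL/dtheta, applied in place."""
    for i in range(len(W)):
        for c in range(len(b)):
            W[i][c] -= lr * gW[i][c]
    for c in range(len(b)):
        b[c] -= lr * gb[c]


# Toy data: four 2-D points, one per class (hypothetical).
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
Y = [0, 1, 2, 3]
W = [[0.0] * 4 for _ in range(2)]
b = [0.0] * 4

history = []
for step in range(200):
    loss, gW, gb = loss_and_grads(W, b, X, Y)
    history.append(loss)
    gradient_step(W, b, gW, gb, lr=0.5)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

The loss starts at ln 4 ≈ 1.386 (uniform predictions) and falls steadily; in the CNN, the same rule is applied to every weight and bias, with the gradients obtained by backpropagation.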

4. Experiments and Analysis

The above method is validated experimentally on a publicly available online dataset [24–27]. The dataset was acquired with an LFMCW radar and covers four types of human posture: walking, sitting, standing, and falling. The motion data were recorded in an indoor environment from 106 participants, and each motion was repeated two to three times. The STFT uses a Hanning window 0.2 s long with an overlap of 0.19 s [28, 29].
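The window settings translate into sample counts once a slow-time sampling rate is fixed. The Python sketch assumes one slow-time sample per 1 ms sweep and a 5 s recording; neither value is given in the text, and the helper names are hypothetical.

```python
def stft_frame_params(window_s, overlap_s, slow_time_rate_hz):
    """Window length and hop in samples for a window and overlap given in seconds."""
    win = round(window_s * slow_time_rate_hz)
    hop = round((window_s - overlap_s) * slow_time_rate_hz)
    return win, hop


def num_frames(n_samples, win, hop):
    """Number of full STFT frames (time-frequency map columns)."""
    return 0 if n_samples < win else (n_samples - win) // hop + 1


# Hypothetical: one slow-time sample per 1 ms sweep, 5 s recording.
rate = 1000.0
win, hop = stft_frame_params(0.2, 0.19, rate)
print(win, hop, num_frames(5 * 1000, win, hop))   # 200 10 481
print(f"frequency bin spacing: {rate / win:.1f} Hz")  # 5.0 Hz
```

With these assumptions, the 0.2 s window spans 200 samples, the 0.19 s overlap leaves a 10-sample (10 ms) hop between columns, and the frequency bins are 5 Hz apart before any zero padding.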

4.1. CNN Generalization Performance

A learning rate that is too small slows convergence, whereas one that is too large makes the accuracy oscillate once the parameters approach the optimum. To avoid both problems, this paper adopts a piecewise decay strategy for the learning rate $\eta$: $\eta$ = 5 × … for the first 20 rounds, $\eta$ = 1 × … from round 20 to 30, $\eta$ = 5 × … from round 30 to 40, and $\eta$ = 1 × … after round 40.
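The piecewise schedule can be written as a small lookup. Because the numerical learning-rate values are not available in the text, the Python sketch takes the four rates as arguments and uses placeholder values; it also assumes 0-based round indices, with each boundary round belonging to the later stage.

```python
def piecewise_lr(epoch, rates, boundaries=(20, 30, 40)):
    """Step-decay learning rate: rates[i] applies until boundaries[i] is reached."""
    if len(rates) != len(boundaries) + 1:
        raise ValueError("need exactly one more rate than boundaries")
    for boundary, rate in zip(boundaries, rates):
        if epoch < boundary:
            return rate
    return rates[-1]


# Placeholder rates for illustration; substitute the four values used in training.
demo_rates = (4.0, 3.0, 2.0, 1.0)
print([piecewise_lr(e, demo_rates) for e in (0, 19, 20, 29, 30, 39, 40, 149)])
```

Keeping the rates as a parameter separates the schedule's structure (boundaries at rounds 20, 30, and 40) from the specific values.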

Figure 4 shows the training-set accuracy and error as functions of the number of iteration rounds, and Table 3 gives the test-set accuracy for each of the four classes.


As Figure 4 shows, the training accuracy exceeds 90% within 5 iterations, indicating that the network parameters have already converged to a small range; however, the curve oscillates noticeably because the learning rate is large. After 40 iterations, when the learning rate decreases, the oscillation amplitude drops significantly, and after 150 iterations the training accuracy exceeds 99%, with an average error of 0.0114. Because of slight overfitting, the test accuracy is always slightly lower than the training accuracy; after 150 iterations, it is 97.208%, with an average error of 0.1106.

4.2. Effect of Network Parameters on the Recognition Effect of CNN

Figures 5(a)–5(c) show the training-set accuracy and error versus the number of iterations when the activation function, the optimizer, and the learning rate are changed, respectively.


Figure 5 shows the following. ① When the activation function is changed, the training accuracy after 150 iterations is 98.6%, with an average error of 0.1668; compared with the original parameters, the training curve oscillates significantly more, overfitting is aggravated, and the generalization ability of the model decreases. ② When the optimizer is changed, the training accuracy after 150 iterations is 99.8%, with an average error of 0.0162; compared with the original parameters, the oscillation amplitude is essentially unchanged, overfitting is reduced, and the generalization ability is essentially unchanged. ③ When the learning rate is changed, the training accuracy after 150 iterations is 99.6%, with an average error of 0.0219; compared with the original parameters, the oscillation amplitude increases slightly, overfitting is slightly reduced, and the generalization ability of the model improves. The test results are shown in Table 4.

Therefore, changing individual network parameters affects the generalization ability of the model to some extent, but the test accuracy always remains above 94% (see Table 4). This indicates that the model is reasonably robust and can extract and recognize the micro-Doppler features of simple human action postures.

5. Conclusion

A CNN-based human posture recognition method is proposed for judging motion actions. The method obtains time-frequency maps of human actions by Fourier transform and STFT and then uses a CNN to extract micro-Doppler features from the radar time-frequency maps for classification. Compared with a traditional backpropagation (BP) neural network, it improves the learning efficiency of the network and better handles the overfitting and error transmission caused by small datasets. The robustness of the method is evaluated from several aspects, and the experiments confirm its effectiveness. Specifically, the method achieves high accuracy in classifying four human action postures, namely walking, sitting, standing, and falling, with a final recognition accuracy above 97%, which meets the expected goal.

Data Availability

The experimental data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

References

[1] J. I. Bisson, R. Deursen, B. Hannigan et al., “Randomized controlled trial of multi‐modular motion‐assisted memory desensitization and reconsolidation (3MDR) for male military veterans with treatment‐resistant post‐traumatic stress disorder,” Acta Psychiatrica Scandinavica, vol. 142, no. 2, pp. 141–151, 2020.
[2] M. J. van Gelderen, M. J. Nijdam, J. F. Haagen, and E. Vermetten, “Interactive motion-assisted exposure therapy for veterans with treatment-resistant posttraumatic stress disorder: a randomized controlled trial,” Psychotherapy and Psychosomatics, vol. 89, no. 4, pp. 215–227, 2020.
[3] S. Zhang and V. Callaghan, “Real-time human posture recognition using an adaptive hybrid classifier,” International Journal of Machine Learning and Cybernetics, vol. 12, no. 2, pp. 489–499, 2021.
[4] L. Chen and S. Li, “Human motion target posture detection algorithm using semi-supervised learning in internet of things,” IEEE Access, vol. 9, Article ID 90529, 2021.
[5] G. Zhang and L. Zhong, “Research on volleyball action standardization based on 3D dynamic model,” Alexandria Engineering Journal, vol. 60, no. 4, pp. 4131–4138, 2021.
[6] A. Nadeem, A. Jalal, and K. Kim, “Automatic human posture estimation for sport activity recognition with robust body parts detection and entropy Markov model,” Multimedia Tools and Applications, vol. 80, no. 14, pp. 21465–21498, 2021.
[7] M. A. R. Ahad, M. Ahmed, A. Das Antar, Y. Makihara, and Y. Yagi, “Action recognition using kinematics posture feature on 3D skeleton joint locations,” Pattern Recognition Letters, vol. 145, pp. 216–224, 2021.
[8] L. Zhao and W. Chen, “Detection and recognition of human body posture in motion based on sensor technology,” IEEJ Transactions on Electrical and Electronic Engineering, vol. 15, no. 5, pp. 766–770, 2020.
[9] A. Huang and J. Wang, “Wearable device in college track and field training application and motion image sensor recognition,” Journal of Ambient Intelligence and Humanized Computing, pp. 1–14, 2021.
[10] V. Igelmo, A. Syberfeldt, D. Högberg, F. Rivera, and E. Luque, “Aiding observational ergonomic evaluation methods using MOCAP systems supported by AI-based posture recognition,” in Proceedings of the 2020 6th International Digital Human Modeling Symposium, vol. 11, pp. 419–429, Skövde, Sweden, September 2020.
[11] E. Rocha-Ibarra, M. I. Oros-Flores, D. L. Almanza-Ojeda et al., “Kinect validation of ergonomics in human pick and place activities through lateral automatic posture detection,” IEEE Access, vol. 9, Article ID 109067, 2021.
[12] R. Ali, M. Afzal, M. Hussain et al., “Multimodal hybrid reasoning methodology for personalized wellbeing services,” Computers in Biology and Medicine, vol. 69, pp. 10–28, 2016.
[13] R. Ali, M. Afzal, M. Sadiq et al., “Knowledge-based reasoning and recommendation framework for intelligent decision making,” Expert Systems, vol. 35, no. 2, Article ID e12242, 2018.
[14] S. Sandhya Rani, G. Apparao Naidu, and V. Usha Shree, “Kinematic joint descriptor and depth motion descriptor with convolutional neural networks for human action recognition,” Materials Today Proceedings, vol. 37, no. 1, pp. 3164–3173, 2021.
[15] M. Lu, Y. Hu, and X. Lu, “Driver action recognition using deformable and dilated faster R-CNN with optimized region proposals,” Applied Intelligence, vol. 50, no. 4, pp. 1100–1111, 2020.
[16] C. Wang, J. Zhou, B. Xiao et al., “Uncertainty Estimation for Stereo Matching Based on Evidential Deep Learning,” Pattern Recognition, vol. 124, 2021.
[17] B. Li, B. Bai, and C. Han, “Upper body motion recognition based on key frame and random forest regression,” Multimedia Tools and Applications, vol. 79, no. 1, pp. 5197–5212, 2020.
[18] M. F. Leung and J. Wang, “Minimax and biobjective portfolio selection based on collaborative neurodynamic optimization,” IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 7, pp. 2825–2836, 2021.
[19] P. An, Z. Wang, and C. Zhang, “Ensemble unsupervised autoencoders and Gaussian mixture model for cyberattack detection,” Information Processing & Management, vol. 59, no. 2, Article ID 102844, 2022.
[20] S. Nagi Alsubari, S. Deshmukh, A. Abdullah Alqarni, N. Alsharif, O. Waselallah Alsaade, and O. I Khalaf, “Data analytics for the identification of fake reviews using supervised learning,” Computers, Materials & Continua, vol. 70, no. 2, pp. 3189–3204, 2022.
[21] S. Bharany, S. Sharma, S. Badotra et al., “Energy-efficient clustering scheme for flying ad-hoc networks using an optimized LEACH protocol,” Energies, vol. 14, no. 19, p. 6016, 2021.
[22] M. F. Bulbul and H. Ali, “Gradient local auto-correlation features for depth human action recognition,” SN Applied Sciences, vol. 3, no. 5, pp. 535–613, 2021.
[23] N. Nida, M. H. Yousaf, A. Irtaza, and S. A. Velastin, “Deep temporal motion descriptor (DTMD) for human action recognition,” Turkish Journal of Electrical Engineering and Computer Sciences, vol. 28, no. 3, pp. 1371–1385, 2020.
[24] A. Sabater, I. Alonso, L. Montesano, and A. C. Murillo, “Domain and view-point Agnostic hand action recognition,” IEEE Robotics and Automation Letters, vol. 6, no. 4, pp. 7823–7830, 2021.
[25] J. Li and D. Gu, “Research on basketball players’ action recognition based on interactive system and machine learning,” Journal of Intelligent and Fuzzy Systems, vol. 40, no. 2, pp. 2029–2039, 2021.
[26] J. Li, Z. Zhou, J. Wu et al., “Decentralized on-demand energy supply for blockchain in internet of things: a microgrids approach,” IEEE Transactions on Computational Social Systems, vol. 6, no. 6, pp. 1395–1406, 2019.
[27] W. Duan, J. Gu, M. Wen, G. Zhang, Y. Ji, and S. Mumtaz, “Emerging Technologies for 5G-IoV Networks: Applications, Trends and Opportunities,” IEEE Network, vol. 34, no. 5, pp. 283–289, 2020.
[28] K. Tang, A. Kumar, M. Nadeem, and I. Maaz, “CNN-based smart sleep posture recognition system,” IoT, vol. 2, no. 1, pp. 119–139, 2021.
[29] J. Y. He, X. Wu, Z. Q. Cheng, Z. Yuan, and Y. G. Jiang, “DB-LSTM: densely-connected Bi-directional LSTM for human action recognition,” Neurocomputing, vol. 444, pp. 319–331, 2021.

Copyright

Copyright © 2022 Yi Yuan and DongXia Zheng. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
