Abstract
Cultural performance has developed rapidly in China in recent years, and a large number of students are embarking on professional careers in this field. To reduce the teaching pressure on school teachers, dance training is gradually introducing the idea of “digital dance”, which uses sensors to monitor posture. Nonwearable monitoring methods do not require contact with the subject and typically rely on images or signal waves for positioning, but they place stringent requirements on the measurement scene. Wearable monitoring methods, on the other hand, require the sensor to be worn on the subject’s body at all times, so the sensor needs to be light and energy-efficient. Therefore, this paper designs and implements an application platform that analyzes dance movements parametrically, consumes little energy, is lightweight, and is simple to use. For this purpose, a wearable digital dance training system based on microelectromechanical system (MEMS) sensors has been developed, and a solution to the challenges of teaching movement precision, scientific teaching methodologies, and early warning analysis of joint ailments is also proposed. Firstly, the basic knowledge of human posture measurement and the human model are merged to build a new modifiable human model. Secondly, the overall design of the posture data acquisition device is presented, which includes a base station and several nodes. Finally, the overall framework of the software platform is designed, which consists of a client and a server. The experimental results show that the system can achieve the purpose of providing teaching programs for dance students.
1. Introduction
The human kinematic model is the basis for capturing, reconstructing, and analyzing human movements. The human body’s motion is a complicated process; however, by evaluating the structure and motion characteristics of the human body, this study reduces the human body into a hierarchical joint chain skeleton model made up of multiple joints and bones. In the hierarchical joint chain skeletal model of the human body, the limbs are substituted by stiff bones, the tiny translational degrees of freedom of the joints are ignored, and only the rotational degrees of freedom of the joints are considered. The motion of the human body can be described by the linear and angular movements of the root skeleton and the rotational motions of each joint. To mathematically characterize the motion state of each limb, the reference coordinates, human limb coordinate system, and sensor measurement coordinate system are all constructed individually, and the rotation matrix, Euler angles, and quaternions are used to define the limb’s spatial posture. Various joints have different degrees of rotational motion due to the different human joint anatomy. The movements of the human body’s major joints are analyzed, and the rotatable freedom constraints of every joint, as well as joint rotation angle range constraints, are incorporated into the human skeleton model to create a constraint-based hierarchical joint chain skeletal model. In human motion reconstruction, it is necessary to calculate the position and posture (referred to as positional) information of each human limb in space, which belongs to the category of forward kinematics [1]. 
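As a small illustration of how a quaternion defines a limb's spatial posture, the following sketch converts a unit quaternion into a rotation matrix and applies it to a limb direction vector. The function names and the 90 degree test rotation are illustrative assumptions, not part of the system described in this paper.

```python
import math


def quat_to_matrix(w, x, y, z):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    # Normalize first so that small numerical errors do not distort the rotation.
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotate(matrix, v):
    """Apply a 3x3 rotation matrix to a 3-vector."""
    return [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]


# Hypothetical example: a limb pointing along x, rotated 90 degrees about z.
half_angle = math.radians(90) / 2
R = quat_to_matrix(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
tip = rotate(R, [1.0, 0.0, 0.0])  # approximately [0, 1, 0]
```

The same matrix could equally be produced from Euler angles; quaternions are often preferred in inertial motion capture because they avoid gimbal lock.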
Unlike the forward kinematics of machines, in human motion capture and reconstruction based on wearable sensors the posture of each human limb and the position of the root bone can be obtained directly through sensor measurements, whereas the rotation angle of each joint and the position of each bone must be calculated in combination with the human skeletal model. This paper divides the human skeletal model into five motion branch chains based on the structure of the human hierarchical joint chain skeletal model and establishes the human joint rotational motion model and human skeletal posture model using robot forward kinematics and homogeneous coordinate transformation.
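The idea of chaining homogeneous transformations along one branch of the skeleton can be sketched as follows. This is a simplified planar example under stated assumptions: each joint rotates only about the z axis, each bone extends along its local x axis, and the root position, joint angles, and bone lengths are made-up values rather than data from this paper.

```python
import math


def rot_z(theta):
    """4x4 homogeneous rotation about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def translate(x, y, z):
    """4x4 homogeneous translation."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]


def matmul(a, b):
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def chain_positions(root, joints):
    """Return the end position of each bone in a branch chain.

    joints is a list of (joint_angle_rad, bone_length_m) pairs ordered
    from the root outwards.
    """
    T = translate(*root)
    positions = []
    for angle, length in joints:
        # Rotate at the joint, then move along the bone to the next joint.
        T = matmul(matmul(T, rot_z(angle)), translate(length, 0, 0))
        positions.append((T[0][3], T[1][3], T[2][3]))
    return positions


# Hypothetical arm branch: upper arm 0.30 m, forearm 0.25 m.
arm = [(math.radians(90), 0.30), (math.radians(-90), 0.25)]
elbow, wrist = chain_positions((0.0, 1.4, 0.0), arm)
```

With these values the elbow ends up directly above the root and the wrist is offset horizontally from the elbow, because the second joint rotation cancels the first.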
Human pose recognition is the predominant way of aiding in the learning and understanding of human movements and behaviors. Human pose recognition can be used to achieve the analysis of human movement as well as the preservation of movement information [2]. When teaching dance movements, students or coaches can use the results of human posture recognition to standardize the movements. To address this problem, this paper proposes a “registration” system based on distributed, wearable, and portable technical equipment and means for the objective collection of daily student dance teaching data, which can run on a “cloud platform” or PC (client). Application software and a “cloud platform” are built; based on the “cloud platform” and feature data extraction, the student training analysis and evaluation system provides professional analysis and guidance for students’ dance movements, as well as objective and fair basic data and technical means for students. The research work will provide feasible reference solutions for the future development of “digital dance”, which will have technical and application value in the training and learning process of students, including the following aspects [3].
(1) Standardization of training. A human joint model is abstracted from human kinematics to estimate the limit position, injury warning analysis, and fatigue analysis for various joints. The goal is to protect dancing talents and avoid the sorrow of quitting a dance career early due to improper training techniques, excessive training volume and moves, and over-fatigue.
(2) Scientific education. Teaching by example alone is tedious and cannot visibly convey complex actions like gymnastics. This research employs virtual reality technology, with students wearing inertial sensors, to create a three-dimensional human model scene, which can vividly replicate, freeze-frame, and slowly replay action images, and evaluate and compare movement data.
(3) Digitization of sports resources. An analysis and assessment system for dance instruction is constructed on the “public cloud platform”. It supports sharing sports materials and interactive learning and training, improves education and training efficiency, breaks down geographical and other obstacles, meets the needs of individual and differentiated dance training, and lays the groundwork for the accumulation of digital dance resources.
The rest of the paper is organized as follows. Section 2 discusses several current approaches and technologies used for dance training. Section 3 covers dance movement detection, including an introduction to the pose recognition network and dance movement detection based on pose recognition. The server software design and implementation process are discussed in Section 4. Section 5 presents the experiments, together with the results and their analysis. Finally, Section 6 summarizes the paper.
2. Current Approaches toward Dance Training
2.1. Mannequin Classification
Building a human body model is a vital aspect of measuring and evaluating human posture, as it integrates the dimensions and the functional characteristics measured by the human body into the human body model, allowing for further analysis of postural parameters and training [4]. However, the human body structure is very complex, and each movement involves the movement of bones, tendons, and ligaments [5], and the number of body movements and functions is extremely large. So, the structure of the human body, such as bones and tendons, should be appropriately abstracted and simplified in modeling to minimize the complexity of the human body model without affecting the functional analysis of parameters.
When constructing the human body model, the simplification process must follow the following points [6]: first, construct a scientific human body model based on the precise parameters of normal human body dimensions; second, chunk the overall body structure by major joints and consider each part of the limb as a rigid body, and consider the human body’s movement as the superposition of multiple rigid body movements; third, simplify the restrictive relationships between joints, bones, and ligaments, ignoring the specific shape of bones; fourth, the description of the human body model should have parameters and quantitative standards, and the data can be processed by computer simulation modeling technology. There are several common human body models including geometric models, human rod models, human solid models, and human kinematic and kinetic models, and when choosing the type of model, the choice should be based on the scenario of model application and the need for the degree of abstraction and simplification [7].
Geometric and rod models: geometric models are those in which each limb is replaced by a plane figure and the joints connecting the limbs are represented by the overlap between the figures, as shown in Figure 1(a) [8]. The rod model uses matchstick-like segments and solid dots of different sizes to represent limbs and joints, as shown in Figure 1(b). Both models are simple to construct, but the geometric model is mostly flat and has no sense of three-dimensionality, while the rod model is more abstract and can show human movement better.

Figure 1: (a) Geometric model; (b) rod model.

Solid model of the human body: the solid model generally refers to the use of three-dimensional shapes to represent the body, such as cylinders for the limbs, a sphere for the head, and a cuboid for the torso. This model adds a sense of three-dimensionality but is less realistic, and it is used in this paper as an example for analyzing simple actions, as shown in Figure 2 [9].

Human kinematic and kinetic model: this is a human body model constructed according to the characteristics of human motion, in which the limbs of the human body are regarded as rigid bodies and the human body is regarded as a system composed of multiple rigid bodies. The model conforms to the laws of human kinematics and can be combined with computer modeling technology to analyze the parameters of the model, which is suitable for the study of sports training and coincides with the needs of “digital” sports training [10].
2.1.1. Human Body Model Construction
The movement of the human body is mainly accomplished by bones, joints, ligaments, and skeletal muscles. The skeletal muscle is the active part of the movement, while the bones and joints are the passive parts. The skeletal muscle is attached to the bones across the joints and, by contracting, pulls the bones about the joints through the tendons [11]. Each joint of the human body has different restrictions on the direction and angle of motion, and the motion of the limb is quantified and analyzed by constructing a model. The human model in this work must meet several requirements: first, high accuracy is needed to analyze the correctness of the athlete’s limb movements; second, the human model is applied to a 3D scene, so it should be as vivid as possible, otherwise the scene is not realistic enough; third, the model and the tools used to construct it should be as simple as possible, without the need for professional tools or professional operators; fourth, the human body model parameters must be modifiable to ensure that a personalized human body model can be provided for each athlete or for different sports. The physiological constraints of bones and tendons restrict the direction and angle of movement of each joint in the human body, and the direction, position, and physiological constraints of the limbs are measured and examined by building the model [12]. The spatial position relationship between the upstream and downstream limbs linked by a given joint is calculated in combination with the human body model.
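One way to picture a modifiable human model with per-joint angle constraints is the minimal sketch below. The class names, the single rotation range per joint, and all numeric values (bone lengths and angle limits) are hypothetical placeholders chosen for illustration; they are not anatomical reference data and not the data structure used by the platform.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Joint:
    name: str
    parent: Optional[str]   # None marks the root of a branch chain
    bone_length_m: float    # length of the bone that starts at this joint
    min_deg: float          # lower bound of the allowed rotation angle
    max_deg: float          # upper bound of the allowed rotation angle

    def clamp(self, angle_deg: float) -> Tuple[float, bool]:
        """Return the angle limited to the allowed range and whether it was in range."""
        clamped = max(self.min_deg, min(self.max_deg, angle_deg))
        return clamped, clamped == angle_deg


@dataclass
class HumanModel:
    joints: Dict[str, Joint] = field(default_factory=dict)

    def add(self, joint: Joint) -> None:
        self.joints[joint.name] = joint

    def scale(self, factor: float) -> None:
        """Personalize the model by scaling every bone length."""
        for joint in self.joints.values():
            joint.bone_length_m *= factor

    def check(self, name: str, angle_deg: float) -> Tuple[float, bool]:
        return self.joints[name].clamp(angle_deg)


# Hypothetical two-joint arm; the limits below are illustrative only.
model = HumanModel()
model.add(Joint("shoulder", None, 0.30, -60.0, 180.0))
model.add(Joint("elbow", "shoulder", 0.25, 0.0, 150.0))
model.scale(1.1)  # adapt the default model to a taller dancer
clamped, within_range = model.check("elbow", 170.0)
```

A measured angle that falls outside the range could be used as a trigger for the early warning analysis mentioned above, while the clamped value keeps the reconstructed pose physically plausible.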
2.2. Non-wearable Sports Training Data Monitoring Devices
Nonwearable posture assessment techniques refer to equipment or systems that are stationary or have no connection with the human body. The body and the assessment device move relative to each other during monitoring, and the device is powered by an external power supply. Nonwearable monitoring technology mainly includes the following.
2.2.1. Optical Image Identification Technique
The use of optical images to detect human body posture is a highly common and mature technology. The subject’s motion posture is imaged by an optical monitoring device, and the acquired images are processed to calculate posture and motion characteristics [11]. This monitoring approach necessitates the installation of multiple cameras, such as regular lenses, infrared cameras, and depth cameras. This method uses pattern recognition to track and classify video picture sequences [12]. The measured object is unaffected by optical image identification technology, but the setup and equipment requirements are significant.
2.2.2. Electromagnetic Positioning Tracking Technique
The monitoring system comprises a transmitter, a receiver, and a processor [13]. The individual needs to wear a receiving sensor on a region of his or her body to monitor variations in magnetic induction intensity, and human motion posture is assessed from the electromagnetic induction coupling between the transmitter and the receiver. Although electromagnetic tracking has little influence on the posture of the measured object, it is subject to electromagnetic interference from other communication devices.
2.2.3. Acoustic Positioning and Monitoring Technique
The acoustic-based positioning tracking method is similar to the electromagnetic positioning tracking method, except that acoustic waves are used for measurement instead of electromagnetic waves. An acoustic monitoring system is made up of an acoustic source, a receiver, and a processing unit. The sound source sends numerous acoustic pulses that reach the object to be measured and generate echoes; the acoustic receiver collects the echo signals and uses them to locate and track the object. This measurement method also has no effect on the motion attitude of the object to be measured, but the relatively low speed of acoustic wave propagation in air results in poor real-time performance. In addition to the above, there are other nonwearable posture monitoring techniques, but neither their theory nor their technology is yet mature enough for wide adoption [14]. Nonwearable measurement methods share several advantages: most of them do not require the subject to cooperate with the measurement, do not affect action posture, can guarantee monitoring accuracy, and do not require a separate power supply. Their disadvantages are that the measurement range is very limited, the arrangement of scenes and monitoring devices is very tedious, and the optical tracking method is easily affected by the lighting conditions at the installation site.
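The real-time limitation of acoustic tracking can be made concrete with a short round-trip calculation. The 5 m distance is an assumed example, and the speed of sound is the usual approximation for air at about 20 °C.

```python
SPEED_OF_SOUND_M_S = 343.0          # approximate value in air at about 20 degrees C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # propagation speed of electromagnetic waves


def round_trip_delay_s(distance_m, speed_m_s):
    """Time for a pulse to reach the target and for its echo to return."""
    return 2.0 * distance_m / speed_m_s


# Assumed example: the subject stands 5 m from the source and receiver.
acoustic_ms = round_trip_delay_s(5.0, SPEED_OF_SOUND_M_S) * 1e3
electromagnetic_ns = round_trip_delay_s(5.0, SPEED_OF_LIGHT_M_S) * 1e9
print(f"acoustic: {acoustic_ms:.1f} ms, electromagnetic: {electromagnetic_ns:.1f} ns")
```

At this distance each acoustic measurement takes roughly 29 ms of propagation time alone, before any processing, whereas the electromagnetic round trip is on the order of tens of nanoseconds.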
2.3. Wearable Sports Training Data Monitoring Device
Wearable posture monitoring technology refers to determining the subject’s posture based on the sensor’s position. This monitoring system requires an autonomous power supply and wireless communication technology to avoid the sensor’s effect on normal human movement. Wearable posture monitoring methods include:
2.3.1. Mechanical Tracking Method
The mechanical tracking technique compares the observed point to the mechanical structure measuring reference. Temperature, acceleration, bending stress, pressure, and other sensor characteristics are used to measure the mechanical structure. One clear downside of this measurement method is that it only works for static measurements and cannot be utilized for dynamic measurements due to mechanical interference.
2.3.2. Sensor-Based Monitoring Method
A MEMS device is a self-contained intelligent system with millimeter-scale dimensions [15]. As a groundbreaking technology, MEMS is now widely used in both mainstream and high-end industries because of its small size and light weight. The MEMS inertial sensors move with particular sections of the body, and each sensor node measures its own motion parameters [16]. To increase the quality and validity of posture detection, a single accelerometer or a set of sensors can be used. The motion acceleration and orientation are calculated from the inertial sensor node’s acceleration and angular velocity, corrected with geomagnetic measurements. There are now several products on the market that use MEMS inertial sensor technology, and more people are utilizing them for fitness, indicating the technology’s maturity and application possibilities [18].
In addition, wearable posture monitoring technology also includes biofiber monitoring technology, which ensures measurement accuracy through contact measurement and has a strong ability to resist external interference [19]. MEMS inertial sensor technology, with its greater measurement accuracy [20], together with the ongoing development of the human motion model, is used to gather posture information and send it to a computer terminal for further processing through a wireless communication module. In this way the purpose of human posture monitoring can be accomplished, and its applications are becoming more and more widespread, with representative fields such as step counting during walking or running, fall monitoring, and gesture recognition. Of course, MEMS inertial sensors also have some shortcomings: the wearable device must be firmly fixed to the body when installed, and the displacement of a movement cannot be measured directly but can only be calculated by integrating the acceleration, which is a problem that inertial sensors urgently need to solve.
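The difficulty of obtaining displacement by integrating acceleration can be illustrated with a short numerical sketch. The sampling interval, duration, and the constant accelerometer bias below are assumed values for illustration, and simple Euler integration is used rather than any specific filter from this paper.

```python
def integrate_displacement(accelerations, dt):
    """Integrate acceleration twice (Euler method) to estimate displacement."""
    velocity = 0.0
    position = 0.0
    for a in accelerations:
        velocity += a * dt
        position += velocity * dt
    return position


dt = 0.01          # assumed 100 Hz sampling
samples = 1000     # 10 s of data
bias = 0.05        # assumed constant sensor bias in m/s^2

# The sensor is actually stationary, so the true acceleration is zero.
true_accel = [0.0] * samples
measured_accel = [a + bias for a in true_accel]

true_displacement = integrate_displacement(true_accel, dt)
drift = integrate_displacement(measured_accel, dt)
print(f"estimated displacement of a stationary sensor: {drift:.2f} m")
```

Even this small bias produces an apparent displacement of about 2.5 m after 10 s, because the error grows with the square of time. This is why integrated displacement usually needs correction from additional constraints such as the skeletal model.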
3. Dance Movement Detection
3.1. Introduction to Pose Recognition Network
The human pose recognition network is a recognition algorithm based on part affinity fields (PAFs) that can accurately identify key points and movements of the human skeleton in images [5]. The main process is to extract features through the first 10 layers of the VGG19 network (a convolutional neural network with 19 layers) and feed them into the key point heat map branch and the limb vector branch to recognize the human pose. The basic structure of this network is shown in Figure 3.

Branch 1 is the key point heat map branch, which is mainly responsible for predicting the position confidence map S1, whereas branch 2 is the limb vector branch, which is mainly responsible for predicting the part affinity field L1 [21]. Iterative prediction is performed using the pose recognition network, and its prediction is calculated as shown in equations (1) and (2) [22], where ρ and Φ denote the convolution operations corresponding to the S and L branches, respectively. To avoid the vanishing gradient problem when this network is trained, a loss function is usually added to the computation as in equation (3) [23], which compares the predicted key point confidence map and PAFs with the true values of the key point confidence map and PAFs, respectively. The true confidence map at the jth key point position is obtained by applying the maximum operation to the individual confidence maps of each person k, as in equation (5) [24].
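For orientation, the widely used part affinity field formulation that this description appears to follow is usually written as shown below. This is a sketch under that assumption, not a reproduction of the paper's own equations: the feature map F, the stage index t, the binary mask W, the image location p, the limb index c, and the asterisk marking true values are assumed notation, and the numbering may not match equations (1) to (5).

```latex
% Stage-wise prediction of confidence maps S and part affinity fields L (t >= 2)
S^{t} = \rho^{t}\left(F,\, S^{t-1},\, L^{t-1}\right), \qquad
L^{t} = \phi^{t}\left(F,\, S^{t-1},\, L^{t-1}\right)

% Loss applied at stage t to each branch
f_{S}^{t} = \sum_{j} \sum_{p} W(p)\, \bigl\| S_{j}^{t}(p) - S_{j}^{*}(p) \bigr\|_{2}^{2}, \qquad
f_{L}^{t} = \sum_{c} \sum_{p} W(p)\, \bigl\| L_{c}^{t}(p) - L_{c}^{*}(p) \bigr\|_{2}^{2}

% True confidence map of key point j as the maximum over individuals k
S_{j}^{*}(p) = \max_{k}\, S_{j,k}^{*}(p)
```

In this formulation the loss is evaluated at every stage, which is the mechanism usually credited with counteracting vanishing gradients in deep multi-stage networks.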
3.2. Pose Acceptance Dance Movement Detection
Pose identification, key point feature processing, and motion classification make up the bulk of the dance motion detection. To begin, the source image is resized to 368 × 368 pixels and fed into the pose recognition network, which identifies the body’s key points. Then, the contour values of the human key points are used to detect the human body regions with the residual network. As shown in Figure 4, the process of detecting dance movements through pose recognition consists of three components: key point extraction, human model reconstruction, and image classification.

For the key point feature classification branch, the fully connected layers are set to 6 layers by analyzing the dance movement features. The number of neurons in the first layer is the same as the number of key point features outputted from pose recognition.
The residual units are stacked to form the residual network structure, which is then implemented in the residual block image classification branch. The neural network can learn all functions in this network structure, because it has a large number of connections. It has been demonstrated through experiments that the direct learning of residuals can reduce the difficulty of the model. A dropout layer is added between the third and fourth layers, which is fed into the neural network through the residual units provided by ResNet during its training phase. First, the input image is cropped to 512 × 512. In the following step, the human contour frame is cropped in accordance with the key point contour value for the ResNet50 training algorithm.
ResNet50 has 4 residual blocks and takes an input of size 224 × 224. The specific training process is as follows:
Step 1: A 7 × 7 × 64 convolutional layer with a step size of 2 produces a 112 × 112 feature map.
Step 2: A 3 × 3 pooling window with a step size of 2 is applied to the feature map, followed by 3 blocks. Each block has three layers, the first with a 1 × 1 × 64 convolution kernel, the second with a 3 × 3 × 64 convolution kernel, and the third with a 1 × 1 × 256 convolution kernel.
Step 3: The output of the residual units then passes through an average pooling layer.
Step 4: Two fully connected layers with 2048 and 512 neurons are connected.
The network has a 6-layer fully connected structure for fusion action classification.
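The feature map sizes quoted in Steps 1 and 2 follow from the standard output-size formula for convolution and pooling layers, as the sketch below shows. The padding values (3 for the 7 × 7 convolution and 1 for the 3 × 3 pooling) are assumptions taken from the common ResNet50 configuration, since the text does not state them.

```python
def output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


input_size = 224
after_conv = output_size(input_size, kernel=7, stride=2, padding=3)  # Step 1
after_pool = output_size(after_conv, kernel=3, stride=2, padding=1)  # Step 2

# Output channels of the three layers inside each block described in Step 2.
block_channels = [64, 64, 256]

print(after_conv, after_pool, block_channels[-1])
```

Under these assumptions the 224 × 224 input becomes a 112 × 112 map after the first convolution and a 56 × 56 map after pooling, and each of the three blocks that follow ends with 256 output channels.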
As shown in Figure 5, the network is a pose recognition network for dance movement detection [25].

4. Server Software Design and Implementation
This sports training software platform uses a cloud server and the Netty framework (a development framework produced by JBoss) to realize functions such as the backend server, multiple concurrent access, and data sending and processing [26]. Cloud service solutions of this kind amount to providing virtual machines for enterprises or individual users, who can choose the platform region and virtual machine performance and pay the corresponding fees. These services not only reserve the cloud communication interface but also provide APP solutions and network security services, while leaving the enterprise free to carry out its own research and development. The platform is transparent to users, makes it easier for coaches to guide athletes’ training remotely, and supports multiuser data statistics and analysis.
4.1. Netty Framework
Netty is an event-driven, asynchronous network application development framework and tool. With the fast development and widespread use of network technology in recent years, the Java language and the Netty framework make it easy to construct maintainable, high-performance network server and client programs. The core structure of the Netty framework used in conjunction with the functionality of this software platform is shown in Figure 6.
(1) Event-driven model: all incoming messages are treated as events (MessageEvent), which carry some information; the information is obtained through getMessage() and then analyzed. According to the event processing logic that has been written, the ChannelPipeline manages the event handlers (ChannelHandler) to complete the requests sent by the client, and handlers are added to process data, store data, or return results. This event-driven design abstracts all kinds of network requests into events, so that the network connection process, encoding/decoding, and tedious business logic can all be completed by similar processing mechanisms.
(2) Unified asynchronous I/O API: the Netty framework remedies the defect of traditional Java I/O, in which types and methods have to be modified according to the transport protocol. Through the Channel interface, all I/O operations are unified, as is the abstraction of all point-to-point communication operations.
(3) Zero-copy and rich Buffer: unnecessary copying of data in memory is avoided, so that the program can run efficiently. Moreover, Netty provides Buffer objects that can be aggregated and are easy to operate.
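The event-driven pipeline idea in item (1) can be sketched independently of Netty itself. The following Python sketch is only a conceptual analogy: the Pipeline class, the handler functions, and the JSON message layout are hypothetical and do not reproduce Netty's Java API or the platform's actual message format.

```python
import json


class Pipeline:
    """Ordered chain of handlers; each handler transforms the event it receives."""

    def __init__(self):
        self.handlers = []

    def add_last(self, handler):
        self.handlers.append(handler)
        return self  # allow chained registration

    def fire(self, message):
        for handler in self.handlers:
            message = handler(message)
        return message


def decode(raw: bytes) -> dict:
    """Decoding step: bytes from the network become a structured event."""
    return json.loads(raw.decode("utf-8"))


def handle_pose(event: dict) -> dict:
    """Business logic step: summarize the joint angles sent by one sensor node."""
    return {"node": event["node"], "max_angle": max(event["joint_angles"])}


def encode(result: dict) -> bytes:
    """Encoding step: the result is serialized for the reply to the client."""
    return json.dumps(result, sort_keys=True).encode("utf-8")


pipeline = Pipeline().add_last(decode).add_last(handle_pose).add_last(encode)
reply = pipeline.fire(b'{"node": 3, "joint_angles": [12.5, 48.0, 31.0]}')
```

Decoding, business logic, and encoding are all expressed as handlers of the same shape, which is the uniformity that the event-driven design aims for.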

4.2. Cloud Platform Services
This training software platform uses Alibaba Cloud servers (ECS, Elastic Compute Service), a service that aims to bring computing and artificial intelligence (AI) technologies into every corner of life rather than leaving them as out-of-reach “high” technology [27]. The product is an elastic and scalable computing platform that helps developers lower costs, improve operation and maintenance efficiency, and reduce the difficult work of building and maintaining servers, so that businesses or individual users can concentrate on business innovation and development. The cloud server is built as follows. First, the cloud server’s performance is selected based on the system requirements; this software platform uses a general-purpose cloud server, which is appropriate for most corporate situations. Second, the server’s remote connection is opened, because the application has to be installed on the server through a remote connection via its IP address. Furthermore, the network port must be open so that other computers running client programs can connect to the cloud server, submit requests to it, and get the needed information. When the client exchanges data with the server, the Redis real-time database should be started as a data cache to ensure precise and efficient data transfer. Figure 7 illustrates the process of building the cloud server.

5. Experiments
5.1. Dataset Sources and Preprocessing
The experimental dataset, which was obtained from concert and dance videos, has a total of 5000 image frames. First, for each single-frame image, an 18-dimensional set of key point data describing the performer’s action was obtained from pose recognition. The training set is made up of 4000 images from the dataset, while the test set is made up of the remaining 1000 images.
5.2. Parameter
The system for detecting and recognizing dancing motions is quite accurate. To further improve the algorithm’s performance, several experiments are needed to define the optimal network parameters, including the number of neurons and batch size.
It is important to note that the batch size setting affects the algorithm’s rate of convergence. The batch size in this experiment is set to 64. The fully connected layers are optimized using the stochastic gradient descent (SGD) optimizer, with the number of epochs set to 100 and the learning rate set to 0.001.
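To give a sense of scale for these settings, the sketch below works out the number of parameter updates implied by 4000 training images, a batch size of 64, and 100 epochs, and shows the plain SGD update rule. It assumes that the last, partially filled batch of each epoch is kept and that no momentum or learning rate schedule is used, neither of which is stated in the text.

```python
import math

TRAIN_IMAGES = 4000
BATCH_SIZE = 64
EPOCHS = 100
LEARNING_RATE = 0.001

# Assumption: the final partial batch of each epoch is used rather than dropped.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def sgd_step(weights, gradients, lr=LEARNING_RATE):
    """One plain SGD update: w <- w - lr * g for every parameter."""
    return [w - lr * g for w, g in zip(weights, gradients)]


# Toy update with made-up weights and gradients.
updated = sgd_step([1.0, -2.0], [10.0, -10.0])
print(steps_per_epoch, total_steps, updated)
```

Under these assumptions each epoch consists of 63 updates, for 6300 updates in total.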
5.3. Feature Processing
5.3.1. Feature Selection
The 18 key points of the human skeleton can be obtained through pose recognition. For the purpose of illustration, the key points are numbered as shown in Figure 8. The human joint angle, the relative position of joint points, and the length ratio between joint segments are selected as the human movement features for this experiment.
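The three kinds of features named above can each be computed from 2D key point coordinates, as in the following minimal sketch. The function names, the choice of reference point, and the pixel coordinates are illustrative assumptions; the actual key point numbering is the one given in Figure 8.

```python
import math


def joint_angle_deg(a, b, c):
    """Angle at point b (in degrees) formed by the segments b-a and b-c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def relative_position(point, reference):
    """Position of a key point relative to a reference key point."""
    return (point[0] - reference[0], point[1] - reference[1])


def length_ratio(a, b, c, d):
    """Ratio of segment a-b to segment c-d, independent of image scale."""
    return math.dist(a, b) / math.dist(c, d)


# Hypothetical pixel coordinates for four key points of one arm.
neck, shoulder, elbow, wrist = (120, 100), (100, 100), (100, 160), (160, 160)

elbow_angle = joint_angle_deg(shoulder, elbow, wrist)
wrist_offset = relative_position(wrist, neck)
arm_ratio = length_ratio(shoulder, elbow, elbow, wrist)
```

Angles and length ratios do not change when the performer appears larger or smaller in the frame, which makes them less sensitive to camera distance than raw coordinates.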

5.4. Results and Analysis
5.4.1. Algorithm Validation
The left figure shows the input image, the middle figure shows the singer’s skeletal key point heat map, and the right figure shows the maximum probability key point and key point limb region, obtained by calculating the singer’s heat map. The recognition accuracy of the test dataset is shown in Table 1 [28–29].
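Selecting the maximum probability key point from a heat map amounts to finding the location of its largest value, as sketched below. The tiny 3 × 4 heat map and the function name are made up for illustration; real heat maps have one channel per key point and a much higher resolution.

```python
def find_peak(heatmap):
    """Return (x, y, score) of the highest-confidence location in a 2D heat map."""
    best_x, best_y, best_score = 0, 0, heatmap[0][0]
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_x, best_y, best_score = x, y, score
    return best_x, best_y, best_score


# Hypothetical heat map for a single key point.
heatmap = [
    [0.01, 0.02, 0.05, 0.01],
    [0.03, 0.40, 0.91, 0.20],
    [0.02, 0.10, 0.30, 0.05],
]
x, y, confidence = find_peak(heatmap)
```

Here the peak lies in column 2 of row 1 with a confidence of 0.91. When several people appear in one image, a single global maximum is not enough and local maxima have to be collected instead.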
The overall recognition accuracy is good, as shown in Figure 9. However, the recognition accuracy of the raised arm and one-hand wave actions is poor in comparison to the others. This is because the arm lift and one-hand wave have a lower amplitude than the other four actions, and the other hand’s state is unknown, which lowers the algorithm’s accuracy. The distance between the human body and the camera, as well as the filming angle, can also cause variations in the arm lift and one-hand wave motions, leading to recognition mistakes. As a result, these two actions have poor recognition accuracy.

5.4.2. Algorithm Comparison
To demonstrate the superiority of this algorithm, it is evaluated on the test set alongside two traditional recognition algorithms, the residual network four-channel method and the Hu moment algorithm [10], with the results presented in Table 2. From Figure 10, it can be seen that the accuracy of this algorithm is higher than that of the other algorithms. This result, with an accuracy rate of more than 92%, shows that our algorithm is capable of effectively detecting the human position as well as movement in dance training sessions. In addition, the algorithm runs at 0.75 frames/s on a Tesla P4 graphics card and can recognize multiple people’s movements in a single picture.

6. Conclusion
Cultural performance in China has been explored by researchers and academics in recent years. With the passage of time, more and more students are pursuing careers in cultural performance, but there is a dearth of quality dance teachers, which is a serious issue. To reduce the load on school teachers, dance training centers are gradually introducing the concept of “digital dance” by combining advanced information and network technology. Digital dance employs sensors to monitor postures, resulting in scientific and digital dance training parameters, which are then used to detect different movements of the human body. The nonwearable monitoring approach does not require a physical connection with the subject and uses pictures or signal waves for positioning, but it places high demands on the measuring scene. To evaluate dance movements parametrically, this study proposes and implements a low-energy and user-friendly application platform. In this study, we presented a solution to the challenges of teaching movement correctness, scientific teaching techniques, and early warning analysis of joint ailments. Firstly, a parameterized and customized human model is built using basic human posture measuring knowledge. Secondly, the posture data collection device contains a base station and several nodes; the nodes collect data and transfer it to the base station for filtering and other preprocessing before it is delivered wirelessly to the software platform. Finally, the software platform’s client and server components are designed. The client’s tasks include user management, human body model construction in 3D scenes, database management, and data analysis. The cloud server supports multithreaded client communication and data access, as well as the digitization of dance resources. The experimental results reveal that the proposed system provides an efficient teaching program for dance students.
Further, the experimental results show that the proposed system performs better than the earlier models. The proposed system achieved an accuracy of 92.5%, which is better than the other models, which attained accuracies of 78.9% and 89.2%, respectively [17].
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The research in this paper was supported by Guangdong Province Educational Science “13th Five-Year” 2020 research project, “Xi Jinping’s education informatization concept under the guidance of college physical education mixed teaching mode construction research” (2020GXJK006).