Abstract

Traditional methods for mining students’ physical health data suffer from a low recall rate, long mining time, and poor mining accuracy. Therefore, this paper proposes a data mining method of college students’ physical health for physical education reform. Association rules are used to construct the correspondence between the fitness test data, so that the fitness test data can be classified and a data training model can be built. A decision tree of data attributes is built, and the physical health data are segmented by the segmentation technology. The information entropy of the health data is calculated by the decision tree, and the information gain of the health data sample set is obtained. The C4.5 algorithm is used to improve the ID3 algorithm: the improved decision tree is used to obtain the splitting attribute of the physique data, and the information gain rate is obtained by correcting the information gain of the ID3 algorithm. The K-means algorithm is used to divide the data into clusters, according to which the physical health data mining of college students is realized. Experimental results show that the recall rate of the proposed method is as high as 96%, the data mining time is only 3 s, and the accuracy of data mining is as high as 98%, indicating that the proposed method can improve the physical health data mining effect.

1. Introduction

With the advent of the era of big data, the massive data and powerful analysis ability of big data provide more valuable possibilities for students’ physical health promotion, in particular for revealing hidden, previously unknown, and potentially valuable information from the massive, fuzzy, and noisy historical data in practical application databases [1–5]. Comprehensively analyzing students’ physical health data with advanced technology, exploring the characteristics of the physical quality items of students with different levels of physical health, and using the relevance, depth, and forward-looking thinking mode of big data to provide more accurate scientific data support for students’ physical health promotion can provide a reference basis for the formulation and implementation of college students’ physical exercise service policy [6, 7].

Clarifying the concepts of physical quality and physique and the relationship between them is the premise and basis for understanding students’ physical condition and formulating a personalized physical health promotion plan. Physique refers to the quality of the human body [8]. It is a relatively stable comprehensive feature of the human body in terms of physical quality, morphological structure, sports ability, physiological function, and psychological development, formed on the basis of congenital heredity and acquired factors. Its scope covers the development level of physique and physical fitness, physiological function, adaptability, and mental state [9–12].

Physical quality refers to the comprehensive ability displayed by various organ systems in sports, labor, and daily life under the regulation of the central nervous system. It is mainly reflected in the body abilities such as endurance, strength, speed, sensitivity, and flexibility. The ability depends on the physiological and anatomical characteristics of human muscles, the energy supply during muscle work, and the regulation of internal organs and nervous system [13, 14]. It can be seen that physique includes physical quality. However, the physique promotion mentioned in this study actually refers to the promotion of physical quality, especially for the three basic qualities of human endurance, strength, and flexibility [15]. In other words, the term “physique” in this study refers to the combination of endurance, strength, and flexibility. It should be pointed out that both physique and physical quality are not only related to heredity but also closely related to the acquired environment, nutrition, physical exercise, and sanitary conditions. In particular, scientific and appropriate physical exercise can improve physical quality in all aspects and promote physical development [16].

At present, some progress has been made in the research on the physical health of college students at home and abroad, but there are also some difficulties. In China, for example, the promotion of students’ physical health has always received close attention from schools. In recent years, all schools have strictly implemented the students’ physical health test in accordance with the National Students’ Physical Health Standard. However, the massive physical health test data of most colleges and universities are only “stored” in databases, and few of these databases are mined and fully utilized [17]. According to the survey, most colleges and universities only conduct simple statistical analysis and reporting when processing students’ physical health data, resulting in a “non-benign cycle” of insufficient data mining, limited depth of data analysis, and research that cannot be used for health promotion [18–21]. Japan has the most complete research data on the physique of college students. Its understanding is roughly the same as that of China, including morphological structure, psychological factors, physical quality, and sports ability, and differs only in form and formulation. The Japanese concept is a combination of physical factors and mental factors. Physical factors mainly refer to the body’s physique, body shape, physical ability, and the ability to respond and adapt to external environmental stimuli, while mental factors refer to psychological factors such as will, temperament, intelligence, and judgment. The United States began to pay attention to physical fitness as early as the 1980s, and the content and indicators of physical fitness measurement in the United States continue to develop with social productivity, people’s living standards, and quality of life. The measurement of college students’ physical health data in the United States is mainly the measurement of physical fitness. With the continuous development of physique research, people’s understanding of physique also changes. Therefore, the contents and indicators of college students’ physical health measurement are also changing, and they mainly include cardiopulmonary function, muscle strength, endurance, physical flexibility, and body composition. However, foreign research on physical health data mining is insufficient, which has limited follow-up research.

In recent years, a considerable number of scholars have studied the evaluation system of students’ physical health test data, such as Bartel’s research and development of a college students’ physical health evaluation system and exercise prescription, and Wang’s work, based on regression and correlation analysis, on the relationship between the weight and contribution rate of physical health evaluation indicators. This kind of research pays more attention to the evaluation methods and related analysis models of students’ physical health results, but it still has limitations: a large amount of information contained in the database remains buried, and the practical problems that can be solved are quite limited.

The current research does not conduct in-depth analysis on the relevance of students’ physical health data, which leads to problems such as low recall rate of data mining, long mining time, and poor mining accuracy. Therefore, this paper takes solving the problems existing in traditional methods as the research goal. This paper puts forward a data mining method of college students’ physical health for physical education teaching reform, which is conducive to database mining and improving data utilization, especially the research of scientific evaluation and analysis of data.

2. Construction of Association Rules for Health Data of Physical Fitness Test

When constructing the association rules of fitness test health data, take the initial fitness data collected by the hardware structure as the processing object and set as the data set of test items [22]. At this time, the fitness test items are , that is, the item set with length . Suppose there is a health data subset in the data set, at this time, the mapping of subsets on the test set can be defined as relevance, and the quantitative relationship of relevance can be expressed as

where represents the total number of transactions, and represents the number of health data [23–26]. Under the above quantitative relationship, the probability of health data in physical fitness test data is constructed, and the numerical relationship can be expressed as:

where represents the event containing the data length, and indicates the number of physical test items containing health data. According to the above calculation formula, the probability value can reflect the proportion of college students’ physical fitness test health data in the total data set. When the probability data value is large, it can be used as the minimum confidence of mutual rules between health data. Take the minimum confidence as the screening criterion, repeatedly screen the data items in the data, and then test health data [27]. Continuously integrate health data into a health data set, set physical fitness test indicators for judging physical fitness, and gradually extract the rules that meet the requirements as the screening rules of health data, as shown in Figure 1.

In Figure 1, after the association rules are constructed, the association results output for the physical fitness test data are classified to form a health data test group, which is divided into a training data group and a verification data group [28–31]. Under the control of the data proportion, a data training model is built to measure the prediction data of the project. In order to control the accuracy of the classification function formed by the evaluation, a binary model is used to divide the algorithm model and the calculated data into different types of data. Provided that all of the data are real data, the actual accuracy of the data training model can meet the requirements.
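The screening of rules by a minimum confidence can be illustrated with a small sketch. It assumes the standard definitions of support (the share of records containing an item set) and confidence (the support of the combined item set divided by the support of the antecedent), which may differ in detail from the formulas of this section. The record contents, the item names, and the thresholds are hypothetical.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) from the support values."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

def screen_rules(transactions, min_support, min_confidence):
    """Return the single-item rules (lhs -> rhs) that pass both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            sup = support(transactions, {lhs, rhs})
            conf = confidence(transactions, {lhs}, {rhs})
            if sup >= min_support and conf >= min_confidence:
                rules.append((lhs, rhs, sup, conf))
    return rules

# Hypothetical records: each set holds the graded test results of one student.
records = [
    frozenset({"vital_capacity=good", "run_1000m=good", "total=good"}),
    frozenset({"vital_capacity=good", "run_1000m=pass", "total=pass"}),
    frozenset({"vital_capacity=good", "run_1000m=good", "total=good"}),
    frozenset({"vital_capacity=pass", "run_1000m=pass", "total=pass"}),
]

# The thresholds are illustrative only.
rules = screen_rules(records, min_support=0.5, min_confidence=0.6)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

Only rules between single items are enumerated here to keep the sketch short; a full Apriori-style search would also grow longer item sets level by level.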

3. Data Mining of College Students’ Physical Health for Physical Education Reform

Schools have traditionally paid little attention to physical education courses. It is therefore not surprising that some schools and education management departments do not pay the necessary attention to the specific implementation of the physical health test [32]. For example, some schools even falsify or conceal the results of the physical health test, which seriously limits the guiding role of physical health test data in strengthening the school’s own physical education teaching reform.

3.1. Information Gain Based on Decision Tree

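The decision tree algorithm can obtain a decision tree with less instability through segmentation technology. Let the set $S$ contain $s$ samples, and let the category attribute take $m$ different values, defining the classes $C_i$ ($i = 1, 2, \ldots, m$). Assume that the number of samples in class $C_i$ is $s_i$. For this sample set, the total information entropy is

$$I(s_1, s_2, \ldots, s_m) = -\sum_{i=1}^{m} p_i \log_2 p_i, \tag{3}$$

where $p_i$ is the probability that a sample belongs to $C_i$, expressed as $p_i = s_i / s$. Using the decision tree classification, assume that the test attribute is $A$ and contains $v$ different values $\{a_1, a_2, \ldots, a_v\}$. The set $S$ can then be divided into $v$ subsets $\{S_1, S_2, \ldots, S_v\}$: all the samples whose value of $A$ is $a_j$ belong to the subset $S_j$, and $S_j$ is the sample subset of a newly generated leaf node [33]. Assuming that the number of samples with category $C_i$ in subset $S_j$ is $s_{ij}$, the information entropy of the divided samples is

$$E(A) = \sum_{j=1}^{v} \frac{s_{1j} + s_{2j} + \cdots + s_{mj}}{s} \, I(s_{1j}, s_{2j}, \ldots, s_{mj}), \tag{4}$$

where $I(s_{1j}, s_{2j}, \ldots, s_{mj}) = -\sum_{i=1}^{m} p_{ij} \log_2 p_{ij}$ and $p_{ij} = s_{ij} / |S_j|$ represents the probability of samples with category attribute $C_i$ in subset $S_j$. Finally, the information gain $\mathrm{Gain}(A)$ of sample set $S$ is

$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_m) - E(A). \tag{5}$$

According to formula (5), the information gain $\mathrm{Gain}(A)$ increases as the information entropy $E(A)$ of the divided samples decreases, and the larger the gain, the more the uncertainty of classifying the set $S$ with $A$ as the test attribute is reduced. Attribute $A$ has $v$ different values, corresponding to $v$ subsets of the set $S$. The above steps are called recursively on each subset, so as to generate other attributes as the child nodes of the node, and finally build a complete decision tree.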
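A minimal sketch of the entropy and information gain calculation may help make the procedure concrete. It assumes the usual ID3 definitions: the entropy of the class labels before the split, minus the size-weighted entropy of the labels within each subset produced by the test attribute. The attribute names and the toy records are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def partition(rows, labels, attribute):
    """Group the class labels by the value each row takes on `attribute`."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    return subsets

def information_gain(rows, labels, attribute):
    """Entropy before the split minus the weighted entropy after it."""
    total = len(labels)
    subsets = partition(rows, labels, attribute).values()
    after = sum(len(s) / total * entropy(s) for s in subsets)
    return entropy(labels) - after

# Hypothetical toy data: two attributes and a pass/fail class label.
rows = [
    {"bmi": "normal", "endurance": "high"},
    {"bmi": "normal", "endurance": "low"},
    {"bmi": "high", "endurance": "low"},
    {"bmi": "high", "endurance": "low"},
]
labels = ["pass", "pass", "fail", "fail"]

for attribute in ("bmi", "endurance"):
    print(attribute, round(information_gain(rows, labels, attribute), 4))
```

In this toy set the `bmi` attribute separates the two classes completely, so it has the larger gain and would be chosen as the test attribute at the root.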

3.2. Calculation of Information Gain Rate Based on the C4.5 Algorithm

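The main difference between the ID3 algorithm and the C4.5 algorithm is the criterion used to select the classification attribute: the ID3 algorithm uses the information gain, while the C4.5 algorithm uses the information gain rate.

Split information for attribute $A$:

$$\mathrm{SplitInfo}(A) = -\sum_{j=1}^{v} \frac{|S_j|}{|S|} \log_2 \frac{|S_j|}{|S|}, \tag{6}$$

where $S_j$ ($j = 1, 2, \ldots, v$) is the subset of the sample set $S$ in which attribute $A$ takes its $j$-th value, and $|S_j|$ and $|S|$ are the numbers of samples in $S_j$ and $S$.

Information gain rate of sample set $S$ is as follows:

$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitInfo}(A)}, \tag{7}$$

where $\mathrm{Gain}(A)$ is the information gain of attribute $A$.

When the C4.5 algorithm constructs the decision tree, the splitting attribute of the current node is determined by the maximum information gain rate [34]. If the calculated attribute information gain rate becomes smaller, the later attribute with the larger information gain rate is regarded as the split attribute.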
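The effect of dividing by the split information can be seen in a small sketch. It assumes the usual C4.5 definition, in which the split information is the entropy of the attribute’s own value distribution and the gain rate is the information gain divided by it. The two attributes and the labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the split divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = entropy(attribute_values)  # entropy of the attribute's value distribution
    return gain / split_info if split_info > 0 else 0.0

labels = ["pass", "pass", "fail", "fail"]
student_id = ["s1", "s2", "s3", "s4"]        # one distinct value per sample
bmi = ["normal", "normal", "high", "high"]   # two values

print("student_id:", gain_ratio(student_id, labels))
print("bmi:", gain_ratio(bmi, labels))
```

Both attributes have the same information gain on this toy set, so the plain gain criterion cannot tell them apart. The gain rate penalizes the many-valued `student_id` attribute and prefers `bmi`.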

The C4.5 algorithm improves the ID3 algorithm as follows:

Step 1. The C4.5 algorithm discretizes continuous features, so that it can handle both continuous and discrete attributes.

Step 2. The information gain rate is introduced to correct the tendency of the information gain in the ID3 algorithm to favor features with more values.

Step 3. Samples with missing values are handled [35].

Step 4. A regularization coefficient is introduced to prune the decision tree.
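The discretization of a continuous feature can be sketched as a search for a binary threshold. The version below is a common simplification, assumed here rather than taken from the paper: candidate thresholds are the midpoints between adjacent sorted values, and the one with the largest information gain is kept. The vital capacity readings and grades are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Information entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, gain) for the split `value <= threshold` with the largest gain."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        after = len(left) / total * entropy(left) + len(right) / total * entropy(right)
        gain = base - after
        if gain > best[1]:
            best = ((low + high) / 2, gain)
    return best

# Hypothetical vital capacity readings (mL) with a pass/fail grade.
vital_capacity = [2100, 2500, 2900, 3400, 3800, 4200]
grade = ["fail", "fail", "fail", "pass", "pass", "pass"]

threshold, gain = best_threshold(vital_capacity, grade)
print(threshold, gain)
```

Once a threshold is chosen, the continuous feature behaves like a two-valued discrete attribute and can be compared with the other attributes by its gain rate.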

3.3. Physical Health Data Mining Based on the K-Means Algorithm

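There are many methods to calculate the distance between physical health data objects; the Euclidean distance is usually used. For a given sample set $D = \{x_1, x_2, \ldots, x_m\}$, the K-means algorithm minimizes the square error of the cluster partition $C = \{C_1, C_2, \ldots, C_k\}$ [36]:

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert_2^2, \tag{8}$$

where $\mu_i = \frac{1}{|C_i|} \sum_{x \in C_i} x$ is the mean vector of cluster $C_i$. Equation (8) describes the compactness of the samples in the cluster around the cluster mean vector to a certain extent. The smaller the value of $E$, the higher the similarity of the samples in the cluster. The algorithm steps are as follows:

Input: Sample set $D = \{x_1, x_2, \ldots, x_m\}$; Number of clusters $k$;

Process: Randomly select $k$ samples from $D$ as the initial mean vectors $\{\mu_1, \mu_2, \ldots, \mu_k\}$;

Repeat

Make $C_i$ an empty set, $C_i = \varnothing$ ($1 \le i \le k$);

For $j = 1, 2, \ldots, m$ do

Calculate the distance between the sample $x_j$ and each mean vector $\mu_i$ ($1 \le i \le k$):

The distance calculation formula is as follows:

$$d_{ji} = \lVert x_j - \mu_i \rVert_2 \tag{9}$$

The cluster marker of $x_j$ is determined according to the nearest mean vector. The nearest mean vector function is expressed as

$$\lambda_j = \arg\min_{i \in \{1, 2, \ldots, k\}} d_{ji} \tag{10}$$

Classify the sample $x_j$ into the corresponding cluster $C_{\lambda_j}$:

$$C_{\lambda_j} = C_{\lambda_j} \cup \{x_j\} \tag{11}$$

End for

For $i = 1, 2, \ldots, k$ do

Calculate the new mean vector $\mu_i'$,

$$\mu_i' = \frac{1}{|C_i|} \sum_{x \in C_i} x \tag{12}$$

If $\mu_i' \ne \mu_i$, then

Update the current mean vector $\mu_i$ to $\mu_i'$;

Else

Keep the current mean vector unchanged;

End if

End for

Until the current mean vectors are not updated;

Output: Partition cluster $C = \{C_1, C_2, \ldots, C_k\}$;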

According to the above classification process, the physical health data mining analysis of college students for physical education reform is realized.
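The clustering loop can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: samples are tuples of floats, the initial mean vectors are drawn at random from the data unless given explicitly, and the `initial_means`, `seed`, and `max_iter` parameters as well as the sample values are hypothetical.

```python
import math
import random

def kmeans(samples, k, initial_means=None, seed=0, max_iter=100):
    """Cluster `samples` (tuples of floats) into k groups; returns (means, clusters)."""
    if initial_means is None:
        means = random.Random(seed).sample(samples, k)  # initial means drawn from the data
    else:
        means = list(initial_means)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]               # start each pass with empty clusters
        for x in samples:
            distances = [math.dist(x, mu) for mu in means]   # Euclidean distance to each mean
            clusters[distances.index(min(distances))].append(x)
        new_means = []
        for mu, members in zip(means, clusters):
            if members:
                new_means.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_means.append(mu)                    # keep the mean of an empty cluster
        if new_means == means:                          # no mean vector was updated: stop
            break
        means = new_means
    return means, clusters

def squared_error(means, clusters):
    """Sum of squared distances from each sample to the mean of its cluster."""
    return sum(math.dist(x, mu) ** 2
               for mu, members in zip(means, clusters) for x in members)

# Hypothetical two-dimensional samples, e.g. two normalized test scores per student.
samples = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
means, clusters = kmeans(samples, k=2)
print(means, squared_error(means, clusters))
```

Because the result depends on the initial mean vectors, it is common to run the procedure several times with different seeds and keep the partition with the smallest squared error.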

4. Experiment

4.1. Data Preprocessing

Generally speaking, the original data contain too much noisy and incomplete data and need to go through data cleaning, data filtering, and data processing. Data cleaning is a process of re-examining and verifying data; its purpose is to delete duplicate information, correct existing errors, and ensure data consistency. For example, if the vital capacity in a record is “9999”, which is obviously abnormal, screening conditions can be set to delete such data. Data filtering deletes unnecessary fields according to actual needs. What this study needs is the relationship between the total score of the physical health test and the degree of impact of each test item, so redundant fields such as year, grade number, class number, class name, ethnic code, and date of birth are filtered out, leaving only useful fields. Data processing converts the data into a form that meets the requirements of the specific data mining model.
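The cleaning and filtering steps can be sketched as a single pass over the records. The field names, the example records, and the valid range for vital capacity are hypothetical and serve only to illustrate the idea of removing duplicates, implausible values such as “9999”, incomplete records, and redundant fields.

```python
def preprocess(records, keep_fields, valid_ranges):
    """Drop duplicate, incomplete, and out-of-range rows, then keep only the needed fields."""
    seen = set()
    cleaned = []
    for record in records:
        key = tuple(sorted(record.items()))
        if key in seen:
            continue                      # duplicate information
        seen.add(key)
        in_range = all(
            record.get(field) is not None and low <= record[field] <= high
            for field, (low, high) in valid_ranges.items()
        )
        if not in_range:
            continue                      # missing or implausible value
        cleaned.append({field: record[field] for field in keep_fields})
    return cleaned

# Hypothetical raw records.
raw = [
    {"student": "A", "class_no": 3, "vital_capacity": 3500, "total_score": 82},
    {"student": "A", "class_no": 3, "vital_capacity": 3500, "total_score": 82},  # duplicate
    {"student": "B", "class_no": 1, "vital_capacity": 9999, "total_score": 75},  # implausible
    {"student": "C", "class_no": 2, "vital_capacity": None, "total_score": 64},  # incomplete
    {"student": "D", "class_no": 2, "vital_capacity": 4100, "total_score": 91},
]

clean = preprocess(
    raw,
    keep_fields=["vital_capacity", "total_score"],
    valid_ranges={"vital_capacity": (500, 8000)},  # assumed plausible range in mL
)
print(clean)
```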

According to the statistical results in Tables 1 and 2, SPSS Modeler 18.0 software is used, and the data are divided into male and female students. The association rules of 7 items, four levels, and a total of 28 variables in the physical test results, covering morphology, function, and quality, are analyzed. The deep relationships with more than 60% support and confidence among indicators and levels are mined and presented in the form of charts according to the relationship between the antecedent and consequent items.

4.2. Data Mining Results of College Students’ Physical Health
4.2.1. Accuracy of Data Mining

In order to verify the accuracy of this method for college students’ physical health data mining, reference [5] method, reference [6] method, reference [7] method, and this method are used to calculate the accuracy of physical health data mining. The results are shown in Figure 2.

Analysis of Figure 2 shows that there are significant differences in the accuracy of health data mining under different methods. When the amount of physique data is 20 GB, the data mining accuracy of the reference [5] method is 68%, that of the reference [6] method is 63%, that of the reference [7] method is 74%, and that of this method is 99%. When the amount of physique data is 60 GB, the data mining accuracy of the reference [5] method is 33%, that of the reference [6] method is 48%, that of the reference [7] method is 40%, and that of this method is 98%. As the data volume increases, the data mining accuracy of the reference [5], reference [6], and reference [7] methods decreases significantly, while that of this method remains at a high level, which shows that the data mining accuracy of this method is better.

4.2.2. Time Consumption of Data Mining

In order to verify the efficiency of this method for college students’ physical health data mining, the reference [5] method, reference [6] method, reference [7] method, and this method are used to measure the time consumption of physical health data mining. The results are shown in Figure 3.

According to the analysis of Figure 3, when the amount of physical fitness data is 10 GB, the data mining time of the reference [5] method is 7 s, that of the reference [6] method is 17 s, that of the reference [7] method is 3 s, and that of this method is 0.5 s. When the amount of physical fitness data is 70 GB, the data mining time of the reference [5] method is 43 s, that of the reference [6] method is 42 s, that of the reference [7] method is 40 s, and that of this method is 3 s. This method takes less time to mine students’ physical health data, which shows that its mining efficiency is better than that of the other methods.

4.2.3. Recall Rate of Data Mining

In order to verify the recall rate of this method for physical health data mining, reference [5] method, reference [6] method, reference [7] method, and this method are used to compare the recall rate of physical health data mining. The results are shown in Figure 4.

As can be seen from Figure 4, when students’ physical health data mining is carried out, the trend of mining recall rate curve generally decreases with the increase of data volume. When the amount of data is 50 GB, the data mining recall rate of the reference [5] method is 33%, that of the reference [6] method is 35%, that of the reference [7] method is 50%, and that of the method in this paper is 96%. The data mining recall rate of the method in this paper is much higher than that of traditional methods. This is because the method in this paper uses association rules to build the correspondence between the health data of the physical fitness test, classifies the association results of the physical fitness test data output, and forms the health data test group, which is divided into training data group and verification data group. Under the control of data proportion, the data training model is constructed to improve the recall rate of data mining and the effect of physical data mining.
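The paper does not spell out how accuracy and recall are computed. Under the common definitions, assumed here, accuracy is the share of samples whose predicted class matches the actual class, and recall is the share of samples of a chosen class that the method recovers. The labels below are hypothetical.

```python
def accuracy_and_recall(actual, predicted, positive):
    """Return (accuracy, recall) with `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)
    accuracy = correct / len(pairs)
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return accuracy, recall

# Hypothetical grades: actual level versus the level assigned by a mining method.
actual = ["good", "good", "good", "pass", "pass"]
predicted = ["good", "good", "pass", "pass", "good"]

accuracy, recall = accuracy_and_recall(actual, predicted, positive="good")
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```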

4.2.4. Mean Square Error of Data Mining

In order to verify the accuracy of the method in this paper for physical health data mining of college students, the reference [5] method and reference [6] method were selected from the above methods as the control group, and the method in this paper was taken as the experimental group. The mean square error of physical health data mining under the different methods was analyzed, and the results are shown in Figure 5.

Figure 5 shows that the mean square errors of physical health data mining differ under different methods. In the first experiment, the mean square error of the data mining method in reference [5] is 12, that of the data mining method in reference [6] is 8, and that of the data mining method in this paper is 1. In the tenth experiment, the mean square error of the data mining method in reference [5] is 13, that of the data mining method in reference [6] is 18, and that of the data mining method in this paper is 2. The mean square error of the proposed method is significantly smaller than that of the methods in reference [5] and reference [6], which indicates that the proposed method can achieve more accurate physical health data mining.

4.2.5. Data Mining Security

Based on the above experimental results, the security of physical health data mining is verified by using the reference [5] method, reference [6] method, reference [7] method, and the method in this paper. The results are shown in Figure 6.

As shown in Figure 6, the security of physical health data mining decreases with the increase of network attack intensity. When the network attack intensity is 20 dps, the health data mining security of the reference [5] method is 86%, that of the reference [6] method is 63%, that of the reference [7] method is 55%, and that of this method is 98%. When the network attack intensity is 80 dps, the health data mining security of the reference [5] method is 42%, the health data mining security of the reference [6] method is 38%, the health data mining security of the reference [7] method is 60%, and the health data mining security of the method in this paper is 96%. This shows that the method in this paper can obtain high mining effect under high network attack and data mining security is high.

5. Conclusion

This paper proposes a data mining method of college students’ physical health for physical education reform. Association rules were used to construct the correspondence between the health data of the physical fitness test, and the data of the physical fitness test were classified. The attribute decision tree of the physical fitness data was constructed, and the information gain was obtained by calculating the information entropy of the health data. The K-means algorithm was used to divide the data into clusters, according to which the physical health data mining of college students was realized. The following conclusions are drawn through the experiment:

(1) When the amount of data is 60 GB, the data mining accuracy of the method in this paper is 98% and remains at a high level, indicating that the data mining accuracy of the method in this paper is high.

(2) When the data volume is 70 GB, the data mining time of the method in this paper is only 3 s, indicating that the mining time of students’ physical health data is short and the mining efficiency is high.

(3) When the amount of data is 50 GB, the data mining recall rate of this method is 96%, which is much higher than that of traditional methods, indicating that this method can improve the effect of physical data mining.

(4) The mean square error of the method in this paper does not exceed 2, which indicates that the method in this paper can achieve more accurate physical health data mining.

(5) When the intensity of network attack is 80 dps, the security of the health data mining method in this paper is 96%. The method in this paper can obtain a better mining effect under high network attack intensity, and the data mining security is high.

Data Availability

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding this work.

Acknowledgments

This work was supported by the Provincial Massive Open Online Course (MOOC) Demonstration Project (Project Name: Volleyball; Project Number: 2019MOOC402).