Abstract
The information society has evolved in tandem with the continual growth of the economy in recent years. Modern life depends on the creation of knowledge, and the creation of information in the classroom is equally important. The goal of this research is to investigate how to build a service-oriented architecture for the informatization of physical education resources. Based on the service-oriented architecture, this study proposes a clustering analysis algorithm built on the Internet of Things and evaluates the clustering algorithm before and after improvement. The experimental results suggest that information technology was still not fully developed or mature in 2015, when the growth rate of informatization construction was only 2.9%. By 2021, however, informatization construction had developed rapidly, and the growth rate had reached 16.8%. During this period, the growth rate increased gradually every year, indicating that informatization has become more and more popular. In education, informatization has also gradually been applied, for example in smart classrooms and online learning. Traditional physical education can no longer meet the needs of current students, so it is necessary to carry out the informatization construction of physical education resources. Modern information technology is also widely used in pedagogy. In recent years, the audio-visual equipment in schools at all levels has been continuously upgraded, and teachers’ awareness of integrating modern information technology into teaching has been continuously strengthened, with visible results. Strengthening the construction of university sports informatization is an inevitable trend in the development of university sports: it takes information technology and information networks as its basis and uses modern information technology to improve teaching efficiency.
1. Introduction
In today’s society, sports are increasingly becoming an indispensable part of people’s lives. Because networks have developed rapidly and sports content is well suited to network communication, the online dissemination of sports information has also made great progress. With the advent of economic globalization and the information age, technology has penetrated all fields of society. Given the practice of informatization and the popularization of sports information networks, it is essential to study the development of sports informatization in China.
Physical education is an integral part of school education, and school physical education is the basis of lifelong physical education for students. When the majority of young people are physically and mentally healthy, strong-willed, and full of vitality, this embodies the vigorous vitality of a nation. As higher education continues to be reformed, enrolments have grown, and so has the strain on traditional education. In physical education especially, the allocation of funds, equipment, and venues, as well as the construction of physical education informatization, lags far behind the overall needs of students. Under the guidance of modern science and technology, strengthening the construction of university sports informatization and giving full play to the role of informatization will help to promote long-term and stable education in universities.
The contributions of this paper are as follows: (1) This paper introduces the theoretical background of service-oriented architecture and of the informatization construction of physical education resources and uses a cluster analysis algorithm to analyze how service-oriented architecture plays a role in that construction. (2) This paper describes the clustering analysis algorithm before and after improvement. The experiments show that the clustering analysis algorithm can effectively analyze the development of the informatization construction of physical education resources.
2. Related Work
With the advent of the information age, teaching resources in physical education have also begun to be informatized. Anjaria and Mishra found that no computing architecture can be completely hidden, so information can leak at any time. As a result, ensuring the security of information in any computing architecture becomes critical. They proposed a method for information confidentiality in Web services based on service-oriented architecture (SOA) to prevent privacy leakage. They argued that service-oriented architecture can effectively ensure information security, but they did not test the method they proposed [1]. According to Andersen, medical device communication protocols are ideal for integrating point-of-care devices. The integrated operating room (OR) and other integrated clinical settings, on the other hand, require interoperability between the two domains in order to reach their full potential for improving care quality and clinical efficiency. He therefore recommended disseminating clinical and administrative data via medical devices [2] and physiological assessments. His ideas have been shown to work with IoT medical device systems to increase healthcare workers’ efficiency, but the feasibility of using IoT to improve care quality has yet to be proven. According to Chu et al., smartphones, smart watches, vehicle position trackers, and other gadgets with location tracking capabilities are becoming more widely used. However, data mining and advanced analytics are rarely included with these devices, which limits their utility. They presented a general-purpose programmable position monitoring platform with a cloud-based advanced analytics engine. It is intended for use in a variety of spatiotemporal applications and can effectively reduce consumption, but they did not provide experimental evidence to back up this claim [3]. Yang et al. found that a microservice architecture is advantageous for sharing medical resources without jeopardising patient privacy. To do so, they recommended rewriting legacy systems into autonomous microservices that connect via unified technology, allowing users to handle clinical data inquiries directly and to handle internal and external requests more efficiently. Their solution is distinctive in that it avoids the data deidentification process that is commonly used to protect patient privacy. However, the benefits of this method are not explicitly stated [4]. According to Yin and Du, the microservice architecture (MSA) has become the mainstream architecture in recent years, and resilience is a key feature of an MSA system, reflecting its ability to deal with various system disturbances that result in service degradation. Although much work has been done on MSA system resilience, the lack of consensus on the definition of resilience in the software field means that developers still do not have a clear idea of how resilient an MSA system should be or what kind of resilience mechanisms it should have. They defined microservice resilience with reference to systematic studies of resilience in other scientific domains and proposed a measurement model for service resilience. They did not, however, summarise how resilient an MSA system should be, nor did they specify it concretely [5]. Hustad and Olsen introduced the theory of digital infrastructure, service-oriented architecture, and microservices. They discussed the benefits and drawbacks of building a sustainable infrastructure based on service-oriented architecture. Although they acknowledged that service architecture has a significant impact on the market, they did not go into detail about it [6].
Qi found that, after entering the “Internet+” era, colleges and universities began to prioritise informatization construction in their development. In the new era, how to build a complete information service and management system while maximising the benefits of information services has become a critical decision for university libraries as they build the “Internet+ library service” model. He discussed the role of service and management in the informatization of university libraries, emphasising the importance of providing good support for the long-term and stable development of university libraries in the new era and of promoting their informatization. However, he did not elaborate on the outcomes of this discussion.
3. Cluster Analysis Method for Service-Oriented Architecture
The rise of technologies such as cloud computing [7] and the Internet of Things provides an opportunity to solve the problems currently faced by sports informatization and has prompted the proposal of “smart sports.” Cloud computing, the Internet of Things, and other emerging technologies [8–10] provide the technical support for “smart sports.” The application of “smart sports” involves all aspects of sports, and its purpose is to give fuller play to the functions and services of sports. Sports informatization refers to speeding up the development of sports information resources, realizing the sharing of these resources, and further optimizing the distribution of sports resources as information networks spread throughout society. Ultimately, the effective use of information can greatly improve the efficiency with which sports resources are used across society. This paper analyzes the development trend of informatization construction from 2015 to 2021, as shown in Figure 1.

As shown in Figure 1, modern information technology can digitally process various media information, so the integrated multimedia information is inherently diverse, comprehensive, and bidirectional. Excellent learning environments and learning tools can improve students’ cognitive methods and greatly improve their efficiency [11].
3.1. Problems Encountered in Traditional Teaching
Since the start of the 21st century, the rapid development of science and technology has ushered in the era of the mobile Internet, represented by smartphones. Mobile phones are becoming the most common smart mobile devices in people’s daily lives, and people are more and more accustomed to obtaining information through them. Intelligent teaching is shown in Figure 2.

As shown in Figure 2, most earlier teaching material platforms in university education were based on computers, which are not easy to carry and were therefore inconvenient for teachers and students. At the same time, these earlier platforms have gradually become unable to keep up with the growing educational needs of universities. They face the following issues:
Universities now pay more and more attention to the educational process and to the learning situation of students, but traditional educational support platforms cannot record this educational information. As a result, teachers cannot grasp the specific learning situation of students, and students cannot give timely feedback on the problems they encounter in learning.
Traditional education support platforms also carry hidden security risks. For example, after enrollment many freshmen are harassed by various marketing services, which indicates that students’ personal information has been leaked. Therefore, the traditional university education support platform must be optimized from the perspective of service security [12].
3.2. Advantages and Algorithms of Service-Oriented Architecture
Service-oriented architecture (SOA) is a design method in which business functions are transformed into interconnectable and reusable services. It is independent of the programming language used and is built around specific business tasks, so it can adapt quickly to the changing conditions and needs of an enterprise and support the adjustment of functions and business goals [13]. SOA is a component model that splits an application into distinct functional units (called services) and links them together through well-defined interfaces and protocols. The interface is defined in a neutral way and should be independent of the hardware platform, operating system, and programming language that implement the service. The service-oriented architecture is shown in Figure 3.
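As a minimal sketch of services linked through a well-defined, implementation-neutral interface, the following Python fragment defines a service contract and a registry through which clients look up services by name. The names `CourseResourceService`, `InMemoryResourceService`, and `ServiceRegistry`, and the sample data, are hypothetical illustrations and are not part of the platform described in this paper.

```python
from typing import Dict, List, Protocol


class CourseResourceService(Protocol):
    """Hypothetical service contract: clients depend only on this interface."""

    def list_resources(self, course_id: str) -> List[str]:
        ...


class InMemoryResourceService:
    """One possible implementation; it can be swapped without changing clients."""

    def __init__(self, resources: Dict[str, List[str]]) -> None:
        self._resources = resources

    def list_resources(self, course_id: str) -> List[str]:
        # Return a copy so callers cannot mutate the service's internal state.
        return list(self._resources.get(course_id, []))


class ServiceRegistry:
    """Maps service names to implementations so callers stay decoupled."""

    def __init__(self) -> None:
        self._services: Dict[str, CourseResourceService] = {}

    def register(self, name: str, service: CourseResourceService) -> None:
        self._services[name] = service

    def lookup(self, name: str) -> CourseResourceService:
        return self._services[name]


registry = ServiceRegistry()
registry.register("resources", InMemoryResourceService({"pe101": ["warmup.mp4"]}))
found = registry.lookup("resources").list_resources("pe101")
```

Because the client only calls `lookup` and `list_resources`, the in-memory implementation could be replaced by a remote one without changing client code, which is the reuse property the text attributes to SOA.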

As shown in Figure 3, when the total amount of information to be transmitted is the same, a node with a higher aging level should be given a longer information transmission time, and when the aging level is the same, a node with a higher data transmission rate needs less time. The ratio of the transmission time delays of the various node types is first calculated from the aging level of each sensor node, as given by formula (1).
Formula (1) gives the ratio of the time delays with which each type of node transmits data, taking the number of sensor nodes into account. When the amount of data collected by the various nodes is the same, a node type that can send less data in the same time should be assigned a longer time slot. Then, taking the data transmission rate of each type of node into account, the proportion of time occupied by each type of node to transmit data is calculated by formula (2).
After the proportion of the total time for each type of node has been calculated, the data transmission time occupied by each sensor node can be obtained from the number of nodes of each type and the total time [14]; for a given duration of one data rotation, it is calculated by formula (3).
Taking the aging level of the various nodes, the data transmission rate, and the quantity of the various data as parameters, the time value that each node should be assigned when sending data is calculated. When the number of nodes in the system changes, the calculation is repeated for the new situation to obtain the time allocated to the new nodes [15].
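The allocation described above can be illustrated with a short sketch. It assumes, purely for illustration, that a node’s weight is its aging level divided by its data transmission rate and that the slots of all nodes together fill exactly one rotation; the exact formulas of the paper may differ. The type `NodeType`, the function `allocate_slots`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NodeType:
    name: str
    aging_level: float  # higher level -> larger share of the rotation (assumed weight)
    rate: float         # data transmission rate; a lower rate -> longer slot
    count: int          # number of sensor nodes of this type


def allocate_slots(types: List[NodeType], cycle_time: float) -> Dict[str, float]:
    """Return the time slot assigned to ONE node of each type.

    Assumption: weight = aging_level / rate, and the slots of all nodes
    together add up to exactly one rotation (cycle_time).
    """
    weights = {t.name: t.aging_level / t.rate for t in types}
    total = sum(weights[t.name] * t.count for t in types)
    return {t.name: cycle_time * weights[t.name] / total for t in types}


types = [
    NodeType("temperature", aging_level=1.0, rate=2.0, count=4),
    NodeType("heart_rate", aging_level=2.0, rate=1.0, count=2),
]
slots = allocate_slots(types, cycle_time=12.0)
```

When the number of nodes changes, calling `allocate_slots` again with the updated counts yields the new allocation, which mirrors the recalculation step described in the text.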
The advantages of service-oriented architecture are as follows:
(1) Service scalability. Realizing the core functions is the main goal in the initial stage of the system. However, as functional requirements change and grow later on, a system that is not scalable enough will consume a lot of human resources and time once it is in actual use. The service architecture takes the scalability of system services into account [16].
(2) Security of the service. A university’s mobile teaching platform is a highly exclusive platform that holds the personal information of teachers and students. To ensure the efficiency of teaching and learning, external interference must be excluded. Some functions of the mobile client need to ensure the security of services. For example, the intelligent peer-to-peer function of the course needs to identify malicious monitoring of information and also needs to detect and assess the security situation when multiple accounts use the same mobile device [17].
(3) The reliability of the service. Service reliability directly affects user resource management. Users access the services provided by the platform through smartphones. When many users use it at the same time, multiple simultaneous requests are generated; the system server may crash, the mobile client may stop responding, and user resource management may be very poor. If the system server faces so many simultaneous requests every day, it is under heavy pressure. The service-oriented architecture can solve such problems.
3.3. Clustering Similarity Calculation Method
Cluster analysis refers to the process of grouping a collection of physical or abstract objects into multiple classes composed of similar objects, and it is an important human behavior. The goal of cluster analysis is to classify the collected data on the basis of similarity. Clustering has its origins in many fields, including mathematics, computer science, statistics, biology, and economics. As a function of data mining [18], cluster analysis is used to process detected clusters and selected attributes or features.
3.3.1. Euclidean Distance
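The Euclidean distance is the traditional and most widely used concept of distance [19]. In mathematics, the Euclidean distance or Euclidean metric is the “ordinary” (i.e., straight-line) distance between two points in Euclidean space. With this distance, Euclidean space becomes a metric space. The associated norm is called the Euclidean norm. For two $n$-dimensional points $x_i = (x_{i1}, \ldots, x_{in})$ and $x_j = (x_{j1}, \ldots, x_{jn})$, the distance is defined as
$$d(x_i, x_j) = \sqrt{\sum_{k=1}^{n} (x_{ik} - x_{jk})^2} \tag{4}$$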
3.3.2. Minkowski Distance
The Minkowski distance is an extension of the Euclidean distance and can be understood as a distance in $n$-dimensional space. For two $n$-dimensional points $x_i$ and $x_j$ and an order $p \geq 1$, it is defined as
$$d_p(x_i, x_j) = \left( \sum_{k=1}^{n} |x_{ik} - x_{jk}|^{p} \right)^{1/p} \tag{5}$$
Setting $p = 2$ gives the Euclidean distance, and setting $p = 1$ gives the Manhattan distance.
3.3.3. Manhattan Distance
The Manhattan distance, also known as the city-block distance, is another widely used distance measure. In a fixed rectangular coordinate system of Euclidean space, it is the total length of the projections onto the axes of the line segment joining two points. For two $n$-dimensional points $x_i$ and $x_j$, it is defined as
$$d(x_i, x_j) = \sum_{k=1}^{n} |x_{ik} - x_{jk}| \tag{6}$$
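The weighted Euclidean distance, in which each dimension $k$ is given a weight $w_k$, can be calculated as
$$d_w(x_i, x_j) = \sqrt{\sum_{k=1}^{n} w_k (x_{ik} - x_{jk})^2} \tag{7}$$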
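The point-to-point distances above can be sketched in a few lines of Python. The function names are illustrative; the Euclidean and Manhattan distances are written as the order-2 and order-1 special cases of the Minkowski distance.

```python
import math
from typing import Sequence


def minkowski(x: Sequence[float], y: Sequence[float], p: float) -> float:
    """Minkowski distance of order p between two points of equal dimension."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance: the Minkowski distance of order 2."""
    return minkowski(x, y, 2)


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan (city-block) distance: the Minkowski distance of order 1."""
    return minkowski(x, y, 1)


def weighted_euclidean(
    x: Sequence[float], y: Sequence[float], w: Sequence[float]
) -> float:
    """Euclidean distance with a non-negative weight per dimension."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))
```

With unit weights, `weighted_euclidean` reduces to `euclidean`, which is a quick sanity check when choosing weights.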
3.3.4. Minimum (Single-Linked) Distance
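The minimum (single-linked) distance is the shortest distance between any tuple in one cluster $C_i$ and any tuple in another cluster $C_j$, which is
$$d_{\min}(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} d(x, y) \tag{8}$$
The maximum (full-link) distance is the longest such distance, which is
$$d_{\max}(C_i, C_j) = \max_{x \in C_i,\, y \in C_j} d(x, y) \tag{9}$$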
3.3.5. Average (Group-Averaged) Distance
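The average (group-averaged) distance is the average distance between all tuples in one cluster and all tuples in another cluster, which is
$$d_{\mathrm{avg}}(C_i, C_j) = \frac{1}{n_i n_j} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y) \tag{10}$$
where $n_i$ and $n_j$ are the numbers of samples in classes $C_i$ and $C_j$, respectively. Here, $d(x, y)$ represents the distance between two objects $x$ and $y$.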
3.3.6. Center Point Distance
In research on machine vision and pattern recognition, transforming an image into a binary image is the key to identifying specific regions or objects in the image more efficiently. If a center point is used to represent each cluster, the center point distance is the distance between the center points of the two clusters, that is,
$$d_{\mathrm{cen}}(C_i, C_j) = d(m_i, m_j) \tag{11}$$
where $m_i$ and $m_j$ are the center points of $C_i$ and $C_j$, respectively.
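The four cluster-to-cluster distances can be compared side by side in a short sketch. It assumes that the underlying object distance is the Euclidean distance (via `math.dist`, available from Python 3.8) and that the center point of a cluster is the coordinate-wise mean; the function names are illustrative.

```python
import math
from itertools import product
from typing import List, Tuple

Point = Tuple[float, ...]


def single_link(a: List[Point], b: List[Point]) -> float:
    """Minimum distance over all cross-cluster pairs."""
    return min(math.dist(x, y) for x, y in product(a, b))


def complete_link(a: List[Point], b: List[Point]) -> float:
    """Maximum distance over all cross-cluster pairs."""
    return max(math.dist(x, y) for x, y in product(a, b))


def average_link(a: List[Point], b: List[Point]) -> float:
    """Mean distance over all cross-cluster pairs."""
    return sum(math.dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))


def center(cluster: List[Point]) -> Point:
    """Coordinate-wise mean of the cluster (assumed definition of the center point)."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def center_link(a: List[Point], b: List[Point]) -> float:
    """Distance between the two cluster center points."""
    return math.dist(center(a), center(b))
```

The single-linked distance is never larger than the average distance, which in turn is never larger than the full-link distance, so the choice of measure controls how readily clusters are considered close.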
3.4. Improved Real-Time Data Stream Clustering Analysis Algorithm
Clustering results are ultimately user-oriented: users expect the information obtained from clustering to be understandable and applicable, but in actual mining the results are sometimes unsatisfactory. This paper presents an improved clustering algorithm. The improved algorithm is based on the density grid clustering analysis algorithm D-Stream, adopts the two-stage processing model of the CluStream algorithm, and divides the clustering process into an online processing stage and an offline processing stage [20]. Compared with the CluStream algorithm, D-Stream has advantages in clustering quality and efficiency and scales better to massive high-dimensional stream data. The main idea is to divide the data space into an initial grid and then divide each grid cell into subgrids of smaller granularity. According to their position, the grids are divided into internal grids and boundary grids. In the online layer, the internal grids are microclustered at coarse granularity based on the grid density factor and are dynamically adjusted to form the initial grid clusters. The offline layer performs fine-grained clustering of the boundary grids according to the initial grid cluster information [21]. The improved real-time data stream clustering analysis algorithm is named the DSG-Stream algorithm, and its main process is shown in Figure 4.

As shown in Figure 4, the online stage also includes data preprocessing, grid attenuation, the processing of isolated grids, and the adjustment of boundary grids. After the offline-stage clustering is completed, the information is stored in the form of pyramid snapshots, which is convenient for users to query. Each part is introduced in detail below. During online microclustering, the microclusters are stored as snapshots at the time points defined by the pyramid time frame. This framework takes into account the storage requirements on the one hand and the recovery of offline macroclustering over different time periods on the other.
The grid tuple corresponding to grid G records the following items: the time of the last data arrival; Cen, the grid centroid; Den, the grid density; Class, the cluster class to which the grid belongs; Dstatus, which indicates whether the grid is an abnormal grid; Dlittle, the set of small grid cells obtained after secondary division, represented as a vector; Bstatus, which indicates whether the grid is a boundary grid; and the set of numbers assigned after grid division, as shown in Figure 5.
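One way to hold such a grid tuple in code is a small record type. The sketch below is an assumption about layout only: the field names `last_arrival` and `index` are hypothetical stand-ins for the two items whose symbols are not named in the text, and `cluster` replaces Class because `class` is a reserved word in Python.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class GridCell:
    last_arrival: float                  # time of the last data arrival (name assumed)
    cen: Tuple[float, ...]               # Cen: grid centroid
    den: float = 0.0                     # Den: grid density
    cluster: Optional[int] = None        # Class: cluster the grid belongs to, if any
    dstatus: bool = False                # Dstatus: True if the grid is abnormal
    dlittle: List["GridCell"] = field(default_factory=list)  # Dlittle: subgrid cells
    bstatus: bool = False                # Bstatus: True if the grid is a boundary grid
    index: Tuple[int, ...] = ()          # numbers assigned after grid division (name assumed)


cell = GridCell(last_arrival=0.0, cen=(0.5, 0.5), index=(0, 0))
cell.dlittle.append(GridCell(last_arrival=0.0, cen=(0.25, 0.25), index=(0, 0)))
```

Using `default_factory` gives every grid its own list of subgrid cells, so secondary division of one grid does not leak into another.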

As shown in Figure 5, the grid density is the sum of the density coefficients of the data points in each grid, denoted as $D$. According to their density, grids are divided into dense grids, sparse grids, and transition grids, using a specified dense threshold and sparse threshold. The average grid density is
$$\bar{D} = \frac{1}{N} \sum_{i=1}^{N} D_i \tag{12}$$
where $N$ is the number of nonempty grids, and $D_i$ is the density of the $i$th grid.
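The averaging and the three-way classification can be sketched as follows. The dense and sparse thresholds are passed in as plain parameters because their exact definitions are not assumed here; the function names and the boundary conventions (`>=` for dense, `<=` for sparse) are illustrative choices.

```python
from typing import Iterable


def average_density(densities: Iterable[float]) -> float:
    """Mean density over nonempty grids only (grids with density > 0)."""
    nonempty = [d for d in densities if d > 0]
    return sum(nonempty) / len(nonempty) if nonempty else 0.0


def classify_grid(density: float, dense_threshold: float, sparse_threshold: float) -> str:
    """Label a grid as dense, sparse, or transition (boundary handling assumed)."""
    if density >= dense_threshold:
        return "dense"
    if density <= sparse_threshold:
        return "sparse"
    return "transition"
```

For example, with a dense threshold of 4 and a sparse threshold of 1, a grid of density 2.5 falls between the two and is treated as a transition grid.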
For fine-grained grids, there are many ways to divide the attribute space; the common method is to divide each attribute into discrete intervals of equal width. The dense grid threshold is given by formula (13).
In formula (13), the threshold depends on the dimension of the data stream; a corresponding sparse grid threshold is defined as well, and both density thresholds are dynamically adjusted in the algorithm. So that the clustering reflects the impact of the latest real-time data, the grid density is attenuated by a coefficient at fixed time intervals, which is called grid density attenuation. Density attenuation captures the dynamic changes of the data flow and reveals the complex relationship between the attenuation factor, data density, and cluster structure. The adjacency of two grids is determined by formula (14).
If formula (14) holds for two grids, they are called adjacent grids of each other.
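As a hedged illustration of an adjacency test, the sketch below assumes the neighboring-grid definition commonly used in D-Stream-style algorithms: two grids are adjacent when their index vectors agree in every dimension except one, where they differ by exactly 1. This may differ from the exact condition used in this paper; the function name `are_adjacent` is hypothetical.

```python
from typing import Sequence


def are_adjacent(a: Sequence[int], b: Sequence[int]) -> bool:
    """True if grid indices a and b differ by exactly 1 in exactly one dimension.

    Assumption: this is the D-Stream-style neighbor definition, used here
    only as an example of an adjacency condition.
    """
    if len(a) != len(b):
        raise ValueError("grid indices must have the same dimension")
    diffs = [abs(i - j) for i, j in zip(a, b)]
    changed = [d for d in diffs if d != 0]
    return len(changed) == 1 and changed[0] == 1
```

Under this definition, diagonal neighbors such as (1, 2) and (2, 3) are not adjacent, which keeps the neighborhood of a grid small in high dimensions.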
Let $\lambda$ be the attenuation coefficient, and let the density of grid $g$ at time $t$ be denoted as $D(g, t)$. The grid density is
$$D(g, t) = \lambda^{t - t_l} D(g, t_l) + n \tag{15}$$
where $t_l$ is the last update time, $D(g, t_l)$ is the grid density at $t_l$, and $n$ is the number of data arriving. The grid where the boundary of a cluster is located is called a boundary grid, and all grid cells other than boundary grids are internal grids. Whether a grid is a boundary grid is judged from its location and density: if the grid is a dense grid and the grids adjacent to it are nonempty grid cells, it is an internal grid; otherwise, it is a boundary grid. Because the way the grid is divided easily affects the processing of boundary grids and reduces clustering accuracy, this paper focuses on the processing and clustering of boundary grids. The grid centroid, which is the center point of the data points in the grid, is given by formula (16).
Formula (16) is expressed in terms of the data items arriving in the data stream, and the grid centroid is maintained by this incremental method so that the algorithm complexity is not too high.
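Both incremental updates can be sketched briefly. The density update assumes the D-Stream-style rule in which the previous density is multiplied by the attenuation coefficient raised to the elapsed time and the number of newly arrived data is then added. The centroid update assumes a plain running mean, which is only one possible incremental scheme and may not match the paper's formula. All names are illustrative.

```python
from typing import Tuple


def decayed_density(
    prev_density: float, last_update: float, now: float, decay: float, arrivals: int = 1
) -> float:
    """Attenuate the old density by decay**(elapsed time), then add new arrivals.

    Assumption: D-Stream-style update with 0 < decay < 1.
    """
    return decay ** (now - last_update) * prev_density + arrivals


def update_centroid(
    centroid: Tuple[float, ...], count: int, point: Tuple[float, ...]
) -> Tuple[Tuple[float, ...], int]:
    """Running-mean centroid update in O(dimension) time (assumed scheme)."""
    new_count = count + 1
    new_centroid = tuple(c + (x - c) / new_count for c, x in zip(centroid, point))
    return new_centroid, new_count
```

Neither update needs the earlier data points, so each grid stores only its density, centroid, point count, and last update time, which keeps the per-point cost low.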
To judge whether two different grids belong to the same cluster, the influence between them is calculated. Their mutual grid influence factor is given by formula (17),
which involves a constant and the grid densities of the two grids.
Some grids have fewer data points and cannot reach the specified threshold. If more such grid cells are maintained in the algorithm, the overall algorithm efficiency will be affected. This kind of grid cell should be removed according to the situation, so as to improve the efficiency of the algorithm.
For the first type of isolated grid cell, if the density of its subgrid units is lower than the dense threshold for subgrid units and no data has arrived in the grid cell during the last time interval, the cell is removed. That is, in addition to satisfying the above threshold, the grid cell satisfies the condition given by formula (18).
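The removal rule for this first type of isolated grid cell can be sketched as a filter over the maintained grids. The sketch assumes that each grid is summarized by its density and the time of its last data arrival and that "no arrival in the last interval" means the elapsed time is at least the interval length; the names and the sample values are hypothetical.

```python
from typing import Dict, Tuple

GridKey = Tuple[int, ...]
GridState = Tuple[float, float]  # (density, time of last data arrival)


def is_isolated(
    density: float, last_arrival: float, now: float, dense_threshold: float, interval: float
) -> bool:
    """Low density AND no data arrival during the last interval (assumed reading)."""
    return density < dense_threshold and (now - last_arrival) >= interval


def prune_isolated(
    grids: Dict[GridKey, GridState], now: float, dense_threshold: float, interval: float
) -> Dict[GridKey, GridState]:
    """Return a new mapping without the isolated grid cells."""
    return {
        key: state
        for key, state in grids.items()
        if not is_isolated(state[0], state[1], now, dense_threshold, interval)
    }


grids = {(0, 0): (5.0, 9.0), (0, 1): (0.4, 2.0), (1, 1): (0.4, 9.5)}
kept = prune_isolated(grids, now=10.0, dense_threshold=1.0, interval=5.0)
```

A low-density grid that has received data recently is kept, since it may still grow into a dense grid; only grids that are both sparse and stale are dropped.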
In the process of dynamically receiving real-time data streams, microclustering is performed on the internal grid to form the initial grid cluster as the initial cluster center.
3.5. The Value of Informatization of Physical Education Teaching Resources in Ordinary Colleges and Universities
(1) Students’ learning enthusiasm is improved. Information technology is very attractive to college students. Introducing it therefore naturally raises students’ interest in learning and helps them concentrate, which is of great significance.
(2) Teaching should never be one-way, rigid, and lifeless. In today’s society especially, students’ autonomy and individuality are growing stronger, so teaching should take their hobbies and sports skills into account, fully recognize the individual differences between students, and teach them in accordance with their aptitude. The traditional teaching model does not do this very well. Once physical education resources are informatized, it becomes possible to build a platform for interaction between teachers and students.
3.6. Problems Existing in the Process of Physical Education Teaching in Colleges and Universities
At this stage, the infrastructure construction required in the process of university sports informatization construction is obviously insufficient, the investment in education and the consideration of sports are insufficient, the physical education environment of universities is limited, and the development of sports information resources is seriously insufficient. The traditional physical education teaching is shown in Figure 6.

As shown in Figure 6, on the other hand, universities lack professional network maintenance personnel, and there are no supporting multimedia educational facilities and sports facilities. Multimedia equipment has a profound impact on the applicability and practicality of physical education.
4. Experiment and Analysis of Cluster Analysis Algorithm before and after the Improvement of Service-Oriented Architecture
With the advent of the information age, sports information resources also need to be integrated with sports information technology in order to keep up with the pace of the times. Informatization construction in physical education resources is conducive to improving the efficiency of physical education.
4.1. Construction of Hadoop Platform
First, an artificial data set is used to test the clustering effect of the DSG-Stream algorithm. For ease of observation, the data set consists of two-dimensional data and is fed in as a simulated data flow. The selected data points form recognizable shapes, and the algorithm processes the data points as it receives them. Finally, the clustering results of the DSG-Stream algorithm are compared with the final clustering shape produced by the D-Stream algorithm.
The experimental environment is the Windows 7 system. Four machines are used to build the Hadoop platform in the experiment, and the network topology is shown in Figure 7.

As shown in Figure 7, the Hadoop platform has three modes: local mode, pseudodistributed mode, and fully distributed mode. The Hadoop platform can build a large data warehouse, PB-level data storage, processing, analysis and statistics for search engines, log analysis, and data mining. In order to fully demonstrate the advantages of distributed computing, this paper adopts a fully distributed mode. There are 5 machines in the cluster, and their hardware configurations are shown in Table 1.
As shown in Table 1, a distributed computing system combines the computing power of multiple computers, which gives it a faster processing speed than other systems and higher performance than a centralized computer network cluster. To prepare for the construction of the platform, the corresponding system is installed on each machine and the relevant environment is configured. The IP address and hostname of every host in the LAN are then written into the hosts file so that the hosts can reach each other. These entries must therefore be added at the end of this file on all hosts in the LAN.
4.2. Comparative Experiment of K-Means Algorithm
Cluster analysis is also often used as a data set preprocessing process, which can be combined with other data mining methods to conduct a deeper research on the data that has completed cluster analysis.
The method section analyzes several disadvantages of K-means, proposes a strategy for optimizing the initial values, and explains the preprocessing of the data set in theory. After the second calculation, the number of groups and the initial cluster centers of the K-means algorithm are obtained. This approach can improve the accuracy of clustering and reduce the number of iterations in the clustering process. The data set used for the comparative experiments should not be too large, and the data in it should be relatively densely distributed. The comparison of the clustering accuracy of the clustering analysis algorithm before and after the improvement is shown in Figure 8.

(a) Clustering effect of the clustering analysis algorithm before improvement

(b) Clustering effect of the improved clustering analysis algorithm
As shown in Figure 8, it can be seen from the effect diagram of the experiment that the distribution of points in the data set is relatively uniform, and there are some relatively concentrated data points. The experimental results before and after the improvement of the algorithm are compared, and it is proved from the experiment that the effect of the improved algorithm is better than the original algorithm. It can be seen from the latter experimental results that the distribution of the final five groups is relatively average, showing that the distribution of the data set is closer to the real grouping.
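The effect of the initial cluster centers on the number of iterations can be illustrated with a minimal K-means sketch in plain Python. This is a generic K-means that takes the initial centers as an argument; it does not implement the initial-value strategy of this paper, and the function name `kmeans` and the toy data in the usage note are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    points: Sequence[Point], initial_centers: Sequence[Point], max_iter: int = 100
) -> Tuple[List[Point], List[int], int]:
    """Plain K-means; returns (centers, labels, number of iterations used)."""
    centers = [tuple(c) for c in initial_centers]
    labels: List[int] = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        if new_centers == centers:
            return centers, labels, iteration
        centers = new_centers
    return centers, labels, max_iter
```

On a toy data set with two well-separated groups, starting from one center inside each group converges in fewer iterations than starting from two centers inside the same group, which is the motivation for choosing the initial values carefully.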
This paper compares the execution efficiency of the K-means algorithm in a stand-alone environment with its execution efficiency in a cluster environment in order to determine how distributed computing affects the efficiency of the algorithm. Data sets from UCI are not suitable for this experiment, because the number of objects and the dimension of the objects in those data sets are difficult to determine. To compare the efficiency of the algorithm in the two environments more clearly, this comparative experiment uses randomly generated formatted data, with the number of data objects ranging from 5 to 20. The running times on a single machine and on the cluster are compared in Table 2.
As shown in Table 2, the running times in the experiment are measured in ms. Except for the last two data sizes tested in the stand-alone environment, each experiment was run 10 times and the average value was taken. The data show that, for a small amount of data, running the algorithm on a single machine is more efficient than running it on a cluster, because communication between nodes, data scheduling, and other processes consume a certain amount of resources. When the data set is small, the time consumed by communication is much greater than the actual processing time of the algorithm. When the amount of data reaches 15, however, the two running times begin to diverge significantly. At this point, parallel processing begins to show its advantages, and as the data set continues to grow, the speedup obtained continues to increase.
The running time and clustering accuracy of the original cluster analysis algorithm and the improved cluster analysis algorithm (DSG-Stream algorithm) are compared in Figure 9.

(a) Time spent in clustering

(b) Comparison of clustering accuracy
As can be seen from Figure 9, the original cluster analysis algorithm and the improved cluster analysis algorithm (DSG-Stream algorithm) take different amounts of time. The time spent by the original cluster analysis algorithm is between 12 s and 18 s, while the time spent by the DSG-Stream algorithm is only between 4.9 s and 8.2 s.
5. Conclusions
The goal of college sports informatization can be achieved by incorporating information technology into physical education resources. The experimental section presents a detailed analysis of the cluster analysis algorithm before and after the improvement. First, the clustering accuracy of the two algorithms is compared. The experiments show that the clustering accuracy of the improved clustering analysis algorithm is much higher than that of the algorithm before the improvement. The time spent by the two clustering algorithms is then compared and analysed, showing that the traditional clustering analysis takes longer than the improved clustering analysis. It can be concluded that the improved clustering analysis algorithm outperforms the traditional one not only in clustering accuracy but also in clustering efficiency. As a result, using cluster analysis in the informatization construction of physical education resources benefits the classification and integration of information.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors do not have any possible conflicts of interest.