Abstract

With the rapid development of network and database technology, computers can now store massive amounts of data. However, traditional data analysis and processing tools, such as management information systems, can only process these data superficially, and their capacity for deeper analysis is unsatisfactory. As the gap between the ability to supply data and the ability to analyze it widens, there is an urgent need for automated techniques that can process data in depth, and data mining technology emerged to meet this need. Cluster analysis, an important topic in data mining, divides data into natural groups and describes the characteristics of each group; it is a basic method of data mining and knowledge discovery. As an unsupervised technique, it classifies data without prior knowledge or guidance. With appropriate algorithms, it can uncover hidden, valuable information, improve the quality of data analysis and interpretation, and provide a scientific basis for further processing or interpretation of the data by other analysis tools. This paper first briefly introduces the principle, development, and methods of cluster analysis and describes its applications. It then explains the principle of the K-means clustering algorithm, analyzes the advantages and disadvantages of the basic K-means algorithm, and reviews several existing improvements. Finally, an improved K-means clustering algorithm and a cluster analysis model based on K-means are proposed, and the corresponding algorithm flow and implementation are given.

1. Introduction

Human society has entered a period of rapid technological development. Especially since the 1990s, with the spread of information and Internet technology, people have accumulated very rich data in many fields of production and daily life. These massive data have not only promoted the development of database technology but also made it easy to obtain large amounts of data. In the face of such data, however, people are no longer satisfied with simple querying and processing. They want to analyze the data further, discover correlations among them, and extract useful information to support decision-making and scientific research. Traditional database technology cannot meet these requirements, so a technology that can intelligently and automatically convert data into useful information and knowledge is urgently needed. Data mining emerged in this context, gradually became a research hot spot in computer science, attracted many experts and scholars, and has shown strong vitality. Cluster analysis, an important subject within data mining, has attracted increasing attention from researchers in recent years. It divides data into natural groups and describes the characteristics of each group, and it is a basic method of data mining and knowledge discovery. Measuring the effect of government activities in the public sector is a concept that involves multiple objectives; in domestic and international research, productivity, quantity, proportion, effect, and value are often used as performance indicators [1–5].

Data mining is the process of extracting potentially useful information and knowledge from large amounts of irregular, unordered data. It spans a wide range of fields, including artificial intelligence, statistics, and databases. As a multidisciplinary field, it commonly uses techniques such as decision trees, association rules, neural networks, cluster analysis, classification, prediction, and visualization. Among them, cluster analysis (hereinafter “clustering”) is one of the most widely used and mature data mining techniques and an effective way to explore and extract the internal relationships between things. Its main function is to divide a data set into several groups according to certain rules so that objects in the same group are as similar as possible and objects in different groups are as dissimilar as possible. The similarity between data objects is computed from the attributes that describe them. Unlike classification, clustering is a typical unsupervised learning process that does not require class labels in advance, so it is widely used in data preprocessing. Clustering also has important applications in machine learning, spatial data analysis, pattern recognition, business decision-making, image processing, web document classification, and data compression [6–12].

Research on clustering has lasted for decades, and a steady stream of clustering algorithms has emerged. Broadly, the basic clustering algorithms fall into the following categories: partition-based, hierarchy-based, density-based, grid-based, and model-based methods. The K-means algorithm is one of the most commonly used partition-based algorithms; it uses the sum of squared errors (SSE) as its clustering criterion function. It is simple, fast, efficient, and scalable to large data sets, but it also has the following defects: the clustering result is sensitive to the choice of initial centers, the number of clusters K must be specified in advance, the algorithm easily falls into local optima, and it can find only spherical clusters. Many studies have proposed effective improvements for these problems, but each has limitations. It therefore remains worthwhile to keep improving the K-means clustering algorithm [13–16].

To address sensitivity to initialization, Kaufman et al. proposed a heuristic that selects the initial K-means centers by estimating the density of the data samples. Steinbach et al. showed that the K-means algorithm clusters well and that its running time is linear in the number of data objects. Yao Minli et al. treated sets of cluster centers as particles in a population and introduced GSA to search for the initial centers with the best clustering quality, yielding an improved GSA-based K-means algorithm. Duan Longzhen et al. combined density peak clustering with genetic K-means in an improved algorithm that avoids the cluster misclassification caused by manually selecting cluster centers when density peak clustering is applied to data sets with multiple density peaks [17]. Zhang et al. [18] noted that advances in cognitive computing and three-way decision-making make it possible to understand sequential patterns deeply through temporal correlation analysis; the main challenge is to obtain concise models that express the rich semantics of multiple time-series (MTS) analysis. Wu et al. [19] proposed a dynamic infinite hybrid model with a self-definable step size to infer the dynamics of EEG fatigue signals; instantaneous spectral features obtained by combining the wavelet transform and the Hilbert transform were extracted to form four fatigue indexes. Li et al. [20] proposed a spherical K-means clustering method whose goal is to partition a given set of unit-length points into k sets so as to minimize the within-cluster sum of cosine dissimilarities. Sinaga and Yang [21] proposed an unsupervised K-means (U-K-means) clustering algorithm that automatically finds the optimal number of clusters without any initialization or parameter selection; they analyzed its computational complexity and compared it with existing K-means methods.
Lakshmi and Baskar [22] proposed a new initial centroid selection method for K-means document clustering, dissimilarity-based initial centroid selection for doc-K-means (DIC doc-K-means), to improve the performance of text document clustering.
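The spherical K-means idea attributed to Li et al. [20] can be illustrated with a minimal NumPy sketch. This is a generic textbook variant, not the method of [20]; the function name `spherical_kmeans`, the random initialization from k data points, and the stopping rule are assumptions for illustration. Points are normalized to unit length and assigned to the center with the largest cosine similarity, and each center is recomputed as the normalized sum of its members, which is the unit vector that minimizes the within-cluster sum of cosine dissimilarities.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=100, seed=0):
    """Cluster directions: minimize sum_i (1 - cos(x_i, c_label(i))) with unit-norm centers."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project every point onto the unit sphere
    C = X[rng.choice(len(X), size=k, replace=False)]    # initial centers: k random (normalized) points
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        new_labels = np.argmax(X @ C.T, axis=1)         # assign by largest cosine similarity
        if np.array_equal(new_labels, labels):          # partition unchanged: converged
            break
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):                             # an empty cluster keeps its old center
                s = members.sum(axis=0)
                C[j] = s / np.linalg.norm(s)             # normalized sum = optimal unit center
    cost = np.sum(1.0 - np.sum(X * C[labels], axis=1))  # total within-cluster cosine dissimilarity
    return labels, C, cost
```

Because only directions matter, two documents whose term vectors differ only in length end up with the same assignment, which is why this variant suits text clustering.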

Figure 1 illustrates the selection of the number of cluster centers. Because K-means chooses its initial cluster centers arbitrarily, the clustering result can vary. When the number of clusters is inappropriate, or when the selected initial points are isolated or noise points, the algorithm often converges to a local optimum, which degrades the clustering. To avoid specifying the number of clusters, Sohil et al. proposed robust continuous clustering (RCC), which does not need to know the number of clusters in advance; it determines the clusters by iteratively optimizing its objective function, achieved good results in experiments, and substantially improved clustering performance. Other scholars determine the number of clusters with clustering validity indices, such as the Calinski-Harabasz (CH), Davies-Bouldin (DB), and Krzanowski-Lai (KL) indices [23–30].
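As an illustration of choosing K with a validity index, the following NumPy sketch uses the Calinski-Harabasz index. It is not the procedure of [23–30]; the helper names, the candidate range, and the ten random restarts per K are assumptions. The CH index is the between-cluster dispersion per degree of freedom, tr(B)/(k-1), divided by the within-cluster dispersion per degree of freedom, tr(W)/(n-k), and the K with the largest value is selected.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iterations started from k random data points; returns (labels, SSE)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    sse = ((X - centers[labels]) ** 2).sum()
    return labels, sse

def calinski_harabasz(X, labels):
    """CH = (tr(B) / (k - 1)) / (tr(W) / (n - k)); larger means better-separated clusters."""
    n, ks = len(X), np.unique(labels)
    k = len(ks)
    overall = X.mean(axis=0)
    between = within = 0.0
    for j in ks:
        Xj = X[labels == j]
        cj = Xj.mean(axis=0)
        between += len(Xj) * np.sum((cj - overall) ** 2)   # size-weighted center scatter
        within += np.sum((Xj - cj) ** 2)                   # scatter inside cluster j
    return (between / (k - 1)) / (within / (n - k))

def choose_k(X, k_values, n_init=10):
    """Return the K with the highest CH index, plus all scores."""
    scores = {}
    for k in k_values:
        # keep the lowest-SSE run of n_init restarts to reduce local-optimum effects
        labels, _ = min((kmeans(X, k, seed=s) for s in range(n_init)), key=lambda r: r[1])
        scores[k] = calinski_harabasz(X, labels)
    return max(scores, key=scores.get), scores
```

On three well-separated Gaussian blobs, `choose_k(X, range(2, 7))` should return 3. On real data the index may have several local maxima, so it is safer to read it alongside other indices such as DB.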

For high-dimensional, linearly non-separable data, the diversity and high dimensionality of current data make it difficult for general clustering algorithms to achieve good results. Eigenvalue decomposition is often applied to obtain an approximate solution of the relaxed problem, but in some cases this solution may deviate seriously from the true solution. Cluster analysis divides data into natural groups and describes the characteristics of each group. As a data mining technique, it can, with appropriate algorithms, uncover hidden valuable information and help improve the quality of data analysis and interpretation.

2. Introduction to Relevant Theories

Clustering is the process of dividing data samples into several subsets, or clusters: data in the same cluster are similar, whereas data in different clusters are dissimilar. A cluster is an aggregation of points in the data space, in which the distance between points in the same cluster is smaller than the distance between any two points in different clusters. The basic idea of clustering is to partition data objects according to the similarity of their attributes, assigning objects with high similarity to the same cluster and objects with low similarity to different clusters. Researchers then examine the actual distribution of the data objects through the final partition.

The main idea of the kernel K-means clustering algorithm is to map the original data set into a high-dimensional space in which it becomes linearly separable, which effectively alleviates the problem of linearly non-separable data. However, such algorithms often have several shortcomings. First, choosing the kernel function remains difficult; because many kernel functions exist, selecting an appropriate one is important. Second, existing kernel-based clustering algorithms have high complexity and easily fall into local optima during the solution process. Finally, to improve performance, most proposed clustering algorithms impose more model constraints and introduce more parameters. Most models have three or more parameters, parameter tuning is complex, and many models are highly sensitive to their parameters. For most data sets, achieving good clustering requires considerable time and labor to tune the parameters.

By selecting an appropriate kernel function and exploiting its nonlinear mapping ability, high-dimensional, linearly non-separable data become linearly separable after being mapped into the feature space, which improves clustering performance. Because the proposed model is nonconvex, its solution can easily fall into local optima, so alternating iterations are used to seek the global optimal solution. The model is validated by extensive experiments.

As an unsupervised learning method, clustering has developed rapidly. No data labels need to be given before partitioning; the data are divided into clusters automatically according to their feature attributes. Different clustering algorithms suit different data types, so in practice the clustering algorithm must be chosen according to the type of data set.

In clustering, a partition cannot be expressed intuitively through its cluster labels: in any partition, a class label serves only as an indicator and has no practical meaning, so two partitions that differ only by a renaming of their clusters represent the same clustering [31–38].
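Because cluster labels are only indicators, comparing a clustering with reference classes requires matching the labels first. A minimal sketch, assuming k is small enough that all one-to-one label mappings can be enumerated (for larger k, the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment`, finds the best matching efficiently); the function name `clustering_accuracy` is hypothetical:

```python
from itertools import permutations

def clustering_accuracy(true_labels, pred_labels):
    """Fraction of samples grouped correctly under the best one-to-one relabeling."""
    true_ids = list(dict.fromkeys(true_labels))
    pred_ids = list(dict.fromkeys(pred_labels))
    # pad with None so that surplus predicted clusters can map to "no class"
    candidates = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(candidates, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))            # one candidate renaming of the clusters
        hits = sum(mapping[p] == t for p, t in zip(pred_labels, true_labels))
        best = max(best, hits)
    return best / len(true_labels)
```

For example, the predicted labels `[2, 2, 0, 0, 1, 1]` score 1.0 against `[0, 0, 1, 1, 2, 2]`: the two partitions are identical up to renaming.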

Given a data set $X = \{x_1, x_2, \ldots, x_n\} \subset \mathbb{R}^{m}$, where n is the number of data samples and m is the data dimension, suppose the data are to be clustered into k disjoint clusters $C_1, C_2, \ldots, C_k$, and let $\mu_j$ denote the center of cluster $C_j$. The K-means algorithm seeks a reasonable partition that makes the squared error between the data points and their cluster centers as small as possible, using the Euclidean distance as the criterion. Its objective function is

$$J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert_2^2,$$

where $x_i$ is any sample and $\mu_j$ is the center of the j-th cluster, given by

$$\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i.$$

The goal of the K-means algorithm is to minimize the sum of squared errors over the k clusters; the smaller the objective function, the more similar the samples within each cluster.

The objective function of K-means can be rewritten as

When the orthogonal constraint of formula (4) holds, the objective function of K-means can be written in the following form:

Figure 2 shows simulated data, which are used here to illustrate the clustering process of the K-means algorithm. Assume that the number of clusters is 3; the clustering process is shown in Figure 3, where squares mark the cluster centers and each panel shows the clustering result after one iteration. The cluster centers change after every iteration, and the algorithm terminates when the cluster centers (the mean vectors) no longer change or the maximum number of iterations is reached.
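The procedure just described (assign each point to its nearest center, recompute each center as the mean vector of its points, and stop when the centers no longer move or a maximum number of iterations is reached) can be sketched in NumPy as follows. The random choice of k data points as initial centers, the tolerance `tol`, and the function name are assumptions for illustration. The recorded values are the sum of squared errors of each partition, which never increases from one iteration to the next.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-8, seed=0):
    """Lloyd's K-means; returns labels, centers, and the SSE recorded at each iteration."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)  # k random data points
    history = []
    for _ in range(max_iter):
        # assignment step: nearest center by squared Euclidean distance
        sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dist.argmin(axis=1)
        history.append(sq_dist[np.arange(len(X)), labels].sum())  # SSE of this partition
        # update step: each center becomes the mean vector of its cluster
        new_centers = centers.copy()
        for j in range(k):
            if np.any(labels == j):              # an empty cluster keeps its previous center
                new_centers[j] = X[labels == j].mean(axis=0)
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift <= tol:                         # centers no longer change: stop
            break
    return labels, centers, history
```

Calling `kmeans(X, 3)` on three-cluster data mirrors the example of Figure 3; plotting `history` shows the SSE levelling off as the centers stabilize.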

The kernel K-means clustering algorithm improves the traditional partition-based K-means algorithm: a mapping function transforms the data set into a representation that standard K-means can handle, and the mapped data are then clustered by K-means.

The main idea is to select an appropriate kernel function, map the data points in the input space into a high-dimensional feature space that highlights the differences between sample categories, make the samples linearly separable in this kernel space, and then cluster them.

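The objective function of kernel K-means is as follows:

$$J_{\phi} = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert \phi(x_i) - m_j \rVert^2, \qquad m_j = \frac{1}{|C_j|} \sum_{x_l \in C_j} \phi(x_l),$$

where $\phi(\cdot)$ is the mapping into the kernel space.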
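Because the mapping φ is usually implicit, kernel K-means never computes φ(x) directly. With the kernel matrix $K_{il} = \phi(x_i)^\top \phi(x_l)$ and $m_j$ the mean of the mapped points of cluster $C_j$, the squared distance to a cluster mean expands as

$$\lVert \phi(x_i) - m_j \rVert^2 = K_{ii} - \frac{2}{|C_j|} \sum_{x_l \in C_j} K_{il} + \frac{1}{|C_j|^2} \sum_{x_l, x_{l'} \in C_j} K_{l l'}.$$

The NumPy sketch below applies this expansion. The Gaussian (RBF) kernel, its width parameter `gamma`, seeding the initial partition from k random points, and the function names are assumptions; it is a generic kernel K-means, not the PKKM model proposed in this paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian kernel matrix K_il = exp(-gamma * ||x_i - x_l||^2)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))     # clip tiny negative rounding errors

def kernel_kmeans(K, k, max_iter=100, seed=0):
    """Kernel K-means on a precomputed kernel matrix K; returns cluster labels."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    diag = np.diag(K)
    # initial partition: nearest of k random seed points, measured in kernel space
    seeds = rng.choice(n, size=k, replace=False)
    labels = (diag[:, None] - 2.0 * K[:, seeds] + diag[seeds][None, :]).argmin(axis=1)
    for _ in range(max_iter):
        dist = np.full((n, k), np.inf)              # empty clusters stay at +inf
        for j in range(k):
            mask = labels == j
            size = mask.sum()
            if size == 0:
                continue
            # ||phi(x_i) - m_j||^2 via the kernel expansion above
            dist[:, j] = (diag - 2.0 * K[:, mask].sum(axis=1) / size
                          + K[np.ix_(mask, mask)].sum() / size ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):      # partition unchanged: converged
            break
        labels = new_labels
    return labels
```

The choice of `gamma` plays the role of the kernel selection problem discussed above: too large and every point looks isolated, too small and the kernel space behaves almost like the input space.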

Spectral clustering is one of the classical clustering algorithms; in essence, it transforms clustering into an optimal graph-partitioning problem. Nonnegative spectral clustering (NSC) extends spectral clustering: by imposing nonnegativity constraints on the spectral embedding, the cluster indicator matrix, and hence the clustering result, can be obtained directly. Its objective function is as follows:

The update rule of the NSC method is as follows:

Nonnegative matrix factorization (NMF) is now widely used in clustering algorithms. Orthogonal nonnegative matrix factorization (ONMF) extends NMF by adding an orthogonality constraint to the factorization. Its objective function can be written as

The update rules of the ONMF algorithm are as follows:

The first derivative of the objective function with respect to X is as follows:

The optimization problem of the ONMF objective function can be written as follows, where the symbols denote, respectively, the sample set, the new sample set, the new sample training set, and the processing intersection.
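A compact NumPy sketch of uni-orthogonal NMF, $X \approx F G^\top$ with $F, G \ge 0$ and $G^\top G \approx I$, using the multiplicative update rules published by Ding et al. (2006). Whether these match the update rules used in this paper cannot be confirmed from the text, so treat them as an assumption. Each column of X is a sample, and its cluster is read off as the index of the largest entry in the corresponding row of G. The function name, iteration count, random initialization, and the small constant `eps` that guards against division by zero are illustrative choices.

```python
import numpy as np

def onmf(X, k, n_iter=500, seed=0, eps=1e-10):
    """Uni-orthogonal NMF: X (m x n, nonnegative) ~ F @ G.T with F, G >= 0 and G.T @ G ~ I."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k)) + eps        # basis: one column per cluster
    G = rng.random((n, k)) + eps        # soft cluster indicators: one row per sample
    history = [np.linalg.norm(X - F @ G.T)]   # Frobenius reconstruction error
    for _ in range(n_iter):
        # F <- F * (X G) / (F G^T G); the ratio of nonnegative terms keeps F >= 0
        F *= (X @ G) / (F @ (G.T @ G) + eps)
        # G <- G * sqrt((X^T F) / (G G^T X^T F)); the form comes from the orthogonality constraint
        XtF = X.T @ F
        G *= np.sqrt(XtF / (G @ (G.T @ XtF) + eps))
        history.append(np.linalg.norm(X - F @ G.T))
    labels = G.argmax(axis=1)           # hard cluster assignment for each sample
    return F, G, labels, history
```

Multiplicative updates are chosen here because they keep both factors nonnegative without any projection step, which is what makes the rows of G usable as cluster indicators.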

The fitness function is an important index of individual performance and the main basis for survival of the fittest during genetic evolution. In this paper, the choice of fitness function affects not only the quality and quantity of the next-generation population but also, directly, the learning of the optimal K value. The fitness function is defined in terms of the minimum inter-class distance and the average inter-class distance, and the probability P that an individual is selected is determined by the fitness values of the individuals.
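Selection driven by fitness values is commonly implemented as fitness-proportional (roulette-wheel) selection, in which individual i is chosen with probability $P_i = f_i / \sum_j f_j$. The paper's exact fitness formula is not recoverable from this text, so the sketch below simply takes a vector of nonnegative fitness values as given; the function names and the use of NumPy's `Generator.choice` are assumptions for illustration.

```python
import numpy as np

def selection_probabilities(fitness):
    """Roulette-wheel probabilities P_i = f_i / sum_j f_j."""
    f = np.asarray(fitness, dtype=float)
    if np.any(f < 0) or f.sum() <= 0:
        raise ValueError("roulette-wheel selection needs nonnegative fitness with a positive sum")
    return f / f.sum()

def roulette_select(fitness, n_select, seed=0):
    """Draw n_select parent indices with replacement, proportionally to fitness."""
    rng = np.random.default_rng(seed)
    p = selection_probabilities(fitness)
    return rng.choice(len(p), size=n_select, replace=True, p=p)
```

An individual with zero fitness is never selected, while fitter individuals (for example, candidate K values whose partitions have larger inter-class distances) are drawn more often.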

3. Data Acquisition

With the development of the Internet, the amount of online information has increased rapidly. Clustering algorithms emerged to help people select useful information efficiently under information overload and have made great progress. Traditional clustering algorithms solve this problem to some extent; in particular, the classical K-means algorithm is simple, fast, and efficient on small, low-dimensional data sets. However, it cannot achieve ideal results on high-dimensional, linearly non-separable data. Kernel K-means alleviates this problem to some extent: its main idea is to map the original data set into a high-dimensional space in which the data become linearly separable.

Current research on clustering mostly focuses on novel methods and neglects some classical, simple algorithms. Kernel K-means is a hot topic in research on high-dimensional data, but such algorithms often have shortcomings. First, choosing the kernel function remains difficult; because there are many kernel functions, selecting an appropriate one is important. Second, existing kernel-based clustering algorithms have high complexity and easily fall into local optima during the solution process. Finally, to improve performance, most proposed clustering algorithms impose more model constraints and introduce more parameters. Most models have three or more parameters, parameter tuning is complex, and many models are highly sensitive to their parameters. For most data sets, achieving good clustering requires considerable time and labor to tune the parameters.

To study the problems in the human resource management system of H Group, questionnaires were administered mainly to in-service employees of H Group’s science and technology logistics, tourism, capital, and industry groups. The analysis of H Group’s human resource management system mainly used basic literature research, employee interviews, questionnaires, and online surveys. For the questionnaire survey, an analytical framework for the human resource management system was first derived from human resource management theory; this framework and the individual interviews provided the basis for designing and revising the questionnaire, and staff of H Group’s human resource management department helped collect the data. The survey results were then analyzed statistically with SPSS 19.0. After the reliability and validity of the questionnaire were analyzed, the KMO and Bartlett tests were carried out, followed by principal component analysis and factor analysis to extract the key causes of the problems in H Group’s human resource management system.
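The two adequacy checks mentioned above can also be computed directly from the item correlation matrix R (p items, n respondents). Bartlett's test of sphericity uses $\chi^2 = -\left(n - 1 - \frac{2p+5}{6}\right)\ln\lvert R\rvert$ with $p(p-1)/2$ degrees of freedom, and the KMO measure compares squared correlations with squared partial (anti-image) correlations. The NumPy sketch below is illustrative only; the function names are assumptions, and it reports the χ² statistic without a p-value.

```python
import numpy as np

def bartlett_sphericity(X):
    """Bartlett's test of sphericity for an (n respondents x p items) matrix: (chi2, dof)."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)                 # ln|R|, computed stably
    chi2 = -(n - 1 - (2 * p + 5) / 6.0) * logdet
    return chi2, p * (p - 1) // 2

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(R_inv), np.diag(R_inv)))
    partial = -R_inv / scale                         # anti-image (partial) correlations
    off = ~np.eye(R.shape[0], dtype=bool)            # use off-diagonal entries only
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)
```

A large χ² indicates that the items are correlated, and KMO values closer to 1 indicate data better suited to principal component and factor analysis.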

To realize the early warning function of the human resource performance management information system, the early warning indicator system should be a subsystem of the performance appraisal indicator system, that is, a decomposition of the performance appraisal indicators. Establishing the early warning indicator system during performance planning is the basis for the early warning function. Second, reasonable methods for processing early warning data are needed: suitable processing models, such as a binary logistic model or a fuzzy judgment model, are chosen according to the type of each indicator, and the current performance level is judged with the chosen model. Finally, the ultimate purpose of early warning is to ensure that performance objectives are met. When an abnormal performance warning occurs, the system analyzes the warning result and formulates countermeasures that help employees and managers correct performance deviations (Figure 4).
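As an illustration of the binary logistic option, an early-warning rule can map a few performance indicators to a warning probability $p = 1/(1 + e^{-(b + w^\top x)})$ and raise a warning when p reaches a threshold. The indicator definitions, weights, bias, and the 0.5 threshold in this sketch are hypothetical; in practice the coefficients would be estimated from historical performance data.

```python
import math

def warning_probability(indicators, weights, bias):
    """Binary logistic model: probability that performance will fall short of target."""
    z = bias + sum(w * x for w, x in zip(weights, indicators))
    return 1.0 / (1.0 + math.exp(-z))

def early_warning(indicators, weights, bias, threshold=0.5):
    """Return (probability, warn flag) for one employee or team."""
    p = warning_probability(indicators, weights, bias)
    return p, p >= threshold

# Hypothetical indicators: shortfall ratios against target for (sales, task completion).
WEIGHTS = (3.0, 2.0)
BIAS = -2.0

if __name__ == "__main__":
    print(early_warning((0.1, 0.1), WEIGHTS, BIAS))  # small shortfall: no warning
    print(early_warning((0.6, 0.5), WEIGHTS, BIAS))  # large shortfall: warning raised
```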

In a human resource performance management information system based on big data, employees can view their own performance data in real time and can discuss their appraisal results directly with the human resources performance team rather than through their managers. This significantly improves the company’s internal operating efficiency.

Managers are finally freed from the low-level role of “communication intermediary.” At the same time, the analysis of performance results and the decision-making suggestions provided by the system greatly simplify the work of department managers and improve the efficiency of their departments. Employees can query their performance data in real time, and if they object to their evaluation results or to how those results are handled, they can communicate directly with the performance team, which protects their individual interests to the greatest extent.

4. Construction and Composition of Management Information System

Extensive experimental results verify the effectiveness of the improved algorithm. On most data sets, and especially on some high-dimensional ones, the proposed algorithm achieves better clustering performance. In other words, the improved algorithm extracts more useful information during clustering, making the clustering results more accurate.

Figure 5 compares the NSC, ONMF, and PKKM methods on the STRIKE, KOREA, HIGHSCHOOL, POLBOOKS, FOOTBALL, and TERROR data sets, all of which are social-domain data sets. PKKM achieves the best results on all six data sets, with a significant improvement on STRIKE and KOREA, indicating that the proposed method can also achieve good results on practical application data sets.

The K-means algorithm with randomly selected initial centers fluctuates widely and is unstable. In contrast, for both the K-means algorithm based on differential evolution and the improved algorithm proposed in this paper, the minimum intraclass distance, the maximum intraclass distance, the difference between them, and the average intraclass distance are all significantly reduced. By the nature of clustering, the smaller the intraclass distance, the closer the data objects in the same cluster and the better the clustering quality; the larger the intraclass distance, the looser the cluster and the worse the quality. The experimental data therefore verify that both the differential-evolution-based K-means algorithm and the improved algorithm substantially improve the stability and effectiveness of the clustering results and the clustering quality. The results also show that the clustering results and quality of the proposed improved algorithm are significantly better than those of the K-means algorithm based on the standard differential evolution algorithm.
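A differential-evolution-based K-means of the kind compared above can be sketched as follows: each individual encodes all k centers as one flat vector, its fitness is the SSE of the induced partition, and the classic DE/rand/1/bin scheme (mutation $v = a + F(b - c)$, binomial crossover with rate CR, greedy selection) evolves the population. This is the standard DE scheme with assumed settings (population 20, F = 0.5, CR = 0.9, 100 generations), not the paper's improved multimode, adaptive-parameter variant, whose details are not given here.

```python
import numpy as np

def sse(X, centers):
    """Sum of squared distances from each point to its nearest center."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()

def de_kmeans_centers(X, k, pop_size=20, n_gen=100, F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin search for k cluster centers that minimize the SSE."""
    rng = np.random.default_rng(seed)
    m = X.shape[1]
    dim = k * m
    lo, hi = np.tile(X.min(axis=0), k), np.tile(X.max(axis=0), k)
    pop = rng.uniform(lo, hi, size=(pop_size, dim))   # each row = k centers, flattened
    cost = np.array([sse(X, ind.reshape(k, m)) for ind in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + F * (b - c)                  # DE/rand/1 mutation
            cross = rng.random(dim) < CR              # binomial crossover mask
            cross[rng.integers(dim)] = True           # take at least one gene from the mutant
            trial = np.where(cross, mutant, pop[i])
            trial_cost = sse(X, trial.reshape(k, m))
            if trial_cost <= cost[i]:                 # greedy one-to-one selection
                pop[i], cost[i] = trial, trial_cost
    best = int(cost.argmin())
    return pop[best].reshape(k, m), cost[best]
```

The centers returned by the evolutionary search can then be refined with ordinary K-means iterations, which is the usual way such hybrids reduce the dependence on random initialization.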

We randomly selected 500 samples from five years of data on 90 companies and used the remaining 100 samples as a test sequence for K-means clustering. K-means clustering converges after 13 iterations, at which point the error has decreased to the target value and training ends. The sample regression is then analyzed using the K-means clustering predictions.

The experimental results in Figures 6–8 show that, on the test data set, the proposed improved algorithm converges significantly faster on average than the K-means algorithm based on differential evolution. This demonstrates that the multimode evolution scheme and adaptive control parameters are effective in speeding up convergence and improving global optimization.

5. Optimization of Human Resource Performance Management

The core problem in the company’s performance implementation is that the performance team processes performance data manually in Excel, which inevitably leads to low processing efficiency and delayed performance feedback. A human resource performance management information system based on big data technology can greatly improve processing efficiency and eliminate manual processing errors, which also improves the accuracy and acceptance of performance results. Using the system to process performance data also relieves the performance team of long-term, high-intensity work pressure. In addition, the system’s performance early warning function links the performance team and the business teams through a shared information flow: performance information is fed back to business departments in real time so that they can correct deviations on their own.

To realize the early warning function of the human resources performance management information system, an evaluation system for performance early warning is needed first. The early warning indicator system should be a subsystem of the performance appraisal indicator system, that is, a decomposition of the performance appraisal indicators. Second, a reasonable method for processing early warning data is needed: a suitable processing model, such as a binary logistic model or a fuzzy judgment model, is selected according to the type of each indicator, and the current performance level is then judged with that model. Finally, the ultimate purpose of early warning is to ensure that performance objectives are met. When an abnormal warning occurs, the system traces the source of the warning and formulates countermeasures that help employees and managers correct performance deviations (Figure 9).

As shown in Figure 10, the current situation and characteristics of the company’s human resources give rise to the following problems in human resource management.

Employee turnover is high. This is mainly due to the lack of an effective incentive mechanism. In addition, as a market-oriented communication equipment manufacturing joint venture, the company must adjust its organizational structure, market strategy, and product direction according to market fluctuations, which leads to employee turnover.

The workforce is relatively young. This has both advantages and disadvantages: if young talent is used well, it can put the company on the road to rapid development, but if young employees feel they cannot learn anything new in the organization, they readily change jobs, causing losses to the company.

The salary system and incentive mechanism are imperfect. Incentives are too uniform, relying mainly on money with few nonmonetary incentives. Wages do not reflect the differing value of different posts. There is no clear payment standard for benefit-linked pay, which is generally paid together with the year-end bonus; usually only a fixed-level salary is paid, which provides insufficient incentive.

A standardized personnel training system is lacking. Generally, only superficial training is provided, and there is little talent development training. As a result, new employees cannot quickly integrate into the company and cohesion is poor; R&D personnel cannot keep up with technological trends, weakening the company’s technical strength; and marketing personnel do not understand the products well, so market development and service capacity are insufficient. The company’s overall competitiveness is also weakening because many managers transferred from technical posts lack adequate management skills.

In view of these problems, the company has made improvements suited to its own situation. In each fiscal year, the human resources department and the finance department, together with each product group and department, discuss with every department its post function adjustments, head count, and KPI assessment indicators broken down to each employee, based on the business and functional objectives for the following year. This determines the final staffing and budget, which are implemented at the beginning of the next fiscal year.

Each second- and third-level department has a human resources manager assigned by the company, who provides day-to-day support: together with department leaders, the manager recruits employees, signs and manages labor contracts, handles procedures and induction training for new employees, determines employee compensation, and implements the company’s welfare policies, and adjusts the plan formulated at the beginning of the year as circumstances require. More importantly, for employees’ KPI assessment, the human resources department works closely with all departments to implement the assessment as fairly and impartially as possible. For employees whose assessment results are unsatisfactory, the HR manager works with their department leaders to formulate a performance improvement plan to ensure that overall performance improves.

To attract more excellent talent, the human resources department implemented a “rewards for recommending talent” plan: it regularly publishes the posts each department urgently needs and encourages employees to recommend candidates from outside the company. If a recommended candidate is hired and successfully completes the probation period, the company gives a material reward to the employee who made the recommendation. This measure has greatly increased employees’ enthusiasm for recommending talent, ensured the quality of recommended candidates, eased the company’s past difficulty in recruiting, and filled important posts in a short time.

These HR managers are also an important bridge between the company and ordinary employees: they explain the company’s policies, rules, and regulations to employees; help solve nonbusiness problems that arise in employees’ daily work; and report employees’ problems, opinions, and suggestions to leaders at all levels.

6. Conclusion

(1) Building on the kernel K-means clustering algorithm, this paper proposes a nonconvex relaxation clustering model of kernel K-means (PKKM) and analyzes its relationship to orthogonal nonnegative matrix factorization, nonnegative spectral clustering, and other models. The proposed algorithm can quickly cluster high-dimensional nonlinear data and achieved good results in experiments.
(2) Because human resource management is dynamic and the results of this paper are based on the problems in H Group’s human resource management system, the system should be adjusted dynamically as H Group’s development strategy changes, so that it continues to meet the requirements of the group’s internal and external environment.
(3) After proposing an optimization scheme for H Group’s human resource management system, this paper proposes measures for implementing and safeguarding it. However, the optimization scheme, implementation plan, and safeguard measures still need to be tested in practice.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.