Using Spectral Clustering Association Algorithm upon Teaching Big Data for Precise Education

Zhou, Yongfu; Zeng, Zhi; Wang, Huabin

doi:https://doi.org/10.1155/2022/7214659

Mathematical Problems in Engineering


Special Issue

Theory and Application of Swarm Intelligence and Machine Learning


Research Article | Open Access

Volume 2022 | Article ID 7214659 | https://doi.org/10.1155/2022/7214659


Yongfu Zhou,¹ Zhi Zeng,² and Huabin Wang²

Academic Editor: Jianguo Duan

Received 22 Jun 2022

Accepted 20 Aug 2022

Published 26 Sept 2022

Abstract

With the continued adoption of educational OA, massive amounts of educational data have been produced, and the application of teaching big data (TBD) now has a theoretical basis as well as practical and research methods in the field of education. An open question is how to let TBD play a leading role in professional education: guiding students toward personalized learning through recommendations, changing the teaching mode, enriching teaching evaluation, and thereby improving the quality of talent training. Based on the analysis and mining of big data, this paper uses the spectral clustering algorithm to construct a curriculum association classification model and to cluster core courses. Then, by analyzing the academic achievements of previous students in professional core courses, we capture students' current learning situation and use deep learning to build portraits of individual students, the curriculum, and the major. These portraits support precise referral of personalized learning courses, targeted academic guidance for students, and dynamic adjustment of the teaching syllabus, including teaching methods and teaching means. Conversely, by analyzing the technical requirements of job positions, we can refine the core-course clustering and feed the result back into the curriculum association classification model. Experiments show that the proposed model based on the spectral clustering algorithm can provide strong technical support for decision-making in precision education at colleges and universities.

1. Introduction

Nowadays, with the wide spread and application of the big data concept, its connotation is constantly evolving and extending. People are gradually realizing that big data is not only a technology but also an ability: the ability to find meaningful associations in massive and complex data, to mine the changing rules of things, and to accurately predict their development trends. Big data is also a way of thinking; it lets data speak and makes data the basic starting point of human thinking and decision-making. TBD refers to the data set generated throughout the process of educational activities and collected according to the needs of education, which can be used for the development of education and has great potential value [1]. TBD has a clear goal orientation, namely the development of education. It can play an effective role in improving the quality of education, promoting educational equity, realizing personalized learning, optimizing the allocation of educational resources, and assisting scientific decision-making in education [2, 3]. The ultimate value of big data should be reflected in its deep integration with the mainstream business of education and in the continuous promotion of the intelligent reform of the education system [4, 5]. At present, there are innovative applications of education big data at home and abroad, covering teaching, management, evaluation, service, and other fields [6–8]. With the support of TBD, mining and analyzing students' teaching data lets us predict learners' action trajectories more accurately and thus effectively support students' personalized adaptive learning, which is the key to realizing precision education [9]. As early as the 1990s, some scholars studied referral services.
With the continuous development and improvement of referral algorithms, Resnick and others proposed the concept of personalized referral, and personalized referral services became well regarded [10]. They were first used in the commercial field, mainly by American business giants such as Amazon and Walmart. Such a service collects relevant data from users, calculates the similarity of various goods, and connects users with goods through the referral algorithm, which greatly improves the personalization and commercial value of the referral service. With the continuous application of computer technology, personalized referral systems have appeared in many fields, such as WebWatcher, LIRA, and Letizia [11]. The application of personalized referral services in education has developed gradually. Examples include e-learning systems supported by online course resources [12], learning resources provided according to students' dynamic progress [13], and e-book packages based on learners' personal information, preferences, academic performance, and so on [14]. TBD covers the knowledge of educational data mining and learning analytics [15, 16]. In precision teaching, however, learning effects also deserve attention in addition to referral services, so early warning of academic performance based on teaching big data is also a focus of this research. Early warning of academic achievement is naturally related to data mining and analysis. TBD [17], whether it involves students or teaching managers, should focus on finding hidden valuable information [18] so as to provide effective guidance for future teaching activities. Academic early warning is an important means for colleges and universities to strengthen students' learning management and improve the level of education and teaching management.
Similarly, in the 1990s, many foreign universities used students' academic achievement as the data source for academic early warning [19]. In China, work started relatively late on predicting and explaining learning performance at academic risk from first-semester scores and college entrance examination scores, so as to help implement academic early warning [20]. However, the mechanisms and methods of early warning remain relatively simple, and once problems are found, they cannot be addressed in time with an effective response [21]. Therefore, based on the score data of software majors at one school, this paper focuses on three aspects of teaching quality and establishes a school-running orientation and quality objectives that match the school's actual situation, in order to meet the needs of economic and social development and of students' personal development [22].

Specifically, precise teaching referral for students mainly includes precise referral of students' chosen courses and precise warning of large academic fluctuations. Its core is the precise prediction of student-course achievement in order to obtain professional referral courses. Through the multidimensional teaching quality evaluation model, we can adjust the professional courses. A student's course score is a comprehensive outcome: it depends on complex curriculum factors such as the syllabus, the teaching focus, and the assessment difficulty, as well as accidental factors such as how well the student performs in the examination. Therefore, precisely predicting student-course achievement and referring professional courses is a challenging problem. In solving it, this paper addresses the following key technical problems, as shown in Figure 1: refining the professional portrait, the core curriculum portrait, and the students' personal portrait; constructing a curriculum association classification model to measure the correlation between professional courses and job positions; modeling the similarity of student achievement distributions; and constructing and improving multidimensional precise course referral. On the whole, the work of this paper includes:

(1) To construct a frequent pattern spectral clustering classification model of professional core courses.

(2) To study the potential relationships among professional courses and the characteristic model of student, curriculum, and professional portraits, and to use deep learning to train on past achievements in the same major, so as to predict achievements in related courses between lower and higher grades. Courses that may prove difficult should trigger an early warning that guides teachers to provide targeted guidance to students.

(3) To build multidimensional precise referral of teaching courses, with dynamic adjustment that guides teachers in adjusting the syllabus, teaching methods, and teaching means.

2. Theoretical Methods

2.1. Spectral Clustering Algorithm

Spectral clustering is a dimensionality-reduction clustering method based on graph theory; it has a relatively low computational cost, is easy to implement, and handles high-dimensional data well [23]. Generally speaking, the difficulty of extracting professional portraits from a syllabus and training plan lies in describing the complex structure of the professional training plan, especially the mapping from the professional plan to the curriculum. We can use frequent pattern spectral clustering to classify courses and determine the weight of each professional course. According to feedback from the teaching quality evaluation, the professional courses should be adjusted appropriately. To do so, we obtain the job skills in demand from the Internet through web crawlers and determine the curriculum system that supports those skills. The spectral clustering algorithm is then used to construct the relationship between job position skills and professional core courses.

Spectral clustering (SC) divides a weighted undirected graph into two or more optimal subgraphs so that vertices within a subgraph are as similar as possible and different subgraphs are as far apart as possible, which achieves the purpose of clustering. What counts as optimal depends on the objective function chosen: it can be, for example, the minimum cut or the normalized cut shown in Figure 2.

In this way, spectral clustering can identify sample spaces of arbitrary shape and converge to the global optimal solution. The basic idea is to cluster using the eigenvectors obtained from the eigendecomposition of the Laplacian matrix built from the similarity matrix of the sample data. For the space vector and item-user matrix, see Table 1.

Now, according to Figure 3(a), we can calculate the similarity between items and obtain a similarity matrix over items only. We regard the items as the vertices (V) of a graph (G) and the similarities between items as its edges (E), which gives the familiar notion of a graph. We then obtain the adjacency matrix $e$ shown in Figure 3(b), where $e_{ij}$ is the weight of the edge between vertices $v_i$ and $v_j$; $e$ is a symmetric matrix, and its diagonal elements are 0. The Laplacian matrix shown in Figure 3(c) follows as $L = D - e$, where $D$ is the diagonal matrix whose entry $d_i$ is the sum of the elements in row (or column) $i$ of $e$.
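The construction of the Laplacian from the adjacency matrix can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `laplacian` and the four-item weight matrix are hypothetical and are not taken from the paper's data.

```python
def laplacian(adjacency):
    """Return (degree, laplacian) for a symmetric weighted adjacency matrix."""
    n = len(adjacency)
    # d_i is the sum of the weights in row i (equal to the column sum by symmetry).
    degree = [sum(row) for row in adjacency]
    # L = D - e: degrees on the diagonal, negated edge weights elsewhere.
    lap = [[(degree[i] if i == j else 0.0) - adjacency[i][j] for j in range(n)]
           for i in range(n)]
    return degree, lap


# Hypothetical item-item similarity weights (symmetric, zero diagonal).
similarity = [
    [0.0, 0.8, 0.6, 0.0],
    [0.8, 0.0, 0.7, 0.1],
    [0.6, 0.7, 0.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]
d, L = laplacian(similarity)
# Each row of L sums to zero, which is why the all-ones vector has eigenvalue 0.
row_sums = [sum(row) for row in L]
```

Because every row of the resulting matrix sums to zero, multiplying it by the all-ones vector gives the zero vector.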

Figure 3: (a) similarity between items; (b) adjacency matrix; (c) Laplacian matrix.

First, consider an optimal graph partitioning method. Taking bisection as an example, dividing the graph into two parts S and T amounts to minimizing the loss function cut(S, T) in formula (1), that is, the weighted sum of the cut edges.

Suppose the vertices are divided into two classes, S and T. The vector q in formula (2) represents the classification, and q satisfies the relationship in formula (3), which is used for class identification. Here D is the diagonal matrix whose entries are the row (or column) sums of the adjacency matrix, and L is the Laplacian matrix. From formula (5), we have:

(a) L is a symmetric positive semidefinite matrix, which guarantees that all of its eigenvalues are greater than or equal to 0;

(b) L has a 0 eigenvalue whose corresponding eigenvector is the all-ones vector 1; this eigenvalue is unique when the graph is connected.
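For reference, a standard textbook form of the bisection objective and its link to the Laplacian can be written as below. The notation ($c_1$ and $c_2$ for the two class labels, $q$ for the indicator vector) is an assumption and may differ from the authors' formulas (1) to (5).

```latex
\begin{align}
\operatorname{cut}(S,T) &= \sum_{i \in S,\; j \in T} e_{ij}, \\
q_i &= \begin{cases} c_1, & i \in S, \\ c_2, & i \in T, \end{cases} \\
\operatorname{cut}(S,T) &= \frac{1}{2}\sum_{i,j} e_{ij}\,\frac{(q_i - q_j)^2}{(c_1 - c_2)^2}, \\
\sum_{i,j} e_{ij}\,(q_i - q_j)^2 &= 2\sum_i d_i q_i^2 - 2\sum_{i,j} e_{ij}\, q_i q_j
  = 2\, q^{\top} (D - e)\, q = 2\, q^{\top} L\, q .
\end{align}
```

Under these assumptions $\operatorname{cut}(S,T) = q^{\top} L q / (c_1 - c_2)^2$. The quadratic form is a weighted sum of squares, so it is nonnegative for every $q$, which gives property (a); setting $q$ to the all-ones vector makes every difference $q_i - q_j$ vanish, which gives property (b).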

2.2. Similarity Calculation of Professional Core Courses

The Pearson correlation coefficient (PCC) between previous students' academic achievement and current students' academic achievement is calculated, within a given threshold range, to support curriculum referral, early warning of academic achievement, and job referral. PCC is widely used to measure the degree of correlation between two variables, and its value ranges from −1 to 1. It was developed by Karl Pearson from a similar but slightly different idea put forward by Francis Galton in the 1880s. This correlation coefficient is also called the Pearson product-moment correlation coefficient.

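The PCC between two variables is defined as the quotient of their covariance and the product of their standard deviations. We have,

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_X \sigma_Y}.$$

The above formula defines the population correlation coefficient, which is usually represented by the Greek lowercase letter $\rho$. Estimating the covariance and standard deviations from a sample gives the sample PCC, which is usually denoted by the letter $r$:

$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}.$$

Equivalently, $r$ can be estimated from the mean of the products of the standard scores of the sample points $(x_i, y_i)$:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right),$$

where $(x_i-\bar{x})/s_x$, $\bar{x}$, and $s_x$ are the standard score of sample point $x_i$, the sample mean, and the sample standard deviation, respectively.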
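The two sample expressions for PCC can be checked against each other with a short sketch. The function names and the two score lists are hypothetical; the scores are made-up values for six courses, not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Sample PCC from centered sums: s_xy / sqrt(s_xx * s_yy)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_from_standard_scores(x, y):
    """Sample PCC as the (n - 1)-normalized sum of products of standard scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))  # sample std of x
    sy = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))  # sample std of y
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


# Hypothetical scores of a previous and a current student in six core courses.
previous_scores = [85, 72, 90, 66, 78, 81]
current_scores = [80, 70, 92, 60, 75, 85]

r_direct = pearson(previous_scores, current_scores)
r_standard = pearson_from_standard_scores(previous_scores, current_scores)
```

Both functions divide by a standard deviation, so they raise `ZeroDivisionError` when one list is constant; a real pipeline would need to handle that case explicitly.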

3. Proposed Models

Because spectral clustering is a dimensionality-reduction method, it is well suited to high-dimensional data and does not need to consider the shape of the sample space. The algorithm only needs the similarity matrix between data points to perform clustering. Compared with traditional clustering methods, it handles the high-dimensional, sparse matrix of university course scores more efficiently.

To apply spectral clustering to course grades, an undirected weighted graph based on a similarity metric is constructed first. The graph is then divided into subgraphs according to the cut criterion, which yields the clusters. The clustering algorithm is described as follows:
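As an illustration of the steps just described (build a similarity-weighted undirected graph, partition it, read off the clusters), a minimal sketch in Python follows. It is a generic spectral bisection, not the authors' exact procedure: it assumes the unnormalized Laplacian $L = D - e$, approximates the eigenvector of the second-smallest eigenvalue by shifted power iteration, and splits the vertices by sign. The function names and the six-course similarity matrix are hypothetical.

```python
from math import sqrt


def fiedler_vector(adjacency, iterations=500):
    """Approximate the eigenvector of L = D - e with the second-smallest eigenvalue."""
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    # Gershgorin bound: every eigenvalue of L is at most 2 * max degree, so
    # (shift * I - L) turns the smallest eigenvalues of L into the largest ones.
    shift = 2.0 * max(degree)
    v = [float(i) for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iterations):
        # Remove the component along the all-ones eigenvector (eigenvalue 0).
        mean = sum(v) / n
        v = [x - mean for x in v]
        # Compute L v = D v - e v, then apply (shift * I - L).
        lv = [degree[i] * v[i] - sum(adjacency[i][j] * v[j] for j in range(n))
              for i in range(n)]
        v = [shift * v[i] - lv[i] for i in range(n)]
        norm = sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v


def bisect(adjacency):
    """Split the vertices into two clusters by the sign of the Fiedler vector."""
    return [0 if x < 0 else 1 for x in fiedler_vector(adjacency)]


def cut_weight(adjacency, labels):
    """Weighted sum of the edges whose endpoints fall in different clusters."""
    n = len(adjacency)
    return sum(adjacency[i][j]
               for i in range(n) for j in range(i + 1, n)
               if labels[i] != labels[j])


# Hypothetical course-course similarities: two tight groups joined by one weak edge.
similarity = [
    [0.0, 0.9, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.9, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.9, 0.0],
]
labels = bisect(similarity)
cut = cut_weight(similarity, labels)
```

For more than two clusters, the usual extension is to take several of the smallest eigenvectors and run k-means on their rows; a numerical library would normally replace the hand-written power iteration used here.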

4. Testing and Experiments

We developed a big data visualization platform in Python that gives an overall overview of the college of computer science, and our experiments were conducted on a server with an Intel Core i5 @1600HZ and 16 GB of memory. The platform, shown in Figures 4(a)–4(c), includes student portraits, teacher portraits, professional portraits, and so on. We took academic achievement from TBD as an example to analyze personalized learning and to implement precise teaching referral and precise warning of large academic fluctuations. To this end, we studied the relationship between student attributes and job positions and mined the association rules of student groups, namely feature vectors [24]. These feature vectors are the basis of student clustering. For the historic achievement attributes, E.Score and Re.score are divided into 5 levels: more than 90 is ‘A’, 80 to 90 is ‘B’, 70 to 80 is ‘C’, 60 to 70 is ‘D’, and less than 60 is ‘E’. The generalized attribute values for historic achievement are shown in Table 2, the technical capabilities required by job positions obtained through web mining in Table 3, and a sample of historic employment data in Table 4.
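The five-level generalization of scores can be sketched as a small helper. The text does not say which level a boundary score such as exactly 90 or 80 belongs to; this sketch assumes that a boundary score goes to the higher level. The function name and sample scores are hypothetical.

```python
def grade_level(score):
    """Map a numeric score to the levels 'A' to 'E'.

    Assumption: boundary scores (90, 80, 70, 60) are assigned to the higher level.
    """
    if score >= 90:
        return "A"
    if score >= 80:
        return "B"
    if score >= 70:
        return "C"
    if score >= 60:
        return "D"
    return "E"


# Hypothetical exam scores for one student.
scores = [95, 85, 75, 65, 40]
levels = [grade_level(s) for s in scores]
```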

Figure 4: (a)–(c) views of the big data visual platform.

There are two kinds of courses: compulsory and optional. Compulsory courses include professional basic courses and professional core courses. For computer-related majors, job skill requirements bear mainly on the professional core courses. Assuming that the core courses required by job positions have been mined from external recruitment websites, we can use PCC to calculate the degree of correlation between each job and these courses. We then regard each course as a vertex and the correlation between courses as a weighted edge. Spectral clustering uses graph partitioning to divide the weighted undirected graph of all courses into several optimal subgraphs so that the weight of edges connecting courses in different subgraphs is as low as possible and the weight of edges connecting courses in the same subgraph is as high as possible. Table 5 is an example of a core curriculum set for the technical capability of a job position. Setting both the min-support threshold and the credibility to 0.1, the undirected graph for spectral clustering is built as in Figure 5.

From Figure 4, we can see the clustering results of professional core courses produced by the spectral clustering algorithm. Taking Table 2 above as an example, there are four types of job positions, each requiring related professional courses. After calculating the distribution between job positions and professional courses by formula (1), we adjusted the core courses of the major specified in the training program, for example by adding or removing courses.

Among them, represents the distribution of professional courses. Figure 6 describes the distribution of job positions’ intervals; courses that fall in the first interval mainly belong to the first type of job position, and so on.

To validate the professional core courses, we calculate the PCC between each course and the different job positions, choose the maximum value for each core course, and then adjust the professional courses in the training program accordingly. Figure 7 shows the referral process for professional core courses for learners.
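The "choose the maximum value" step can be illustrated with a small sketch that assigns each core course to the job position with which it has the highest PCC. The course names, position names, and coefficient values below are hypothetical placeholders, not results from the paper.

```python
# Hypothetical PCC values between core courses and job positions.
pcc_by_course = {
    "Data Structures": {"Software Developer": 0.62, "Network Engineer": 0.21},
    "Computer Networks": {"Software Developer": 0.18, "Network Engineer": 0.71},
    "Database Systems": {"Software Developer": 0.55, "Network Engineer": 0.30},
}


def best_position(pcc_by_position):
    """Return the job position with the largest PCC for one course."""
    return max(pcc_by_position, key=pcc_by_position.get)


# Map every course to its most strongly correlated job position.
assignment = {course: best_position(values)
              for course, values in pcc_by_course.items()}
```

A course whose maximum PCC is still low for every position would be a natural candidate for review when the training program is adjusted; the threshold for "low" is a design choice that the text does not specify.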

Generally speaking, we can predict the course performance of new students and recommend courses for personalized learning, or calculate the pass rate of a course from the grades of previous students, so as to adjust the teachers or the syllabus.

To qualitatively analyze the similarity between students, given the uncertainty in the dimensions of the two groups of data, PCC can be calculated from the historical scores of a previous student in the last term and the scores of a current student to measure the similarity of their learning characteristics.

In general, before using the Pearson correlation coefficient, it is necessary to determine whether the two variables are linearly correlated, because the coefficient measures only the degree of linear correlation between variables. If the relationship is not linear, PCC has little meaning. A PCC of 0 does not show that the two variables are unrelated; they may follow a more complex relationship. Therefore, it is advisable to draw a scatter diagram first and roughly judge whether the two variables show an overall linear relationship.
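A short worked example makes the point about a zero coefficient concrete. In this sketch, `y_quadratic` is completely determined by `x`, yet the PCC is 0 because the relationship is symmetric rather than linear; the data are synthetic and the function name is an assumption.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v * v for v in x]   # fully dependent on x, but not linearly
y_linear = [2 * v + 1 for v in x]  # exact linear dependence

r_quadratic = pearson(x, y_quadratic)
r_linear = pearson(x, y_linear)
```

Here `r_linear` is 1 while `r_quadratic` is 0, which is why a scatter diagram is a useful check before the coefficient is interpreted.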

Assume that the scores of six major core courses are extracted. The two groups of data are first examined for linear correlation, and then the PCC is calculated. Its value lies in [−1, 1]; the greater the absolute value of the correlation coefficient, the stronger the correlation, and positive and negative values indicate positive and negative correlation, respectively.

From Table 6, we can see that most of the data do not satisfy a linear relationship, but we still calculate the correlation coefficient. All individuals are used to calculate the overall Pearson correlation coefficient directly. We use Matlab to analyze the statistics and summarize them in a matrix. Figure 8 shows the distribution trend of scores and employment between previous and newer students. The standard deviation (Std.D) and average (Avg) of the scores can be described by a normal distribution across courses, which is in line with the expectations of the teaching law; see Figure 8(a). Careful observation shows a certain linear relationship between the average score and the correlation rate of professional employment; see Figure 8(b).

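Figure 8: (a) standard deviation and average of scores by course; (b) average score versus correlation rate of professional employment.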

The following matrix reflects the PCC for students’ scores between ranks.

Through PCC calculation, we can use previous students’ scores to predict professional scores for individual students, course referrals, and job positions in each portrait model. Figure 9 shows the precision and score of prediction at the individual and class level. Based on the previous students’ scores, the current score of every course matches the prediction; see Figure 9(a). The prediction precision of the individual score and of the class average score for one course also stays at a high level; see Figure 9(b). The prediction of professional consistency with the position, based on the scores, is shown in Figure 9(c). The main reason is that professional technology is determined by many courses, especially those with relatively high technical content, such as the course on network security.

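Figure 9: (a) predicted versus current course scores; (b) prediction precision for individual and class average scores; (c) prediction of professional consistency with the position.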

5. Conclusions

Establishing an internal quality management platform in colleges and universities and implementing teaching quality evaluation and rectification systems help meet the practical need to separate management, operation, and evaluation and to transform the functions of educational administrative departments. They also help meet the need for vocational education to adapt actively to new economic development, guarantee quality independently, and enhance core competitiveness. Therefore, analyzing the data of graduating and employed graduates can directly reflect whether teaching, decision-making, and management meet the skill requirements of the new era.

In general, the evaluation and improvement of the teaching quality assurance system must adhere to the concept of total quality management, give full play to the main role of quality, carry out evaluation and improvement across all staff, all processes, and all factors, integrate preplanning, in-process monitoring, and postimprovement, and establish a normalized mechanism for information feedback, analysis, and improvement. Finally, the quality assurance system and working mechanism of the complete three-year education program are established.

In future work, we will focus on improving prediction accuracy and efficiency and on optimizing the curriculum association classification model. Several problems remain unsolved. Prediction accuracy depends on the accuracy of the student, curriculum, and professional portraits and on the similarity of the technical requirements of positions mined from the Internet. The accuracy of the proposed model also depends on continuous correction through the feedback model. Hence, achieving universal consistency between the individual and the whole in precision teaching services is a direction of our future efforts.

Data Availability

The data used to support the findings of this study have not been made available because they contain private student and teacher data, including IDs and achievements.

Disclosure

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was partly funded by the project of Guangdong Provincial Science and Technology department (No. KTP20200292), the Science and Technology Project of Huizhou City of Guangdong province (No.2016X0423038), the Teaching Reform Project of Huizhou University (No. SZJG2020062), and the Research Center of Big Data Application Engineering Technology of Heyuan City (2017, No.74).

References

[1] X. M. Yang, L. H. Wang, and S. S. Tang, “Application mode and policy suggestions of education big data,” E-Education Research, vol. 9, pp. 54–61, 2015.
[2] D. M. West, Big Data for Education: Data Mining, Data Analytics, and Web Dashboards, Governance Studies at Brookings, Brookings Institution, 2012.
[3] X. M. Yang, S. S. Tang, and J. H. Li, “Technical system framework and development trend of education big data: overall framework of column on research and practice of education big data,” Modern Educational Technology, vol. 26, no. 01, pp. 5–12, 2016.
[4] A. Manohar, Utilizing Big Data Analytics to Improve Education, 2016.
[5] B. P. Sigman, Teaching Big Data: Experiences, Lessons Learned, and Future Directions, Decision Line, 2014.
[6] J. Liang, Y. Jian, and Y. Wu, “Big Data Application in Education: Dropout Prediction in Edx MOOCs,” in Proceedings of the IEEE Second International Conference on Multimedia Big Data, IEEE, Taipei, Taiwan, August 2016.
[7] X. Yu and W. Shuang, Typical Applications of Big Data in Education, IEEE, Educational Innovation Through Technology, 2016.
[8] M. M. Alani, Applications of Educational Data Mining and Learning Analytics Tools in Handling Big Data in Higher Education, 2018.
[9] J. Jiang and L. Zeng, “Research on Individualized Teaching Based on Big Data Mining,” in Proceedings of the 2019 14th International Conference on Computer Science & Education (ICCSE), IEEE, Toronto, ON, Canada, August 2019.
[10] S. Dwivedi and V. Roshni, “Recommender System for Big Data in Education,” E-learning & E-learning Technologies, IEEE, pp. 1–4, 2017.
[11] D. Goldberg, D. Nichols, B. M. Oki, and D. Terry, “Using collaborative filtering to weave an information tapestry,” Communications of the ACM, vol. 35, no. 12, pp. 61–70, 1992.
[12] W. Zhang, Design and Implementation of Personalized Recommendation System for Learning Resources Based on Collaborative Filtering Algorithm, IEEE, Tianjin Normal University, pp. 2–6, 2017.
[13] J. Y. K. Yau and M. Joy, “A context-aware personalised m-learning application based on m-learning preferences,” International Journal of Mobile Learning and Organisation, vol. 5, no. 1, pp. 1–14, 2011.
[14] C. B. Yao, “Constructing a user-friendly and smart ubiquitous personalized learning environment by using a context-aware mechanism,” IEEE Transactions on Learning Technologies, vol. 10, no. 1, pp. 104–114, 2017.
[15] L. Ji, X. Zhang, and L. Zhang, “Research on the Algorithm of Education Data Mining Based on Big Data,” in Proceedings of the 2020 IEEE 2nd International Conference on Computer Science and Educational Informatization (CSEI), IEEE, Xiamen, China, March 2021.
[16] C. Fischer, Z. A. Pardos, R. S. Baker et al., “Mining big data in education: affordances and challenges,” Review of Research in Education, vol. 44, no. 1, pp. 130–160, 2020.
[17] S. K. Mohamad and Z. Tasir, “Educational data mining: a review,” Procedia - Social and Behavioral Sciences, vol. 97, pp. 320–324, 2013.
[18] H. Barwick, The four Vs of Big Data Implementing Information Infrastructure Symposium [EB/OL], [2012-10-02].
[19] T. W. Zhang, T. Xie, and ZH. M. Li, Research on Campus Traffic Management in Universities, University Logistics Research, Behrends, no. 9, pp. 56–58, 2017.
[20] X. Wu and T. Sun, “Research on College Students' academic early warning based on educational big data mining,” China's Educational Informatization, no. 007, pp. 55–57, 2020.
[21] Knewton [EB/OL], 2020.
[22] Y. H. Hu, C. L. Lo, and S. P. Shih, “Developing early warning systems to predict students' online learning performance,” Computers in Human Behavior, vol. 36, no. 36, pp. 469–478, 2014.
[23] I. S. Dhillon, Y. Guan, and B. Kulis, “Kernel k-means: spectral clustering and normalized cuts,” in Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’04), pp. 551–556, ACM, Seattle, USA, August 2004.
[24] A. Peterson, “Big data in education: new efficiencies for recruitment, learning, and retention of students and donors,” Handbook of Statistical Analysis and Data Mining Applications, pp. 259–277, 2018.

Copyright

Copyright © 2022 Yongfu Zhou et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
