Abstract

The teaching management department carries all the teaching-related work of the whole school. A scientific, efficient, and complete teaching management system can not only help the department improve the efficiency and quality of its work but also greatly reduce the problems and risks caused by manual processing. This paper designs and implements a teaching management system based on an improved association rule algorithm. First, to address the low efficiency of the Apriori algorithm in mining association rules, an association rule model based on interest is proposed. Second, the MapReduce computing model is used to partition the transaction database, the improved Apriori algorithm is applied to mine each partition, and the mining results are merged to obtain the frequent itemsets. Experiments show that the optimized algorithm greatly improves on traditional algorithms in both mining performance and computing time.

1. Introduction

The development of computer and information technology has brought rapid changes to many fields and has enabled higher education to gradually move toward informatization [1, 2]. For the higher education system, teaching management is the core link and has a great impact on the efficient operation of teaching as a whole. At present, many colleges and universities in our country have applied teaching management systems in their actual work. Given the massive data involved in teaching management, these systems provide functions such as data storage and data query, which significantly improve the efficiency and quality of teaching management. The historical data that continuously accumulate in the teaching process contain a large amount of valuable information, but existing teaching management systems generally lack functions for data analysis and exploration. If efficient data mining technology is combined with the existing teaching management system to analyze and mine this massive information, university faculty and staff will be able to grasp the underlying patterns in teaching accurately and make better strategies. Such comprehensive and scientific support will ultimately promote the innovation of teaching work in colleges and universities and help achieve personalized talent training goals [3-5].

Association rules were originally used to discover interesting relationships between items in a shopping basket and gradually became one of the important methods of data mining [6, 7]. The purpose of association rules is to find the relations between commodities hidden in big data, and the focus is on finding the frequent itemsets in large data sets, that is, the sets of commodities that frequently appear together. Agrawal and Srikant [8] proposed the Apriori algorithm, which mines frequent itemsets through an iterative, level-by-level search and can mine association rules quickly and accurately. Subsequently, a large number of improved Apriori algorithms appeared, but they all adopted a serial method and perform well only when the data set is very small [9, 10]. With the advent of the big data era, the required storage space keeps growing and retrieval keeps getting slower, so the various serial Apriori algorithms can no longer meet people's needs for time and efficiency. In order to mine frequent itemsets more efficiently, parallel algorithms were introduced [11, 12]. Cluster-based association rule algorithms can improve the efficiency of processing transaction data sets, but their structure is complex, and they suffer from problems such as synchronization and data replication. Studies have found that mining frequent itemsets for association rules based on MapReduce is more efficient. Verma and Singh [13] parallelized the Apriori algorithm by means of MapReduce. The algorithm uses HDFS to store data sets and finds frequent itemsets in massive data. Experiments show that this algorithm is significantly better than the traditional Apriori algorithm. Maleki et al. [14] pointed out that the task startup and disk I/O overhead of MapReduce during iterative calculation is too large, which reduces the execution efficiency. Sethi and Ramesh [15] proposed a distributed frequent itemset mining algorithm for big data analysis. Wang et al. [16] proposed the YAFIM algorithm to mine the frequent itemsets of large data partitions in parallel. This algorithm stores the candidate set in a hash tree and is considered one of the better Apriori algorithms at present. Ramu et al. [17] proposed an association rule mining method based on cloud computing that improves the internal framework of the platform to meet the needs of association rule mining. Bai et al. [18] proposed an association rule algorithm based on the MapReduce computing framework, which uses Map and Reduce to generate the candidate itemsets and to aggregate their counts. Xie et al. [19] studied and optimized the Apriori algorithm, improving the number of scans and the size reduction of the transaction database by introducing the idea of the FIS-IS algorithm. Many researchers have since optimized association rule algorithms further. Truong et al. [20] proposed the EHUSM algorithm, which uses a pessimistic utility model to mine high-utility sequences. Lin et al. [21] used the super-large concept to update the high-average utility model. Lin et al. [22] explored skyline patterns by considering both frequency and utility constraints. Liu and Qu [23] mined high-utility itemsets without generating candidates, which further improves the effectiveness of the association algorithm.

To address the shortcomings of the Apriori algorithm, this paper proposes an Apriori algorithm based on interest degree. First, to address the low efficiency of the Apriori algorithm in mining association rules, an association rule model based on interest is proposed. Second, the MapReduce computing model is used to partition the transaction database, the improved Apriori algorithm is applied to mine each partition, and the mining results are merged to obtain the frequent itemsets.

2. Association Rule Mining Technology

2.1. Data Mining Technology

The development and wide application of computer network and database technology have made the important role of information in the development of enterprises more and more widely recognized. A huge number of databases are used in business management, government offices, scientific research, and engineering development, and this trend will continue. Hidden behind these data is extremely important business knowledge, but this knowledge is implicit and unknown in advance. In this context, data mining technology emerged [24-26].

Data mining is the process of identifying valid, novel, potentially useful, and ultimately understandable patterns from large, incomplete, noisy, fuzzy, and random data sets. A pattern can be regarded as what we call knowledge: it gives the characteristics of the data or the relationships between data and is a more abstract description of the information the data contain. KDD (knowledge discovery in databases) means obtaining knowledge from a database. It represents the whole process of extracting high-level knowledge from low-level data, and the term is mainly popular in artificial intelligence and machine learning circles. Data mining refers to the automatic extraction of models from data, and the term is mainly used in the statistics, data analysis, database, and management information system communities. In the general definition, data mining is regarded as a core part of the knowledge discovery process, and data mining technology can obtain a variety of knowledge needed for decision-making from massive data. In many cases, users do not know what valuable knowledge exists in the data. A data mining system should therefore be able to search for and discover several kinds of pattern knowledge at the same time to meet user expectations and actual needs. In addition, the system should be able to mine pattern knowledge at multiple levels and should allow users to guide the mining toward valuable pattern knowledge.

With the rapid development of information technology, business process management is playing an increasingly important role. The business process model is the key to business process management, so it is important to mine the current business process model. As event logs are continuously updated, traditional mining algorithms remine from scratch, which increases the cost and lowers the efficiency. At present, incremental mining is widely used in data mining. Incremental data mining mines only the incremental database and updates the existing mining results to obtain the latest rules. Literature [27] proposes a new incremental semisupervised learning framework for traffic data. Each layer of the model includes a generation network, a discriminative structure, and a bridge. The generation network uses dynamic feature learning based on autoencoders to learn the generative characteristics of streaming data. Literature [28] proposed an effective algorithm for mining high-utility patterns from incremental databases, which builds a list-based data structure with a single database scan and without candidate generation. This algorithm is superior to the previous one-stage construction and candidate generation methods. Literature [29] proposed IncGM+, a fast incremental method for mining continuous frequent subgraphs on a single large evolving graph. The concept of "edge" is adapted to the graph context, and IncGM+ maintains edge subgraphs and uses them to prune the search space. Literature [30] proposed an incremental algorithm called IMU2P-Miner, which mines incremental maximal frequent patterns from univariate uncertain data. Therefore, this paper adopts incremental mining to optimize the process model: starting from the process model that has already been mined, only the incremental log information is mined to update the model.

2.2. Apriori Algorithm

The Apriori algorithm [31, 32] is the earliest and the classic algorithm for mining association rules, and it has had a profound impact on association rule mining.

Association rule mining with Apriori consists of two steps: first, finding all frequent itemsets, and second, generating association rules from those frequent itemsets. The first step is the key to the algorithm and is relatively complicated. The second step is intuitive and simple, since it works directly on the frequent itemsets obtained in the first step. Finding the frequent itemsets is the most computationally expensive part of the algorithm and has a great impact on its mining efficiency.

The Apriori association analysis algorithm uses frequent itemsets for association analysis. It searches iteratively, level by level, for all the frequent itemsets that meet the conditions, and thereby uncovers relationships that are not obvious in the data. The usual procedure is to find the set of frequent 1-itemsets first, denoted L1, and to use it to search for the set L2 of frequent 2-itemsets. L3 is then found from L2, and the iteration continues until the resulting set of frequent itemsets is empty. To illustrate the mining process of the Apriori algorithm, a transaction database is shown in Table 1.

First, the original scores are converted according to a fixed standard, that is, each score is converted to a Boolean value. The principle is that a score higher than the average score of the subject is converted to 1 and any other score is converted to 0, which yields a new database. The converted data are shown in Table 2.

The Apriori algorithm is then used to analyse the association rules in the above database. As shown in Figure 1, the mining process starts by scanning the database to obtain the candidate 1-itemsets C1 and to calculate the support of each itemset. C1 is pruned by removing the items with support less than 4, which gives the frequent 1-itemsets L1. The Apriori algorithm then performs a self-join on L1 to obtain the candidate 2-itemsets C2 and scans the database to obtain the support of each candidate. C2 is pruned by removing the itemsets with support less than 4, which gives the frequent 2-itemsets L2. L2 is self-joined in the same way, and a further scan gives the support of each candidate 3-itemset. The candidate 3-itemsets C3 are pruned by removing the itemsets with support less than 4, which gives the frequent 3-itemsets L3. Finally, the frequent itemset in L3 is {I1, I3, I5}, that is, {C language, linear algebra, IT project management}.
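The level-by-level procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this paper. Tables 1 and 2 are not reproduced here, so the transactions below are hypothetical and were chosen only so that, with a minimum support count of 4, the run ends with the 3-itemset {I1, I3, I5}. The function name `apriori` and its parameters are likewise assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # C1: count every single item in one scan.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    # L1: keep the items that reach the minimum support count.
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Self-join: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan the database to count each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical Boolean course data (not the data of Table 2).
transactions = [
    {"I1", "I3", "I5"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I3", "I5"},
    {"I1", "I3", "I4", "I5"},
    {"I2", "I4"},
    {"I1", "I2"},
    {"I3", "I4"},
    {"I2", "I5"},
]
frequent = apriori(transactions, min_count=4)
largest = max(frequent, key=len)
print(sorted(largest), frequent[largest])
```

The loop stops as soon as a level produces no frequent itemsets, which mirrors the termination condition stated in the text.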

3. Teaching Management System Based on Improved Apriori Algorithm

3.1. Teaching Management System

At present, in order to improve the level of teaching management, many domestic universities have adopted management systems and database software. The most commonly used ones are teaching management systems and OA information management systems. Long-term operation of these systems has accumulated a large amount of data. However, because teaching administrators lack data processing technology and colleges and universities do not pay enough attention to this aspect, only simple statistics or analysis functions are used, and these extract only surface-level information. The potential information therefore remains hidden, and a large amount of data is used poorly or not at all.

The teaching management system is mainly composed of a server and a client. The core function of the server is to carry out the logic operations of the application. The core role of the client is to provide users with functions such as interaction and display. In this system, the server supports the common C/S and B/S architecture modes, and the communication protocol is compatible with Socket and HTML. The client is mainly aimed at the smart phone mobile terminals that are widely used today and adopts the C/S architecture model. According to the above design objectives, the teaching management system designed in this article includes application servers, database servers, network and communication servers, and terminals. The specific architecture is shown in Figure 2.

According to the basic network platform, operating system and selected data of the teaching management system, taking into account the security of the system and referring to relevant protocol standards, the network topology of the teaching management system is shown in Figure 3.

It can be seen from Figure 3 that the teaching management system runs on the application server, while the large amount of teaching data that needs to be inserted and backed up is stored separately on the data server in preparation for future data restoration. The application server can connect to clients, enterprise objects, and servers. Enterprise objects are connected through the Web, so the functions of the teaching management system can be updated or upgraded accordingly. Client users are generally teaching management staff or teachers. After installing the client on a local computer and entering an account number and password, they can enter or modify the relevant score data in the teaching management system. The Web server is mainly aimed at the large number of student users. Because the application server can be overwhelmed during course selection and similar peak periods, it is necessary to set up multiple Web servers and multiple website addresses for the teaching management system so that students can query information or perform other operations. Browser users need no client installation: they open a browser on their own computer, enter the website address of the teaching management system, and use their own account numbers and passwords to query the information relevant to teachers and students.

3.2. Improved Apriori Algorithm

The traditional Apriori algorithm generates a large number of candidate itemsets, which leads to long scanning times and reduces the efficiency and accuracy of mining. To improve the accuracy of student course recommendations, this article makes the following two improvements:

(1) Based on the traditional Apriori algorithm, weighted association rules are introduced. Suppose that, in itemset A, every item $A_i$ has a weight $B_i$. This weight measures the importance of the item in the entire collection: the larger the weight, the more important the item. The items in the collection are sorted by weight from large to small to form a linearly ordered set. Let a and b denote elements of itemset A, with a < b in this order and with weights $B_a$ and $B_b$. We set the weighted support of element a as $wsup(a) = B_a \cdot sup(a)$, where $sup(a)$ is the support in the association rules. If $wsup(a) \ge wminsup$, then a is a weighted frequent itemset, where $wminsup$ represents the minimum weighted support set by the user.

(2) In terms of confidence, if $a \in A$ and $b \in A$, then the confidence of (a, b) is defined as follows:

$$conf(a, b) = \frac{count(a, b)}{count(a)},$$

where $count(a, b)$ represents the number of records in which elements a and b appear at the same time, and $count(a)$ is the number of records in which a appears.

In the above association rules, the weight of a course is set by the relevant experts according to the importance of the course. However, if only the weighted support and weighted confidence are considered, the number of association rules obtained is very large, and the recommended courses then lack practical guidance. In order to improve the accuracy of recommendation, interest is introduced.

Suppose that $P(a)$ represents the probability of occurrence of a, and $P(ab)$ represents the probability that elements a and b occur at the same time. If $P(ab) \ne P(a)P(b)$, then a and b are related. If $P(ab) = P(a)P(b)$, then a and b are independent of each other. Therefore, in the association rules, the interest degree of (a, b) can be expressed as follows:

$$Int(a, b) = \frac{P(ab)}{P(a)P(b)}.$$

When Int > 1, elements a and b are positively correlated, that is, the appearance of a promotes the appearance of b. When Int < 1, elements a and b are negatively correlated, that is, the appearance of a hinders the appearance of b. When Int = 1, the two are independent. Combining the above two methods, if (a, b) satisfies $Int(a, b) \ge minint$, where $minint$ is the minimum interest threshold, then (a, b) is considered to meet the association rule of minimum interest.
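As a rough numerical illustration of the two measures in this subsection, the following Python sketch computes a weighted support and an interest degree on a toy data set. It rests on two assumptions that should be read as such: the weighted support of an item is taken to be its expert weight multiplied by its ordinary support, and the interest of (a, b) is taken to be P(ab) / (P(a) P(b)), which is consistent with the statements that Int > 1 means positive correlation and that P(ab) = P(a)P(b) means independence. The course names, weights, and function names are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for t in transactions if items <= t) / len(transactions)

def weighted_support(transactions, item, weights):
    """Assumed form: expert weight of the item times its ordinary support."""
    return weights[item] * support(transactions, [item])

def interest(transactions, a, b):
    """Assumed form: P(ab) / (P(a) * P(b)); both items must occur at least once."""
    p_a = support(transactions, [a])
    p_b = support(transactions, [b])
    p_ab = support(transactions, [a, b])
    return p_ab / (p_a * p_b)

# Hypothetical course selections and expert weights.
records = [
    {"math", "physics"},
    {"math", "physics"},
    {"math"},
    {"art"},
]
weights = {"math": 0.8, "physics": 0.6, "art": 0.3}

wsup_math = weighted_support(records, "math", weights)
int_math_physics = interest(records, "math", "physics")
int_math_art = interest(records, "math", "art")
print(wsup_math, int_math_physics, int_math_art)
```

Here math and physics appear together more often than independence would predict, so their interest exceeds 1, while math and art never co-occur and their interest is 0.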

3.3. Data Block

MapReduce is a computing model, framework, and platform for parallel processing of big data. The term carries the following three meanings:
(1) MapReduce is a cluster-based high-performance parallel computing platform. It allows common commercial servers to form a distributed and parallel computing cluster containing tens, hundreds, or thousands of nodes.
(2) MapReduce is a software framework for running parallel computations. It provides a large but well-designed framework that automatically parallelizes computing tasks, automatically divides the data and the tasks, automatically allocates and executes the tasks on cluster nodes, and collects the results. Many of the complex low-level details of parallel computing, such as distributed data storage, data communication, and fault tolerance, are handed over to the system, which greatly reduces the burden on software developers.
(3) MapReduce is a parallel programming model and method. Drawing on the design ideas of the functional programming language Lisp, it provides a simple parallel programming method. MapReduce uses two functions, Map and Reduce, to implement basic parallel computing tasks, provides abstract operations and parallel programming interfaces, and makes programming and processing of large-scale data simple and convenient.
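The Map and Reduce programming style can be illustrated without a cluster. The sketch below simulates the three stages (map, shuffle, reduce) in plain Python to count how often each item occurs across two data blocks. It is only a single-process imitation for illustration and does not use Hadoop or any real MapReduce API. The function names and the sample blocks are assumptions.

```python
from collections import defaultdict
from itertools import chain

def map_phase(block):
    """Map: emit an (item, 1) pair for every item of every transaction in a block."""
    return [(item, 1) for transaction in block for item in transaction]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between Map and Reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}

# Two hypothetical data blocks, each of which would live on a different node.
blocks = [
    [["I1", "I3"], ["I1", "I5"]],
    [["I3", "I5"], ["I1", "I3", "I5"]],
]
mapped = chain.from_iterable(map_phase(block) for block in blocks)
item_counts = reduce_phase(shuffle(mapped))
print(item_counts)
```

In a real deployment the map calls would run in parallel on different nodes and the shuffle would move data over the network. The structure of the computation stays the same.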

MapReduce provides the following main functions:
(1) Data division and computing task scheduling
(2) Data/code mutual positioning
(3) System optimization
(4) Error detection and recovery

Since the algorithm proposed in this paper uses the MapReduce computing model to carry out the association rule mining described above, the next task is to partition the transaction database into data blocks. The size of the data blocks has a great impact on the execution efficiency of the algorithm. If each block is too small, the data transmission time increases, and if it is too large, the computational pressure on each node increases. Therefore, this paper adopts parallel partitioning and parallel computing to obtain the frequent itemsets of each node, and the frequent itemsets of all nodes are then counted to obtain the global frequent itemsets. The specific data mining process of the teaching management system is shown in Figure 4.

Step 1. Split the transaction database horizontally, send the divided data blocks to m nodes in the cluster, and start the work on the Map side.

Step 2. After the data are sent to the m different nodes, they are converted in turn according to the corresponding minimum degree of interest. Each node then scans its data to obtain the frequent 1-itemsets and generates a local frequent matrix at the same time.

Step 3. Compress the rows and columns of the matrix, respectively.

Step 4. Convert the compressed matrix obtained on each node into a local frequent itemset.

Step 5. Combine the key-value pairs that have the same key, count the partial support, and compare it with the set minimum support. If the support is not less than the minimum support, the itemset is added to the frequent k-itemsets. Finally, the union of these itemsets is formed.

Step 6. According to the comparison of the degree of confidence, the association rules that finally meet the requirements are obtained by iterative calculation.

To make the proposed algorithm easier to understand, its pseudocode is given in Table 3.
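The partition-and-merge idea behind Steps 1 to 6 can be illustrated with a small single-process Python sketch. It is not the pseudocode of Table 3 and it omits the matrix compression of Steps 3 and 4. Instead it swaps in a simpler, well-known partition scheme in the style of the SON algorithm: each block reports its locally frequent itemsets as candidates, and a second pass counts every candidate over the whole database. The function names, the brute-force local miner, the `max_size` limit, and the sample data are all assumptions, and a non-empty database is assumed.

```python
from itertools import combinations
from math import ceil

def split(transactions, m):
    """Step 1 (sketch): horizontal split into m nearly equal blocks."""
    size = ceil(len(transactions) / m)
    return [transactions[i:i + size] for i in range(0, len(transactions), size)]

def local_frequent(block, min_ratio, max_size=3):
    """Map side (sketch): brute-force itemsets whose support in this block
    reaches min_ratio. Itemsets are represented as sorted tuples."""
    threshold = ceil(min_ratio * len(block))
    counts = {}
    for t in block:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, c in counts.items() if c >= threshold}

def mine_partitioned(transactions, min_ratio, m):
    blocks = split(transactions, m)
    # Union of the local results: every globally frequent itemset is
    # locally frequent in at least one block, so none is missed.
    candidates = set().union(*(local_frequent(b, min_ratio) for b in blocks))
    # Reduce side (sketch): count each candidate globally and filter.
    min_count = ceil(min_ratio * len(transactions))
    result = {}
    for c in candidates:
        total = sum(1 for t in transactions if set(c) <= set(t))
        if total >= min_count:
            result[c] = total
    return result

# Hypothetical transaction database.
database = [
    ["I1", "I3", "I5"],
    ["I1", "I2", "I3", "I5"],
    ["I1", "I3", "I5"],
    ["I1", "I3", "I4", "I5"],
    ["I2", "I4"],
    ["I1", "I2"],
    ["I3", "I4"],
    ["I2", "I5"],
]
global_frequent = mine_partitioned(database, min_ratio=0.5, m=2)
print(sorted(global_frequent.items()))
```

The second counting pass costs one extra scan, but it guarantees exact global support counts even for itemsets that were frequent in only one block.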

4. Results and Discussion

4.1. Analysis of Time Results under Different Transaction Numbers

The experiment was carried out on a platform cluster built from seven virtual machines, of which one served as the master node and the remaining six served as slave nodes. The first group of experiments used transaction databases of different sizes, and the second group used transaction databases of the same size with different support thresholds. Figures 5 and 6 show the comparative results of the two groups, respectively.

It can be seen from Figure 5 that the running time of the improved Apriori algorithm is much shorter than that of the traditional Apriori algorithm. In addition, although the running time of the improved Apriori algorithm also increases as the number of transactions grows, the gap between the two algorithms gradually widens. The larger the amount of data, the more obvious the advantage of the improved Apriori algorithm. The figure also shows that when the amount of data reaches 1 million, the running time of the traditional Apriori algorithm is already very high, hundreds of times that of the improved Apriori algorithm.

It can be seen from Figure 6 that the running time of both algorithms decreases as the support increases, but the time taken by the improved Apriori algorithm is much shorter than that of the traditional Apriori algorithm. The smaller the support, the greater the difference in running time between the two algorithms. The improved Apriori algorithm is therefore especially efficient when mining association rules with low support.

4.2. Analysis of the Results of the Mining Process

This article conducts experiments on the processes and methods of applying data mining to test paper evaluation. The selected test papers consist of fill-in-the-blank questions, multiple-choice questions, true or false questions, calculation questions, application questions, comprehensive question one, and comprehensive question two. The total scores of these major questions are 15, 20, 10, 20, 15, 10, and 10 points, respectively. In order to better analyse the association rules, the data need to be generalized as follows: A, B, C, D, E, F, and G represent the seven major questions, and every 5 points form one grade level. A question with a total score of 20 therefore has four levels, a question with a total score of 15 has three levels, and a question with a total score of 10 has two levels. This gives the itemset {A1, A2, A3, B1, B2, B3, B4, C1, C2, D1, D2, D3, D4, E1, E2, E3, F1, F2, G1, G2}. The support is set to 0.03 and the confidence to 0.6. After this processing, the frequent itemsets mined by the association rule algorithm are analysed, the patterns and rules that teachers are interested in are found, and useful association rules between the scores of the questions, which are used to measure the rationality of the test paper, are obtained, as shown in Figure 7.
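The generalization step can be made concrete with a short Python sketch that maps a raw score on each major question to a level item. The total scores per question come from the text. Everything else is an assumption made for illustration: the function name `to_item`, the sample student, and in particular the boundary rule, since the text does not say which level a score of exactly 0, 5, or 10 belongs to. Here a score s falls into level ceil(s / 5), and a score of 0 is placed in level 1.

```python
from math import ceil

# Total score of each major question, as listed in the text.
MAX_SCORES = {"A": 15, "B": 20, "C": 10, "D": 20, "E": 15, "F": 10, "G": 10}

def to_item(question, score, step=5):
    """Map a raw score to a level item such as 'B3' (level 3 of question B).

    Assumed boundary rule: level = ceil(score / step), with score 0 in level 1.
    """
    if not 0 <= score <= MAX_SCORES[question]:
        raise ValueError(f"score {score} out of range for question {question}")
    level = max(1, ceil(score / step))
    return f"{question}{level}"

# One hypothetical student's scores become one transaction.
student = {"A": 12, "B": 13, "C": 4, "D": 20, "E": 6, "F": 10, "G": 0}
transaction = [to_item(q, s) for q, s in student.items()]
print(transaction)

# Enumerating every possible score reproduces the full item universe.
all_items = {to_item(q, s) for q, top in MAX_SCORES.items() for s in range(top + 1)}
print(len(all_items))
```

Under this rule the enumeration yields exactly 20 distinct items, matching the size of the itemset listed above.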

It can be seen from Figure 7 that the overall content of the test paper is relatively reasonable and conforms to the way students learn, but the multiple-choice, calculation, and comprehensive questions are strongly correlated in their knowledge points, and the knowledge points they involve may overlap. Students who have not mastered one knowledge point may therefore lose marks on several questions. From the perspective of students, these repeated knowledge points may be difficult to understand, and more attention should be paid to explaining them in teaching. From the perspective of teachers, duplication of knowledge points should be avoided as much as possible when setting papers, and later questions should especially not depend on earlier questions. Otherwise, the duplication will not only affect students' performance but also reduce the quality of the test, making a comprehensive evaluation impossible. This conclusion has certain guiding significance for schools and teachers.

4.3. Analysis of the Results of Elective Courses

In the experimental test of the association rules of students' course selection, the transaction set consists of the course selection data of 530 students, and the course item set contains 11 courses. By mining the elective course data with the association rule algorithm, the association rules are obtained together with the frequency, support, and confidence of each rule.

Figure 8 shows the mining results. The experimental results show that the support of the obtained association rules is greater than 0.2, the confidence is greater than 0.7, and the lift is greater than 1. In the association rules, items 2, 5, 6, and 8 represent database principles and applications, Java object-oriented programming, Android course learning, and graphics principles and applications, respectively. Strong association rules are generated among several courses, which means that many students select these courses together. When optimizing course scheduling in colleges and universities, priority should be given to courses 8 and 2, which have a high lift value, and the remaining courses should then be ordered by lift value so that time and space conflicts between these courses are avoided.
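The filtering and ranking described above can be sketched as follows. The thresholds (support greater than 0.2, confidence greater than 0.7, lift greater than 1) are taken from the text, and the course numbers follow the labels used there. The ten course selections below are invented for illustration and are not the 530-student data set, and the function name `rule_metrics` is an assumption.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    a, c = frozenset(antecedent), frozenset(consequent)
    n_a = sum(1 for t in transactions if a <= t)
    n_c = sum(1 for t in transactions if c <= t)
    n_ac = sum(1 for t in transactions if (a | c) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    # Lift compares the confidence with the baseline frequency of the consequent.
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift

# Hypothetical course selections of ten students (course numbers as items).
selections = [
    {2, 8}, {2, 8}, {2, 8}, {2, 5, 8}, {5, 6},
    {5, 6}, {2, 5, 6}, {6}, {8}, {1},
]
candidate_rules = [((8,), (2,)), ((5,), (6,)), ((2,), (5,))]

strong = []
for lhs, rhs in candidate_rules:
    s, c, l = rule_metrics(selections, lhs, rhs)
    if s > 0.2 and c > 0.7 and l > 1:
        strong.append((lhs, rhs, s, c, l))

# Rank the surviving rules by lift, highest first, for scheduling priority.
strong.sort(key=lambda rule: rule[4], reverse=True)
for lhs, rhs, s, c, l in strong:
    print(lhs, "->", rhs, round(s, 2), round(c, 2), round(l, 3))
```

Courses that appear in the highest-lift rules are the ones most often chosen together, so a scheduler would place them in non-overlapping time slots first.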

According to the results of the above experiments and the values achieved through the test indicators, it can be seen that the improved Apriori algorithm can play a good role in optimizing the schedule.

The improved Apriori algorithm is compared with the traditional Apriori algorithm and the YAFIM algorithm. Figure 9 shows the running time of the three algorithms under different minimum support degrees. It can be seen from the figure that when the minimum support is low, the time costs of the three algorithms differ considerably. The YAFIM algorithm is better than the traditional Apriori algorithm, and the improved Apriori algorithm is in turn significantly better than the YAFIM algorithm. As the minimum support increases, the number of database scans is reduced, the numbers of candidate itemsets and frequent itemsets generated drop sharply, and the time overhead of all three algorithms becomes relatively low. Even so, the running time of the improved Apriori algorithm remains shorter than that of the traditional Apriori algorithm and the YAFIM algorithm.

Figure 10 compares the running time of the traditional Apriori algorithm, the YAFIM algorithm, and the improved Apriori algorithm under a fixed minimum support and different minimum confidence values. It can be seen from Figure 10 that the running time of the three algorithms does not fluctuate much across confidence levels. This is mainly because the confidence is calculated by simple arithmetic on the support counts. The count of each itemset has already been stored when the support was computed, so the additional computational overhead is relatively low. It can also be seen from Figure 10 that, although the YAFIM algorithm is superior to the traditional Apriori algorithm in running time, the improved Apriori algorithm proposed in this paper has a significantly shorter running time than both.

Comparing the running time of the three algorithms under different support degrees and under different confidence levels shows that the improved Apriori algorithm mines association rules from a large amount of student course selection data more efficiently than the traditional Apriori algorithm and the YAFIM algorithm. This also shows that the improved Apriori algorithm works very well for analysing association rules among students' elective courses.

5. Conclusion

Data mining is the process of extracting useful information and knowledge from massive amounts of data, that is, of automatically extracting potentially useful, understandable, and nontrivial patterns hidden in a data set. By summarizing the queried content and searching for underlying patterns, it helps decision makers analyse historical and current data and discover hidden relationships and patterns in them, thereby predicting possible future behaviours and providing strong support for decision-making. Applying data mining technology to the teaching management system makes it possible to obtain potential and valuable support for decisions from the massive teaching information and to guide subsequent teaching and teaching management work more effectively. Therefore, to address the shortcomings of the Apriori algorithm, this paper proposes an Apriori algorithm based on interest degree. First, to address the low efficiency of the Apriori algorithm in mining association rules, an association rule model based on interest is proposed. Second, the MapReduce computing model is used to partition the transaction database, the improved Apriori algorithm is applied to mine each partition, and the mining results are merged to obtain the frequent itemsets. The experimental results show that the improved algorithm not only achieves better data mining performance but also greatly reduces the computing time.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest regarding the publication of this paper.