Abstract

With the rapid development of digital technology, online education has grown vigorously but also faces challenges of varying degrees, such as high learner dropout rates, low course completion rates, and the loss of users. An effective assessment system for learning engagement has not yet been established. Based on an exploration of the core of learning activity engagement, this research evaluates the state of learning activity engagement using learners’ adaptive adjustment of information exchange activities and a random trees model. Random trees are a combined classifier: the main idea is to build multiple relatively independent decision tree classifiers based on two random processes and then obtain the final prediction by voting over all decision trees. The traditional random trees model is improved here through weighted calculation and aggregation calculation. Experimental analysis shows that the highest assessment accuracy exceeds 80%, which indicates that the weighting improvement benefits the random trees model; the accuracy rate increases by 65.2% after the weighted improvement. Overall, the performance of the improved random trees model improves by 67.3%.

1. Introduction

With the gradual transformation of IT and the continuous application of big data processing technology in the field of education, more and more online learning platforms have been applied in practical teaching [1]. An online instructional platform offers a variety of instructional resources and learning support service tools and supports a variety of instructional modes, which can meet the various teaching and learning needs of teaching stakeholders [2]. Online learning gives on-the-job personnel the opportunity to study further and improves the quality of the labor force. This is of great significance for addressing the current situation in China of a large labor force with low skill levels [3]. Compared with the traditional instructional form, online learning is a new learning mode developed in the network environment. With the help of the abundant and shared instructional resources on the network, it breaks through the time and place limitations of traditional teaching and provides a virtual, interactive learning environment for teacher-student and student-student communication [4]. With the continuous innovation of IT in the field of education, traditional education methods can no longer meet the learning needs of learners [5]. In recent years, the gradual transformation of educational informatization has changed people’s learning methods, learning thinking, and learning cognition. Online learning has been widely recognized by all sectors of society and has become an indispensable learning method in study and life.

In current Internet + teaching practice, an effective assessment system has not been formed; in particular, assessment indicators are not sufficiently integrated with front-line teaching scenarios [6, 7]. Therefore, there is an urgent need to design a new assessment method for IT-based teaching scenarios and to adopt an index system with both reliability and validity to assess the learning state [8]. Implicit learning activities are often unconscious and scattered, and the rules and structures underlying the behaviors are not obvious. However, they reflect the subtle and complex logical relationships in learning better than explicit learning activities do, and they reflect the truest thinking and learning situation of students, which is difficult for teachers or other assessment mechanisms to capture [9]. By selecting the characteristic dimensions of learning activity analysis, a corresponding index system of learning activity characteristics and behavior indicators can be built from online learning activity performance and key behavior analysis technologies, in order to discover and measure learners’ engagement, motivation, and interaction [10]. It is therefore necessary to supervise the learning process, predict learners’ development trends in time, take appropriate intervention measures for different learning effects, and give learners targeted help, guidance, or encouragement so that they can continually correct their learning route.

The random trees algorithm (also known as random forest) is an excellent classification algorithm and a typical combined classifier algorithm. It was first put forward in 2001 by Professor Leo Breiman, a member of the U.S. National Academy of Sciences. The base classifier of the random trees model is the classification and regression tree model. To make the decision trees in the model differ from one another, two “random” processes are adopted when building the random trees model [11]. Learning activity analysis is more and more widely used in education [12]. Traditional education data is single and one-sided and cannot objectively present all the learning activities of learners. The online instructional platform comprehensively records learning results data, learning methods, platform usage characteristics, and other learning activities of learners, and the comprehensiveness of these data is extremely important for analyzing learners’ learning activity [13]. From the perspective of learner development and instructional mode reform, research on learning activity analysis can guide teachers to provide personalized guidance to learners, bring new ideas to educational researchers for optimizing instructional modes, and bring new insight into the construction of personalized learning and adaptive systems [14]. However, there are still some shortcomings in the above research, so this study puts forward the following innovations on this basis. (1) A weighted random trees model. The random trees algorithm has shortcomings; for example, there is no well-founded method to specify the size of the model when the decision trees are generated, and a model that is too large or too small will affect the final decision result. This study proposes a random trees model composed of weighted decision trees. According to the different generalization abilities of the individual decision trees, the decision weight of each decision tree is calculated to improve the overall prediction accuracy of the model. (2) Model optimization of random trees. The individual classifiers in the random trees model are analyzed and their similarity is calculated. Through a clustering method, classifiers with large differences are extracted and integrated to make fairer and more effective decisions. The results of various measurement methods are analyzed through experiments, and some further improvement schemes are put forward.

2. Related Work

Ferreira et al. proposed the concepts of weak learning algorithms and strong learning algorithms and raised the question of their equivalence, that is, whether a weak learning algorithm can be upgraded to a strong learning algorithm [15]. Wang et al. believe that parallel development on the Spark platform has several important benefits: first, Spark has a unified API, so it is very easy to develop applications; second, Spark can perform different operations on the same data, and these operations can be combined to obtain higher efficiency; third, Spark usually operates in memory and is highly efficient [16]. Patil and Deore put forward a circular data analysis model, which includes seven parts: data collection, storage, data cleaning, integration, data analysis, data presentation and visualization, and corresponding behaviors. They grouped the sources of data into two platforms, namely a learning system and a learning management system [17]. Tafesse and Wien used clustering, a data mining method, to perform cluster analysis on learner behavior data from 612 courses and extract learner behavior characteristics [18]. The research of Min and Kim shows that the online learning assessment system is the key and core of online learning, so it is urgent to build a reasonable and complete online learning assessment system and to make a fair assessment through the collection and analysis of learners’ learning activity [19]. Rajabalee et al. put forward the theory of “granular learning activity”, which is used to discover the granular patterns of learners’ interaction with an intelligent tutoring system, manage the uncertainty of learning activity, and cluster the N-gram model into a hierarchical structure by using rough set-based map particles; the result can be used to predict learners’ learning activity in an intelligent tutoring system [20].

Koehler and Meech believe that current domestic research on the relationship between teacher support, learning burnout, and learning motivation is relatively in-depth. The findings indicate that teacher support is negatively correlated with students’ levels of learning fatigue and that professional assistance from instructors can improve students’ learning in an online learning environment. Research attention has turned to teacher support strategies, but most of this work is aimed at early childhood and young children, and there is relatively little research on the relationship between teacher support and learning activity engagement in the online learning environment [21]. Related research shows that the overall goal of data mining is to find hidden useful information in complex data sets and use it to create greater value. The task of data mining is to automatically or semi-automatically extract previously unknown interesting patterns, such as data records, exception records, and dependencies, from a large amount of data [22]. Willans et al. proposed Bagging, a technique similar to Boosting. Breiman emphasizes that the stability of the learning algorithm in an ensemble has a great influence on the final result. For unstable algorithms, such as neural networks and decision trees, Bagging can improve prediction accuracy, but it has no obvious effect on stable learning algorithms and sometimes even reduces prediction accuracy [23]. Cahn and Anna believe that random trees are a nonlinear combined classifier. It uses random sampling with replacement to extract training samples and feature sets. After training, many weak classifiers (decision trees) are obtained and combined into a random forest classifier, which obtains the final prediction result by means of voting [24].

Hooshyar and Yang sorted out three kinds of indicators affecting academic achievement: academic factors, demographic factors, and cultural and social factors. Accordingly, we summarized the prediction indicators adopted in the relevant research on actual data prediction [25]. Yang believes that the core and connotation of learning analytics include three aspects: the object of learning analytics is the data generated by teaching stakeholders in the process of teaching and learning; the focus of learning analytics is to analyze the data by using correlation analysis methods; and the goal of learning analytics is to discover learning rules, predict learning effects, evaluate the learning process, and optimize teaching effects [26]. Green et al., based on rough set theory, put forward an algorithm for attribute reduction using a discernibility function. The algorithm extracts knowledge of key personality characteristics, reveals the objective relationship between personality characteristics and learning strategies, and reduces the amount of data [27].

Based on the related work above, this study establishes the positive role of the random trees model in the assessment of learner behavioral engagement and constructs an algorithmically optimized random trees model. It analyzes the collected data in depth using big data algorithms, makes more effective use of the data, uncovers the valuable knowledge hidden behind the data, and identifies the potential problems in the assessment of learners’ behavioral engagement that affect online learning.

3. Methodology

3.1. Construction of Random Forest Model

Learning refers to a series of activities carried out by the learning subject in order to obtain a certain learning effect, and learning activity accompanies learning. In traditional learning, learning activity refers to the behavior by which learners interact with other learners, teachers, and teaching resources in the classroom environment. Such learning activities can only be recorded through teacher observation, video storage, and questionnaires. Behavioral science is an interdisciplinary science that draws on psychology, sociology, and economics. It mainly studies individual or aggregate human behavior by using the experimental and observational methods of natural science. From the mainstream psychological point of view, learning activity can be divided into two kinds: explicit learning activity and implicit learning activity. Explicit learning activities can be directly observed, such as reading and taking notes, while implicit learning activities cannot be directly observed, such as thinking, awareness, and analytical ability. Different educators emphasize different aspects when interpreting learning activities. Educational assessment has specific guiding and management functions.

Educational assessment is not only a guide for teaching management, teachers’ teaching content, methods and learners’ learning, but also restricts and promotes learning. Management function refers to the constraint effect and ability of learners to achieve the ultimate goal by regulating, controlling and standardizing learners’ behavior. Learning activities refer to the sum of a series of behaviors that learners perform according to their individual learning needs when logging on to the online platform. Therefore, the learning activities may include logging into the learning platform, accessing a certain resource or module, replying to others, etc. Each learning activity forms a learning activity flow. Considering each learning activity as a system, it includes activity subject, activity object, activity operation, activity environment and activity result. The duration of each learning activity and the time interval for entering the next learning activity will vary according to different personal habits and motives. The analysis of a single online learning activity is shown in Figure 1.

For the convenience of statistics, the duration of the $i$-th learning activity is recorded as $d_i$, and its start time and end time are recorded as $t_i^{s}$ and $t_i^{e}$, respectively; the time interval between two consecutive learning activities is recorded as $g_i$, where the end time of the previous activity and the start time of the next one are recorded as $t_i^{e}$ and $t_{i+1}^{s}$, respectively. For $n$ recorded learning activities, the duration of a single learning activity $d_i$, the time interval of two consecutive learning activities $g_i$, the average learning duration $\bar{d}$, and the average interval of learning activities $\bar{g}$ are as follows:

$$d_i = t_i^{e} - t_i^{s}, \qquad g_i = t_{i+1}^{s} - t_i^{e},$$

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad \bar{g} = \frac{1}{n-1}\sum_{i=1}^{n-1} g_i.$$
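The duration and interval statistics described above can be illustrated with a minimal Python sketch. The function name `activity_stats`, the `(start, end)` tuple layout, and the sample timestamps are hypothetical and chosen only for illustration; the sketch assumes the activities are sorted by start time and do not overlap.

```python
from datetime import datetime


def activity_stats(activities):
    """Compute per-activity durations, gaps between consecutive activities,
    and their averages (all in seconds).

    activities: list of (start, end) datetime pairs, sorted by start time.
    """
    # Duration of each activity: end time minus start time.
    durations = [(end - start).total_seconds() for start, end in activities]
    # Interval: start of the next activity minus end of the previous one.
    intervals = [
        (activities[i + 1][0] - activities[i][1]).total_seconds()
        for i in range(len(activities) - 1)
    ]
    mean_duration = sum(durations) / len(durations) if durations else 0.0
    mean_interval = sum(intervals) / len(intervals) if intervals else 0.0
    return durations, intervals, mean_duration, mean_interval


# Hypothetical log of three learning activities on one day.
log = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 20)),
    (datetime(2022, 3, 1, 9, 30), datetime(2022, 3, 1, 9, 45)),
    (datetime(2022, 3, 1, 10, 0), datetime(2022, 3, 1, 10, 25)),
]
durations, intervals, mean_duration, mean_interval = activity_stats(log)
```

With $n$ activities there are $n - 1$ intervals, which is why the two averages use different denominators.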

Research on the composition of the online learning activity system and on the existing online learning activity assessment index systems shows that, although the establishment of such index systems has been studied in depth theoretically, these theories do not adequately serve specific online learning activity analysis and assessment objectives. Table 1 shows the relevant indicators of learning activity level.

Therefore, based on the existing online learning activity model, combined with specific online learning activity analysis and assessment indicators, this study formulates online learning activity models for different analysis and assessment objectives, as shown in Figure 2.

The whole online learning activity model includes four layers. The learning activity description is used to clearly describe various specific online learning activities. The third layer is the online learning activity classification system. Different from previous hierarchical classification models, this model is a parallel classification model that analyzes and evaluates the learning activity. Each type of learning activity includes both lower-level online learning activities and a high-level online learning activity model. Each type of online learning activity serves a specific online learning activity analysis and assessment goal. The top level is the goal of online learning activity analysis and assessment. In terms of teaching, the constructivist view is that learning is not a process of passively accepting knowledge; rather, learners selectively build newly acquired knowledge into their own knowledge system according to their own cognitive structure. Teachers’ duty is no longer simply to impart knowledge to learners but to guide them in acquiring learning experiences. Constructivist learning theory has therefore shifted from an objectivist knowledge dissemination model to an active learner model.

Each behavior activity is a systematic process in which the behavior subject interacts with the behavior object, performs operations, and produces some results in a certain learning place and time [28]. Like the online learning activity description model, the online learning activity classification should also be guided by the analysis and assessment goal. Although a hierarchical classification system based on the level of information processing can better reflect the changes in learners’ cognitive level during learning activity, it cannot directly serve analysis and assessment.

3.2. Application of the Random Forest Model in Assessment

The random trees model is used for data prediction: all decision trees make predictions and jointly determine the final prediction result of the model. However, some problems remain in practical application; for example, classification on small or low-dimensional data sets may not give good results. Because there are few examples to choose from during repeated random selection, a large number of repeated choices are produced, which may prevent the most effective choice from showing its advantage. Figure 3 shows the basic construction of the random trees model.
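The repeated choices mentioned above come from sampling with replacement. The following Python sketch illustrates one bootstrap draw and the samples it leaves out; the names `bootstrap_split` and `expected_distinct_fraction` are hypothetical, and the sketch shows the general bootstrap procedure rather than the exact sampling code used in this study. The expected share of distinct samples in a bootstrap draw of size $n$ is $1 - (1 - 1/n)^n$, which approaches about 0.632 for large $n$.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement.

    Returns the in-bag draws (duplicates possible) and the sorted
    out-of-bag indices that were never drawn.
    """
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


def expected_distinct_fraction(n_samples):
    """Expected fraction of distinct samples in one bootstrap draw."""
    return 1.0 - (1.0 - 1.0 / n_samples) ** n_samples


rng = random.Random(0)  # fixed seed so the draw is reproducible
in_bag, out_of_bag = bootstrap_split(10, rng)
distinct_fraction = len(set(in_bag)) / 10
expected = expected_distinct_fraction(10)
```

On a small data set, each tree therefore sees many duplicated examples, and the out-of-bag samples are the ones available for estimating that tree's accuracy.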

Random trees are a combined classifier composed of multiple meta-classifiers, each of which is relatively independent. In model prediction, each meta-classifier predicts the result separately, and the model reports the aggregated result over all decision trees. For the combined classifier $\{h_1(x), h_2(x), \ldots, h_k(x)\}$ and an input vector $x$, the model workflow is shown in Figure 4.

When the input data is $x$, each meta-classifier predicts the result relatively independently. After obtaining the prediction results of all meta-classifiers, the random trees model obtains the overall prediction result through predetermined rules. For a classification problem, the prediction given by each meta-classifier is a class label; the random trees model counts all the classification results and gives the label with the largest count as the final prediction. For a regression problem, the final prediction is generally the average of the predictions given by all meta-classifiers.
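The two aggregation rules described above can be sketched in a few lines of Python. The function names `vote` and `average` and the example labels are hypothetical; the per-tree predictions are assumed to be already available as a list.

```python
from collections import Counter


def vote(predictions):
    """Classification: return the label predicted by the most trees.

    Ties are resolved in favor of the label that appears first.
    """
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Regression: return the mean of the per-tree predictions."""
    return sum(predictions) / len(predictions)


# Hypothetical outputs of three trees for one learner.
label = vote(["engaged", "disengaged", "engaged"])
score = average([0.6, 0.8, 0.7])
```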

3.3. Partial Optimization of the Algorithm Using the Random Forest Model

In the process of building the random trees model, the construction of each decision tree and its out-of-bag (OOB) estimation can be completed serially. By the time the construction of the random trees model is complete, each decision tree has obtained a corresponding OOB assessment value, which is used to allocate the weight of that decision tree. For a model of $k$ decision trees $h_1, \ldots, h_k$, the weighted value $p_{oob}(i)$ of the $i$-th decision tree $h_i$ is defined from three quantities: $c_i$, the number of OOB samples that the decision tree assesses correctly; $n_i$, the total number of samples participating in the OOB assessment of the decision tree; and $\lambda$, the parameter adjustment coefficient. The ratio $c_i / n_i$ is the OOB assessment accuracy of the tree.

Since the random trees model obtains the OOB assessment accuracy of each decision tree in the actual calculation, the assessment result of the final model for an input $x$ can be expressed as:

$$H(x) = \arg\max_{c \in C} W_c, \qquad W_c = \sum_{i=1}^{k} p_{oob}(i)\, I\big(h_i(x) = c\big),$$

where $C$ is the set of all category labels, $c$ is a category label in the set $C$, and $I(\cdot)$ represents the indicator function. When the assessment result of decision tree $h_i$ is category label $c$, the indicator function is equal to 1; otherwise it is equal to 0. $p_{oob}(i)$ is the weighted value of the $i$-th decision tree, $W_c$ is the weighted result obtained by class label $c$, and the final assessment result of the model is the class label with the largest total weighted value.
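The weighted vote described above can be sketched as follows. The text does not make clear how the parameter adjustment coefficient enters the weight, so this sketch assumes, purely for illustration, that the weight is the OOB accuracy raised to the power `alpha`, with `alpha = 1` giving the plain OOB accuracy. The function names, the example counts, and the labels are all hypothetical.

```python
from collections import defaultdict


def oob_weight(n_correct, n_total, alpha=1.0):
    """Weight of one tree from its out-of-bag results.

    ASSUMPTION: weight = (OOB accuracy) ** alpha. The exact role of the
    adjustment coefficient in the original formula is not known here.
    """
    return (n_correct / n_total) ** alpha


def weighted_vote(predictions, weights):
    """Sum the weights of the trees voting for each label and return the
    label with the largest total, together with all totals."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get), dict(totals)


# Hypothetical OOB results (correct, total) for three trees.
oob_counts = [(90, 100), (40, 100), (45, 100)]
weights = [oob_weight(c, n) for c, n in oob_counts]
predictions = ["engaged", "disengaged", "disengaged"]
final_label, totals = weighted_vote(predictions, weights)
```

In this example an unweighted majority vote would return "disengaged", whereas the weighted vote follows the single tree with the higher OOB accuracy, which is the behavior the weighting is meant to produce.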

Since many variables are involved, measuring the correlation between variables requires the correlation coefficient, which serves as the basis for judging the linear correlation between two variables and is one of the important indicators of the degree of correlation. Let the variables $X$ and $Y$ have $k$ paired observations $x_1, \ldots, x_k$ and $y_1, \ldots, y_k$, respectively; then the correlation coefficient between the variables can be expressed as

$$r = \frac{\sum_{i=1}^{k} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{k} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{k} (y_i - \bar{y})^2}}$$

In the above formula, $\bar{x}$ and $\bar{y}$ are the average values of the statistical data of the variables $X$ and $Y$, respectively. The value of the correlation coefficient lies in the range of -1 to +1, i.e., $-1 \le r \le 1$. When $r > 0$, the two variables are positively correlated. Conversely, when $r < 0$, the two variables are negatively correlated. When $|r| = 1$, the two variables are perfectly linearly correlated. When $r = 0$, there is no linear correlation between the two variables. As $k$ increases, the estimate of the correlation coefficient $r$ becomes more reliable.
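The linear correlation coefficient discussed above is taken here to be the standard Pearson coefficient, which is an assumption based on the description (means of each variable, range from -1 to +1). A minimal Python sketch, with the hypothetical function name `pearson_r` and made-up sample data:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    k = len(xs)
    mean_x = sum(xs) / k
    mean_y = sum(ys) / k
    # Numerator: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


r_pos = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly increasing
r_neg = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])  # perfectly decreasing
```

The two calls cover the boundary cases $r = +1$ and $r = -1$; a constant sample is rejected because the denominator would be zero.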

To analyze the relationship between two decision trees, the classification result of each decision tree is recorded for every sample in the data set.

The relationship between the classification results of the two decision trees is then measured from these records, where $a$ represents the number of samples in the whole data set of $N$ samples that are classified correctly.

Clustering divides a set into several subsets according to the similarity between samples, so that the samples in each subset have a high degree of internal similarity. The purpose here is to reveal the relationships between classification trees through clustering, so that individual classification trees can be selected in a more targeted and effective way. As $N$ increases, the number $a$ of samples that are correctly classified in the whole data set becomes a more reliable basis for comparison.
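One possible way to realize the idea of clustering similar trees and keeping dissimilar ones is sketched below. This is not the measure or the clustering method used in this study, which cannot be recovered from the text; it assumes a simple agreement ratio between per-sample correctness vectors and a greedy threshold clustering. All names, the threshold value, and the example data are hypothetical.

```python
def agreement(u, v):
    """Fraction of samples on which two trees are both right or both wrong.

    u, v: lists of 0/1 flags, 1 meaning the tree classified that sample
    correctly.
    """
    return sum(1 for a, b in zip(u, v) if a == b) / len(u)


def cluster_trees(correct, threshold=0.75):
    """Greedy clustering: a tree joins the first cluster whose first member
    it agrees with at least `threshold` of the time, else starts a new one."""
    clusters = []
    for idx, vec in enumerate(correct):
        for cluster in clusters:
            if agreement(vec, correct[cluster[0]]) >= threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def select_representatives(correct, clusters):
    """Keep the tree with the most correct samples from each cluster."""
    return [max(cluster, key=lambda i: sum(correct[i])) for cluster in clusters]


# Hypothetical correctness flags of four trees on five samples.
correct = [
    [1, 1, 1, 0, 1],  # tree 0
    [1, 1, 1, 0, 0],  # tree 1, behaves like tree 0
    [0, 0, 1, 1, 1],  # tree 2
    [0, 0, 1, 1, 0],  # tree 3, behaves like tree 2
]
clusters = cluster_trees(correct)
selected = select_representatives(correct, clusters)
```

Keeping one representative per cluster retains trees that make different mistakes, which is the property that lets the combined vote correct individual errors.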

4. Results, Analysis, and Discussion

Based on the above research, this study applies the random trees model to the assessment of learner behavioral engagement and verifies that the model has a practical effect on this assessment. To verify the scientific soundness, accuracy, and feasibility of the improved random trees model, the experimental analysis covers five parameter dimensions: the assessment accuracy of different decision trees, the assessment accuracy of decision trees with different characteristics, the comparison of running time on different data sets, the performance improvement of the improved random trees model, and the correlation strength. On this basis, the applicability of the improved random trees model to the assessment of learners’ behavioral engagement is analyzed. Three data sample sets A, B, and C are used as experimental samples for the assessment accuracy of different decision trees, the assessment accuracy of decision trees with different characteristics, and the comparison of running time on different data sets. The results are shown in Figures 5, 6, and 7.

Figures 5, 6, and 7 clearly reflect the overall trend of the assessment results on different decision trees and on decision trees with different characteristics. The trend suggests that focusing on the feature vector captures the essentials of assessing learner behavioral engagement. The maximum accuracy in the figures exceeds 80%, which indicates that the weighting improvement benefits the random trees model, and the accuracy is enhanced by 65.2% after the weighted improvement. The fluctuation of the three data sets stays within a reasonable range, and the three data sets largely follow the same trend in the range of 1–3. After 3, however, samples A, B, and C show significant changes, which is due to the different numbers of interference items in the different sets. The numbers of interference items satisfy C > A > B, which is directly reflected in the fluctuation in the range of 3–4. As the sample size increases, the three sets converge to the same trend under the action of aggregation. The embedding of the aggregation algorithm is therefore of great help in improving the calculation of the random trees model. The random sample sets Q1 and Q2 are used as experimental samples for the performance improvement of the improved random trees model and the strength of the correlation. The results are shown in Figures 8 and 9.

Figures 8 and 9 show that the performance of the improved random trees model is significantly improved, which is well reflected in the overall assessment. In the range of 1–5, the performance improves slowly. This is due to the small initial samples and the insufficient number of decision trees, which cannot yet achieve the overall operating effect. After enough data has accumulated, the performance increases rapidly, which is well reflected in the range of 5–7. Overall, the performance of the improved random trees model is improved by 67.3%. For the analysis of correlation strength, this study mainly examines the correlation between the samples, because the stronger the correlation, the higher the correlation ratio between them. A well-expressed correlation helps guarantee the accuracy of the assessment, so it is extremely important. In the experiment, the calculation of the model is basically stable, and the proportion of correlation is basically maintained above 1.2, which also means that the model has good control over the correlation.

5. Conclusions

Based on the study of learning activity analysis and learning theory, this study puts forward an overall framework for online learning activity analysis. On this basis, it analyzes the elements of online learning activity analysis, studies the key technologies involved, and realizes a scenario application of online learning activity analysis. Online learning breaks the limitations of time and place but also separates learners from teachers. It is difficult for teachers to observe learners’ learning process and learning activity as intuitively as in traditional teaching, and therefore difficult to discover in time the problems learners encounter and to give prompt guidance. Owing to its excellent performance in reducing the generalization error of classification systems and simplifying classifier design, the random trees algorithm has become a research focus for researchers and practitioners in many fields. As a combined classifier, the random trees model produces the final prediction from all decision trees in the data prediction stage. For the learner, the analysis results directly present a report of the learner’s knowledge mastery level and the learning state during the process, so as to diagnose the learner’s level and determine whether the learner can enter the next stage of learning. The experimental analysis shows that the maximum accuracy exceeds 80%, which indicates that the weighting improvement benefits the random trees model, and the accuracy is improved by 65.2%. Overall, the performance of the improved random trees model is improved by 67.3%.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares no competing interests.

Acknowledgments

This research was supported by the Bureau of Education of Guangzhou Municipality (No. KP202272).