Abstract
With growing internationalization, English learning is becoming increasingly important, and writing has always played a central role in language learning. Producing high-quality writing requires language proficiency, which improves with reading experience and accumulated knowledge. In recent years, many writing-assistant recommendation systems built on different techniques have appeared, and they provide considerable help for college students' writing. To address the inaccurate recommendations of traditional recommendation algorithms, this paper proposes a hybrid recommendation algorithm and applies it to the recommendation of English writing documents. The algorithm learns three low-dimensional feature vectors for each student, capturing the literature the student likes, the literature the student dislikes, and similar students. The three vectors are linearly combined to form the representation vector of the student. Cosine similarity is then used as the similarity index, and the English writing literature associated with similar students is recommended to the target student. Experimental results show that this recommendation algorithm is superior to four other algorithms in mean absolute error (MAE) and time performance and has high recommendation quality.
1. Introduction
English teaching aims to develop students' international perspective and to build their willingness to learn independently and consciously [1]. Learning to write not only strengthens students' thinking skills and improves their expressive abilities, but also measures the effectiveness of teachers' teaching [2].
Research on college English writing instruction has grown rapidly since 2002, but because earlier studies concentrated on a single research topic, it still lacks a theoretical foundation [3]. From 2000 to 2009, the volume of research showed an overall upward trend, with growth beginning in 2002; since 2010, it has declined. Over the past decade, Chinese researchers have explored and learned from their research experiences. Foreign scholars have diversified their research on university English language teaching by combining it with other areas of foreign language studies [4]. The research themes are now very rich and the theoretical backgrounds vary. Researchers have become more concerned with issues related to the teaching of college English writing, which provides a solid foundation for empirical research [5, 6]. Specific studies by scholars at home and abroad on improving English writing are listed in Tables 1 and 2.
A recommendation system is an important tool to help users deal with information overload in the era of big data. It identifies a set of items of interest based on user behavior and recommends items of interest to users, saving them time.
The collaborative filtering (CF) algorithm is one of the most successful techniques for personalized recommendation systems. Collaborative filtering recommends items to a target user by identifying content preferred by other users with similar interests. Collaborative filtering recommendation systems have been implemented in different application areas. In news, the GroupLens system uses collaborative filtering to help users find content that suits their needs in a large database of news. In social applications, Ringo is an online social information filtering system that uses collaborative filtering to build user profiles based on their ratings of music albums. In e-commerce, Amazon uses topic diversification algorithms to improve its recommendations. The system uses collaborative filtering techniques to generate tables of similar items offline from an item-to-item matrix as a way to overcome scalability issues.
With the computerization of university management systems, recommendation systems are also being used in university teaching. English learners spend a lot of effort on English learning but often do not achieve the results they expect [17]. The main reason is that students lack sufficient reading experience and a knowledge base [18]. The main way to improve college students' English writing skills is to read a large amount of excellent English writing literature [19]. The purpose of this paper is to meet the needs of students with different English levels, to provide an accurate and personalized bibliography for each user from a huge body of reference literature, and to make real-time recommendations through an online recommendation system. To this end, this paper proposes a hybrid recommendation algorithm based on multidimensional feature representation learning (MFL). The algorithm splits the English writing literature scoring network. Based on an improved LINE algorithm, it learns in successive stages the English writing literature that students like and the literature that they dislike. Based on an improved DeepWalk algorithm, it obtains sequences of similar students and captures similar-student features. The preference features, dislike features, and similar-student features are linearly combined and connected to form the final feature vector of each student. Cosine similarity is used as the similarity metric to complete the English writing literature recommendation task.
The innovative points of this paper are as follows:
(1) The English writing literature scoring network is split, and students' favorite and disliked English writing literature is learned in successive stages based on an improved LINE algorithm.
(2) Based on the improved DeepWalk algorithm, similar student sequences are obtained and similar student features are captured.
(3) After linear combination, the preference features, aversion features, and similar student features are connected as the final feature vector of students, and the cosine similarity is used as the similarity metric to complete the English writing literature recommendation task.
This paper consists of four main parts: the first part is the introduction, the second part is methodology, the third part is result analysis and discussion, and the fourth part is the conclusion.
2. Methodology
2.1. Survey on the Current Situation of College English Teaching in China’s Universities
Following the error classification method used for CLEC in [20], language errors were classified into 11 categories: word form errors (fm), lexical errors (wd), syntactic errors (sn), verb phrase errors (vp), noun phrase errors (np), collocation errors (cc), pronoun errors (pr), preposition errors (pp), adjective errors (adj), adverb errors (ad), and conjunction errors (cj). After the student essay samples were labeled, the writing errors in the samples from each of the two grades were retrieved and counted separately. The number of writing errors in the freshman compositions is recorded as G1 and the number in the sophomore compositions as G2; the results are shown in Table 3.
The data in Table 3 are the numbers of writing errors in the English compositions of freshmen and sophomores. To show the distribution of college students' writing errors more clearly and intuitively, pie charts are used (see Figures 1 and 2).


Table 3 and Figures 1 and 2 show that the most frequent errors in students' writing are vocabulary errors (wd): 505 for freshmen and 520 for sophomores. The total of 1025 accounts for 30.15% of all errors in both grades. Word form (fm) errors were the second most frequent, with 375 for freshmen and 406 for sophomores; the total of 781 accounts for 22.97% of all errors in both grades. Syntactic (sn) errors were also frequent, with 342 for freshmen and 199 for sophomores; the total of 541 accounts for 15.91% of all errors in both grades, the third highest share.
Errors in verb phrases (vp), noun phrases (np), collocations (cc), and pronouns (pr) ranked fourth, fifth, sixth, and seventh, respectively. Freshmen made 213 verb phrase (vp) errors and sophomores 230, for a total of 443, or 13.03% of all errors in the two grades. Freshmen made 100 noun phrase (np) errors and sophomores 102, for a total of 202 (5.94%). Freshmen made 64 collocation (cc) errors and sophomores 108, for a total of 172 (5.06%). Freshmen made 96 pronoun (pr) errors and sophomores 43, for a total of 139 (4.09%). Preposition (pp) errors ranked eighth: freshmen made 12 and sophomores 45, for a total of 57 (1.68%). Adverb (ad) errors ranked ninth: freshmen made 17 and sophomores 22, for a total of 39 (1.15%). Adjective (adj) errors ranked tenth, with a single error across the two grades (0.03%); it was made by a freshman, since no adjective errors were found among sophomores. No conjunction (cj) errors were found in either grade.
In addition, freshmen made a total of 1725 errors in 11 dimensions, accounting for 50.73% of all errors. Sophomores made a total of 1675 errors in these 11 dimensions, accounting for 49.26% of all errors.
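The totals and shares quoted above can be re-derived with a short tally. The sketch below is only a check on the arithmetic: it uses the per-category counts stated in the text, and the freshman adjective count of 1 is an inference from the stated two-grade total of 1 and the absence of sophomore adjective errors. The names `ERROR_COUNTS` and `summarize` are illustrative.

```python
# Error counts per category as quoted in the text: (freshmen G1, sophomores G2).
# The freshman adj count (1) is inferred from the two-grade total of 1 and the
# statement that sophomores made no adjective errors.
ERROR_COUNTS = {
    "wd": (505, 520),
    "fm": (375, 406),
    "sn": (342, 199),
    "vp": (213, 230),
    "np": (100, 102),
    "cc": (64, 108),
    "pr": (96, 43),
    "pp": (12, 45),
    "ad": (17, 22),
    "adj": (1, 0),
    "cj": (0, 0),
}


def summarize(counts):
    """Return both grade totals, the grand total, and each category's share in percent."""
    g1_total = sum(g1 for g1, _ in counts.values())
    g2_total = sum(g2 for _, g2 in counts.values())
    grand_total = g1_total + g2_total
    shares = {
        category: round(100 * (g1 + g2) / grand_total, 2)
        for category, (g1, g2) in counts.items()
    }
    return g1_total, g2_total, grand_total, shares


g1_total, g2_total, grand_total, shares = summarize(ERROR_COUNTS)
# Categories ordered from most to least frequent.
ranking = sorted(shares, key=shares.get, reverse=True)
print(g1_total, g2_total, grand_total)
print(ranking[:3], shares["wd"], shares["sn"])
```

Running the tally on other corpora only requires replacing the dictionary; the ranking and shares follow from the counts.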
Figures 3 and 4 compare the number of writing errors in the two grades in two different ways. The bar chart in Figure 3 and the line graph in Figure 4 show that sophomores made more errors than freshmen in seven dimensions: fm, wd, vp, np, cc, pp, and ad. In the three dimensions sn, pr, and adj, however, freshmen made more errors than sophomores. By the counts reported above, freshmen made 143 more errors than sophomores on sn (342 versus 199), 53 more on pr (96 versus 43), and 1 more on adj (1 versus 0).


To investigate the differences in writing errors among freshmen and sophomores at different levels, the top 30 students in each grade were classified as the high group and the bottom 30 students as the low group. Analysis of the writing error data showed that the data followed a normal distribution. Therefore, an independent-samples t-test was conducted to investigate the differences in writing errors between the high and low subgroups of freshmen and sophomores (see Tables 4 and 5).
Table 4 shows that freshmen made no errors on conjunctions (cj) but made errors on the other ten dimensions. In addition, the means of the freshman low subgroup were greater than the means of the freshman high subgroup on all of these dimensions. This indicates that the freshman low subgroup made more errors than the high subgroup on the ten dimensions fm, wd, sn, vp, np, cc, pr, pp, adj, and ad.
The independent-samples t-test results in Table 4 show a significant difference between the freshman high and low subgroups on the seven dimensions fm, wd, sn, vp, np, cc, and pr (Sig. (two-sided) < 0.05). This indicates that the number of errors in the freshman low group was significantly higher than that in the freshman high group on these seven dimensions. On the remaining three dimensions (pp, adj, and ad), the number of errors in the freshman low group was not significantly different from that in the freshman high group.
The data in Table 5 show that sophomores made no errors in adjectives (adj) or conjunctions (cj) and made errors in the other nine dimensions: fm, wd, sn, vp, np, cc, pr, pp, and ad. In addition, the means of the sophomore low subgroup were greater than the means of the sophomore high subgroup on all nine dimensions. That is, on all nine dimensions, the sophomore low subgroup made more errors than the sophomore high subgroup.
The independent-samples t-test results in Table 5 show significant differences (Sig. (two-sided) < 0.05) between the sophomore high and low subgroups on the eight dimensions fm, wd, sn, vp, np, pr, cc, and pp. This indicates that the number of errors on these eight dimensions was significantly higher in the sophomore low group than in the sophomore high group. On the adverb (ad) dimension, by contrast, the number of errors in the sophomore low group was not significantly different from that in the sophomore high group.
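The high-group versus low-group comparison can be illustrated with a pooled-variance two-sample t statistic. This is a minimal sketch rather than the analysis actually run for Tables 4 and 5: the per-student error counts are hypothetical, the function name `independent_t` is illustrative, and equal variances in the two groups are assumed.

```python
import math
from statistics import mean, variance


def independent_t(sample_a, sample_b):
    """Return Student's t statistic (pooled variance) and the degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    # Pooled estimate of the common variance of the two groups.
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / dof
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / std_err, dof


# Hypothetical per-student error counts on one dimension (not the study's data).
low_group = [9, 11, 8, 12, 10, 13]
high_group = [4, 6, 5, 3, 7, 5]

t_stat, dof = independent_t(low_group, high_group)
print(round(t_stat, 3), dof)
```

For 10 degrees of freedom the two-sided 5% critical value of t is 2.228, so a statistic of this size would be reported as a significant difference between the groups.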
Pearson correlation analysis yields the data in Table 6. The number of writing errors in the freshman composition sample was negatively correlated with the students' composition scores, with a correlation coefficient r of −0.967 (|r| = 0.967); i.e., the correlation was extremely strong. It follows that the more writing errors there are in the English composition samples of freshmen and sophomores, the lower their composition scores.
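For readers who want to reproduce this kind of analysis, the Pearson coefficient can be computed directly from its definition. The sketch below uses hypothetical error counts and composition scores, not the data behind Table 6, and the name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical data: number of errors per essay and the essay's score.
error_counts = [2, 5, 8, 11, 14]
essay_scores = [14, 12, 11, 8, 6]

r = pearson_r(error_counts, essay_scores)
print(round(r, 3))
```

A value of r close to −1 means that essays with more errors receive lower scores almost linearly, which is the pattern the text reports.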
2.2. English Writing Recommendation Algorithm
The MFL recommendation algorithm has four steps.
(1) The matrix of students' ratings of English writing documents is treated as a complex network, in which students and English writing documents are the network nodes and ratings are the weights of the connected edges. Using the edge weights as the criterion, the network is divided into a high-weight subnetwork and a low-weight subnetwork.
(2) Based on the improved LINE algorithm, the structure of the high-weight subnetwork is learned, and the student vectors and the English writing literature vectors are generated. The English writing literature vectors generated from the high-weight subnetwork are used as the input for learning the structure of the low-weight subnetwork, which generates the student vectors of the low-weight subnetwork.
(3) From the nodes of the whole network, student nodes that gave the same rating to the same English writing literature are randomly selected to form sequences of student nodes. These sequences are fed into the CBOW (continuous bag-of-words) algorithm to learn the features of similar students.
(4) The three feature vectors generated for each student node are linearly combined and stitched together to form the final student vector. The cosine similarity of the vectors is used as the similarity index between nodes to generate the set of students similar to the target student. The English writing literature associated with the similar students is recommended to the target student to complete the recommendation task.
The input of the recommendation algorithm, the scoring matrix, is usually composed of records $\langle p, x, n \rangle$, where $p$ represents the student number, $x$ represents the English writing literature number, and $n$ represents the student's rating of the English writing literature. The students and the English writing literature constitute the nodes of the network: $P$ is the set of students, $X$ is the set of English writing documents, and $E$ is the set of connected edges. A link $e_{px} \in E$ represents student $p$'s score $n$ on English writing literature $x$. The network can be expressed as $G = (V, E, M)$, where $V = P \cup X$, $E \subseteq P \times X$, and $M$ is the weight matrix on the connected edges. The network representation learning algorithm learns the network structure information and generates a low-dimensional vector representation of the network nodes. The similarity of the vectors is used as an indicator of student similarity, and the Top-k most similar students are selected. The English writing literature associated with the similar student set constitutes the English writing literature recommendation set and is recommended to the target student.
The edges whose weight is more than half of the maximum edge weight in the network are extracted to generate the student favorite network (the high-weight subnetwork). For example, for a rating network with a maximum rating of 5, all connected edges with a rating greater than or equal to 3 are extracted.
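The split can be sketched in a few lines. The function below is an illustration under stated assumptions: ratings are given as (student, document, rating) triples, the function name `split_rating_network` is hypothetical, and a rating exactly equal to half the maximum (which the text does not assign to either subnetwork) is left out of both.

```python
def split_rating_network(ratings, max_rating=5):
    """Split (student, document, rating) edges into liked and disliked subnetworks.

    Edges above half the maximum rating go to the favorite (high-weight) network,
    edges below half go to the aversion (low-weight) network. Edges exactly at
    half the maximum are skipped, because the text leaves that case open.
    """
    threshold = max_rating / 2
    liked, disliked = [], []
    for student, document, rating in ratings:
        if rating > threshold:
            liked.append((student, document, rating))
        elif rating < threshold:
            disliked.append((student, document, rating))
    return liked, disliked


# Hypothetical rating records.
ratings = [("s1", "d1", 5), ("s1", "d2", 2), ("s2", "d1", 3), ("s2", "d3", 1)]
liked, disliked = split_rating_network(ratings)
print(liked)
print(disliked)
```

With a maximum rating of 5 the threshold is 2.5, so ratings of 3, 4, and 5 fall in the favorite network, matching the example in the text.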
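The English writing literature node vector is denoted by $\mathbf{q}_y$ and the student node vector is denoted by $\mathbf{p}_x$.
For each edge $\langle x, y \rangle$ between student node $x$ and English writing literature node $y$, the Softmax function gives the conditional probability that student node $x$ generates English writing literature node $y$:
$$p(y \mid x) = \frac{\exp\left(\mathbf{q}_y^{\top}\mathbf{p}_x\right)}{\sum_{k=1}^{|Q|} \exp\left(\mathbf{q}_k^{\top}\mathbf{p}_x\right)},$$
where $|Q|$ is the number of all nodes. The stronger the correlation between $\mathbf{q}_y$ and $\mathbf{p}_x$, the larger this probability, which grows exponentially with the inner product of the English writing literature node vector $\mathbf{q}_y$ and the student node vector $\mathbf{p}_x$. The empirical distribution of student nodes generating English writing literature nodes is
$$\hat{p}(y \mid x) = \frac{w_{xy}}{d_x}, \qquad d_x = \sum_{k \in N(x)} w_{xk},$$
where $w_{xy}$ is the weight of the connected edge, $d_x$ is the degree of node $x$, and $N(x)$ is the set of nodes adjacent to node $x$. The KL divergence is a function that measures the difference between two probability distributions. In this paper, it is used to denote the degree of difference between the conditional probability $p(\cdot \mid x)$ and the empirical distribution $\hat{p}(\cdot \mid x)$. When the two distributions are the same, the KL divergence is zero, and the greater the difference between the two, the greater the KL divergence. Using the KL divergence, the loss function is
$$O = \sum_{x} \lambda_x \, \mathrm{KL}\left(\hat{p}(\cdot \mid x) \,\|\, p(\cdot \mid x)\right),$$
where $\lambda_x$ denotes the importance of node $x$. Taking $\lambda_x = d_x$ and dropping constant terms gives the objective function
$$O = -\sum_{\langle x, y \rangle} w_{xy} \log p(y \mid x).$$
During loss optimization, computing the conditional probability requires traversing all nodes of the network. In large-scale networks, this calculation costs a great deal of time and resources. To reduce this complexity, the traversal of all network nodes is replaced by negative sampling, and the loss function for each edge is transformed into
$$\log \sigma\left(\mathbf{q}_y^{\top}\mathbf{p}_x\right) + \sum_{z=1}^{Z} \mathbb{E}_{v_n \sim P_n(v)}\left[\log \sigma\left(-\mathbf{q}_n^{\top}\mathbf{p}_x\right)\right],$$
where $\mathbf{q}_y$ and $\mathbf{q}_n$ denote English writing literature vectors; $\mathbf{p}_x$ denotes the student vector; $\sigma(t) = 1/(1 + \exp(-t))$; $Z$ is the number of negative samples, with $Z = 5$ generally chosen for large networks; and $P_n(v)$ is the noise distribution, $P_n(v) \propto d_v^{3/4}$, where $d_v$ represents the degree of English writing literature node $v$.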
After iterative optimization, two types of vectors are generated. Φ1 and Φ′ represent the vector of students’ preferred features and the vector of English writing literature, respectively.
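The preference-network objective described above can be made concrete with a small numerical sketch. It assumes the standard LINE formulation (a softmax over inner products for the exact objective, and a sigmoid-based negative-sampling estimate with a noise distribution proportional to degree to the power 3/4). All vectors, weights, degrees, and function names below are hypothetical toy values chosen for illustration, and negatives are drawn without excluding the positive document, which a fuller implementation would likely handle.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax_probs(student_vec, doc_vecs):
    """p(doc | student): softmax of inner products over all document nodes."""
    scores = {doc: dot(vec, student_vec) for doc, vec in doc_vecs.items()}
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {doc: math.exp(score - peak) for doc, score in scores.items()}
    norm = sum(exps.values())
    return {doc: value / norm for doc, value in exps.items()}


def full_objective(student_vec, edge_weights, doc_vecs):
    """Exact objective for one student: -sum over edges of w * log p(doc | student)."""
    probs = softmax_probs(student_vec, doc_vecs)
    return -sum(w * math.log(probs[doc]) for doc, w in edge_weights.items())


def noise_distribution(doc_degrees):
    """Noise distribution over documents, proportional to degree ** 0.75."""
    powered = {doc: degree ** 0.75 for doc, degree in doc_degrees.items()}
    total = sum(powered.values())
    return {doc: value / total for doc, value in powered.items()}


def negative_sampling_loss(student_vec, pos_doc, doc_vecs, noise, num_neg=5, rng=random):
    """Negated negative-sampling objective for one edge (Monte Carlo estimate)."""
    docs = list(noise)
    negatives = rng.choices(docs, weights=[noise[d] for d in docs], k=num_neg)
    loss = -math.log(sigmoid(dot(doc_vecs[pos_doc], student_vec)))
    for neg in negatives:
        loss -= math.log(sigmoid(-dot(doc_vecs[neg], student_vec)))
    return loss


# Hypothetical toy network: one student connected to two of three documents.
doc_vecs = {"d1": [0.5, 0.1], "d2": [-0.3, 0.4], "d3": [0.0, 0.2]}
student_vec = [0.2, -0.1]
edge_weights = {"d1": 5, "d2": 3}
doc_degrees = {"d1": 9, "d2": 4, "d3": 1}

probs = softmax_probs(student_vec, doc_vecs)
exact = full_objective(student_vec, edge_weights, doc_vecs)
noise = noise_distribution(doc_degrees)
sampled = negative_sampling_loss(student_vec, "d1", doc_vecs, noise, rng=random.Random(0))
print(probs, exact, sampled)
```

The exact objective touches every document node for each student, whereas the sampled version touches one positive and `num_neg` negative documents, which is the saving the text attributes to negative sampling.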
The edges whose weight is less than half of the maximum edge weight are extracted to form the student aversion network. The aversion network is learned with the objective function
$$\log \sigma\left(\mathbf{q}_y^{\top}\mathbf{p}'_x\right) + \sum_{z=1}^{Z} \mathbb{E}_{v_n \sim P_n(v)}\left[\log \sigma\left(-\mathbf{q}_n^{\top}\mathbf{p}'_x\right)\right],$$
where $\mathbf{q}_y$ and $\mathbf{q}_n$ denote English writing literature vectors; $\mathbf{p}'_x$ denotes the vector of the student node that needs to be relearned; and $Z$ and $P_n(v)$ are set as in preference feature learning. This generates the student aversion feature vector Φ2.
Aversion network learning differs from favorite network learning in two main respects. First, in the representation learning of the student favorite network, neither the student nodes nor the English writing literature nodes are given preset initial values. In aversion network learning, the English writing literature nodes are initialized with the English writing literature vectors output by the favorite network, while the student nodes are not given preset initial values. Second, during training, the favorite network learns all nodes, including both the English writing literature nodes and the student nodes. During the learning of the aversion network representation, the English writing literature vectors are locked and only the student vectors are learned. Algorithm 1 is described in Table 7.
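The second difference, locking the literature vectors and learning only the student vector, can be sketched as a plain gradient-descent step on the negative-sampling loss. This is an illustrative assumption about how the update could be written, not the procedure in Table 7; the learning rate, starting values, toy vectors, and function names are all hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def edge_loss(student_vec, pos_doc_vec, neg_doc_vecs):
    """Negative-sampling loss for one aversion edge and a fixed set of negatives."""
    loss = -math.log(sigmoid(dot(pos_doc_vec, student_vec)))
    for neg in neg_doc_vecs:
        loss -= math.log(sigmoid(-dot(neg, student_vec)))
    return loss


def aversion_step(student_vec, pos_doc_vec, neg_doc_vecs, lr=0.1):
    """One gradient step on the student vector; document vectors are only read."""
    pull = 1.0 - sigmoid(dot(pos_doc_vec, student_vec))
    grad = [-pull * q for q in pos_doc_vec]
    for neg in neg_doc_vecs:
        push = sigmoid(dot(neg, student_vec))
        grad = [g + push * q for g, q in zip(grad, neg)]
    return [p - lr * g for p, g in zip(student_vec, grad)]


# Document vectors as output by the favorite network (hypothetical values),
# stored as tuples so they cannot be modified during aversion learning.
frozen_docs = {"d1": (0.5, -0.2), "d2": (-0.3, 0.4), "d3": (0.2, 0.2)}
negatives = [frozen_docs["d2"], frozen_docs["d3"]]

student = [0.01, -0.01]  # freshly started student vector
before = edge_loss(student, frozen_docs["d1"], negatives)
for _ in range(50):
    student = aversion_step(student, frozen_docs["d1"], negatives)
after = edge_loss(student, frozen_docs["d1"], negatives)
print(before, after)
```

Because only `student` is reassigned, the document embeddings stay identical to those learned from the favorite network, so the liked and disliked features of a student are expressed relative to the same document space.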
Students with similar characteristics will give similar ratings to the same English writing literature. Inspired by the DeepWalk algorithm, students who gave the same rating to the same English writing literature are randomly selected. Each sequence of randomly selected nodes is treated as a sentence in natural language processing, and the probability that a particular student occurs in a sequence is the basis of this part of the algorithm. For a particular English writing literature node, students are randomly selected from among those who gave it the same rating, forming a sampling sequence $C = (s_1, s_2, \ldots, s_m)$. It contains $m$ student nodes, with $m \le \max$, where max is the maximum sequence length. In this paper, the maximum length of the randomly selected student sequences is set to max = 100. In practice, the number of student nodes available for sampling may be too small. To avoid the repeated training that would result, a minimum value min is set in the sampling process, and sampling is skipped when fewer than min students gave the same rating to an English writing literature. In this paper, min = 10, and the groups of similar student sequences together constitute a "corpus" for extracting network information.
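The sampling procedure can be sketched as follows. The sketch assumes ratings are (student, document, rating) triples and draws one sequence per (document, rating) group; whether the original algorithm draws several sequences per group is not stated, so that choice and the function name `sample_student_sequences` are assumptions.

```python
import random
from collections import defaultdict


def sample_student_sequences(ratings, min_len=10, max_len=100, rng=random):
    """Build a corpus of student sequences, one per (document, rating) group.

    Groups with fewer than min_len students are skipped; larger groups are
    sampled without replacement up to max_len students.
    """
    groups = defaultdict(list)
    for student, document, rating in ratings:
        groups[(document, rating)].append(student)

    corpus = []
    for _, students in sorted(groups.items()):
        if len(students) < min_len:
            continue  # too few students: skip to avoid repeated training
        corpus.append(rng.sample(students, min(len(students), max_len)))
    return corpus


# Hypothetical ratings: 12 students give d1 a 4, and 3 students give d2 a 2.
ratings = [(f"s{i}", "d1", 4) for i in range(12)] + [(f"s{i}", "d2", 2) for i in range(3)]

corpus = sample_student_sequences(ratings, min_len=10, max_len=100, rng=random.Random(0))
short_corpus = sample_student_sequences(ratings, min_len=10, max_len=5, rng=random.Random(0))
print(len(corpus), len(corpus[0]), len(short_corpus[0]))
```

The group for d2 is dropped because it has only three students, and lowering `max_len` truncates the sampled sequence, mirroring the roles of min and max in the text.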
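The extracted sequences are fed into the CBOW model from natural language processing. For a given English writing document, let the extracted sequence be $(s_1, s_2, \ldots, s_m)$. By training on the corpus, this paper aims to maximize the probability
$$U = P\left(s_i \mid s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_m\right).$$
Each of these nodes can be represented by a low-dimensional vector through a mapping $\Phi$, so that maximizing the probability $U$ can be converted to maximizing
$$P\left(s_i \mid \Phi(s_1), \ldots, \Phi(s_{i-1}), \Phi(s_{i+1}), \ldots, \Phi(s_m)\right).$$
In the model of network representation learning, the final optimization objective function is transformed into
$$\min_{\Phi} \; -\log P\left(s_i \mid \Phi(s_1), \ldots, \Phi(s_{i-1}), \Phi(s_{i+1}), \ldots, \Phi(s_m)\right).$$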
To reduce the computational effort, during actual training only the subsequence lying within a window before and after the target node is used as the input of the CBOW model. In this paper, the window size is 40. The student feature vector Φ3 is generated by learning from similar students; the algorithm is described in Table 8.
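The windowing step can be illustrated by generating the (context, target) pairs that a CBOW model would be trained on. The sketch assumes the window size counts positions on each side of the target, which is one reading of "before and after"; the function name `cbow_pairs` and the sample sequence are hypothetical.

```python
def cbow_pairs(sequence, window=40):
    """Return (context, target) pairs, with up to `window` neighbours on each side."""
    pairs = []
    for i, target in enumerate(sequence):
        left = sequence[max(0, i - window):i]
        right = sequence[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


# Hypothetical sequence of similar students, with a small window for readability.
sequence = ["s1", "s2", "s3", "s4", "s5"]
pairs = cbow_pairs(sequence, window=1)
print(pairs[0])
print(pairs[2])
```

With a window of 40 and sequences of at most 100 students, each target is predicted from at most 80 neighbours instead of the whole sequence.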
For each student node $p$, three vectors Φ1, Φ2, and Φ3 are generated, which represent the student's preference features, dislike features, and similar-student features, respectively. The final low-dimensional vector of the student node is formed by scaling each vector with its weight and stitching the results together:
$$\Phi(p) = \alpha_1 \Phi_1(p) \oplus \alpha_2 \Phi_2(p) \oplus \alpha_3 \Phi_3(p),$$
where $\oplus$ denotes concatenation.
Based on experimental experience, α1, α2, and α3 were set to 0.5, 0.3, and 0.2, respectively. The final student vector Φ(p) is generated, and the cosine similarity is used as the inter-student similarity index Sim. The three most similar students are selected to form the similar student set S(p) of the target student.
All English writing documents associated with the student set S(p) are then recommended to the target student, completing the recommendation task.
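The final stage can be sketched end to end. The sketch assumes that the "linear combination and stitching" means scaling each of the three vectors by its weight and concatenating them, and that documents the target student has already rated are left out of the recommendation; neither detail is confirmed by the text. All vectors, document sets, and function names are hypothetical.

```python
import math

WEIGHTS = (0.5, 0.3, 0.2)  # alpha1, alpha2, alpha3 as given in the text


def combine(phi1, phi2, phi3, weights=WEIGHTS):
    """Scale each feature vector by its weight and concatenate the results."""
    combined = []
    for weight, vec in zip(weights, (phi1, phi2, phi3)):
        combined.extend(weight * value for value in vec)
    return combined


def cosine(a, b):
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def recommend(target, vectors, docs_by_student, k=3):
    """Return the k most similar students and the documents they bring in."""
    others = [s for s in vectors if s != target]
    similar = sorted(others, key=lambda s: cosine(vectors[target], vectors[s]), reverse=True)[:k]
    recommended = set()
    for student in similar:
        recommended |= docs_by_student.get(student, set())
    # Assumption: skip documents the target student has already rated.
    return similar, recommended - docs_by_student.get(target, set())


# Hypothetical (phi1, phi2, phi3) triples for five students.
features = {
    "a": ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),
    "b": ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]),
    "c": ([0.9, 0.1], [0.0, 1.0], [1.0, 1.0]),
    "d": ([-1.0, 0.0], [0.0, -1.0], [-1.0, -1.0]),
    "e": ([0.0, 1.0], [1.0, 0.0], [1.0, 1.0]),
}
vectors = {student: combine(*phis) for student, phis in features.items()}
docs_by_student = {"a": {"d1"}, "b": {"d1", "d2"}, "c": {"d3"}, "d": {"d9"}, "e": {"d4"}}

similar, recommended = recommend("a", vectors, docs_by_student, k=3)
print(similar, sorted(recommended))
```

Student d, whose features point in the opposite direction, is ranked last and contributes nothing, while the documents of b, c, and e form the recommendation set for a.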
3. Result Analysis and Discussion
3.1. Suggestions for Improving the Teaching of English Writing in College
Some suggestions are given here to address the causes of the errors in college students' English writing.
(1) Teachers should have the right attitude toward error correction and writing guidance. To improve students' English writing, teachers must thoroughly review students' essays, give guidance, and correct errors accurately and appropriately. Some errors are slips caused by carelessness, exam-like stress, or the student's psychological state at the time. In such cases, the student can correct the error independently, or the teacher can correct it by prompting the student. Errors arising from students' incomplete understanding of the rules of English, however, are where teachers must focus on helping students master the rules of the target language and correct their writing errors.
(2) Teachers can enrich the ways they correct essays. Essay correction should follow certain principles, and a combination of correction modes is more effective in reducing students' writing errors. First, common correction symbols can be agreed upon, so that students understand the correction symbols and annotation methods in common use. A prerequisite for correcting essays is the standardized use of marks for deletions, additions, adjustments, changes, and other revisions. Second, peer assessment, self-correction, group correction, revision, and teacher correction can all be used as ways of making corrections. Giving students some responsibility for corrections speeds up the process and mobilizes their own initiative.
(3) Teachers should improve the methods and strategies of teaching English writing and set writing tasks reasonably. The results-based approach, which follows a "teacher's model explanation, student's imitation, teacher's evaluation" pattern, has had a profound impact on the teaching practice of English writing courses in China. Teachers should encourage students to use specific vocabulary and avoid general, broad vocabulary, so as to improve the accuracy of their English writing and phrasing and make better use of basic English knowledge.
3.2. Experiments on English Writing Recommendation Algorithm
The experiment used an English writing article bank, provided by a university, containing 100,000 ratings given by 943 students to 1,682 English writing documents. The dataset was randomly split into an 80% training set and a 20% test set. MAE and root mean square error (RMSE) were used as the evaluation measures.
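The evaluation protocol can be sketched with the standard definitions of MAE and RMSE and a random 80/20 split. The ratings and predictions below are hypothetical, and the function names and the fixed seed are illustrative choices rather than details from the experiment.

```python
import math
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Shuffle the records and split them into training and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - round(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical rating records and predictions.
records = [("s%d" % i, "d%d" % (i % 3), 1 + i % 5) for i in range(10)]
train, test = train_test_split(records)

actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 3]
print(len(train), len(test), mae(actual, predicted), rmse(actual, predicted))
```

RMSE penalizes large individual errors more heavily than MAE, so reporting both gives a fuller picture of rating-prediction quality.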
Experiment 1. Comparison of this paper’s algorithm with other algorithms’ MAE.
The proposed algorithm is compared with the four algorithms in [21–24]. The recommendation quality of the five algorithms is compared in terms of MAE, as shown in Figure 5.
The algorithm of [21] ignores the influence of student attribute characteristics on student trust, and the algorithm of [22] does not introduce the time factor into the similarity calculation, which leads to poor recommendation quality. The algorithm in this paper incorporates the improved algorithm of [23] and the algorithm of [24], which significantly improves the recommendation quality and also addresses the cold-start problem for both students and items, so its recommendation accuracy is the highest.

Experiment 2. Comparison of the time performance of the proposed algorithm with other algorithms.
To verify the time performance under different settings, K is varied from 50 to 250 in steps of 50 (see Table 9).
As shown in Table 9, the algorithms of [22] and [24] have the longest running times, because both are improvements on the traditional cosine similarity algorithm. The algorithm of [24] is less efficient than that of [22] because it takes into account the changes in students' interest levels over time. The efficiency of the algorithm in this paper is close to that of [21] and [23], with [21] being the fastest on the dataset selected in this paper. However, because the fusion algorithm incorporates the algorithm of [24], its running time is longer than those of [23] and [21]. Overall, the proposed algorithm can meet the basic needs of students.
4. Conclusion
Since China's reform and opening up, its communication with foreign countries has become more and more frequent. As the most widely used language in the world, English plays a crucial role in China's communication with other countries, so English teaching and learning are very important. Writing ability is one of the most important skills and one of the most difficult to develop. The main way to improve college students' English writing ability is to read a large amount of excellent English writing literature. However, because teachers differ in level, reading volume, and experience, their reference recommendations cannot meet the writing needs of all students. Meeting the needs of students with different English levels, providing accurate and personalized references for each college student from a mass of reference literature, and making real-time recommendations through an online recommendation system are therefore top priorities for universities. To this end, this paper proposes a hybrid recommendation algorithm based on multidimensional feature representation learning (MFL). The algorithm splits the English writing literature scoring network and, based on the improved LINE algorithm, learns in successive stages the English writing literature that college students like and dislike. Based on the improved DeepWalk algorithm, similar student sequences are obtained and similar student features are captured. The preference features, dislike features, and similar student features are linearly combined and connected as the final feature vector of each student, and cosine similarity is used as the similarity index to recommend English writing literature. Experimental results show that the algorithm not only takes into account the multidimensional nature of students and English writing literature, but also improves the efficiency and effectiveness of recommendation.
Recommendation systems are now integrated into all aspects of life, work, and study. The next research goal is to apply the algorithm to other disciplines and to generate more comprehensive recommendations at the intersection of multiple disciplines.
Data Availability
The labeled dataset used to support the findings of this study can be obtained from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This study was supported by Zhengzhou Business University.