Abstract
To improve the effect of online interactive English teaching, this paper analyzes online interactive English teaching data with a generative adversarial network (GAN) and simulates the human interaction process. Moreover, this paper introduces the policy gradient of reinforcement learning into the generator of the GAN to solve the problem that a GAN is difficult to use for dialogue generation. In addition, this paper combines intelligent algorithms to construct an online interactive teaching system for English. The experimental results show that the interactive English online teaching system based on the B/S model proposed in this paper has an advantage over the traditional online English teaching method.
1. Introduction
The fundamental characteristic of the network is openness: it is built on freedom and openness and breaks the limitations of time, space, and speed. Any computer can be connected to the Internet as long as it supports the TCP/IP protocol, and users can quickly and conveniently obtain information from various websites or information channels with the help of network applications such as tag aggregation and search engines [1]. Information sharing technology eliminates many intermediate links in information dissemination, realizes “space without barriers” and “information without barriers,” enables the exchange of needs, balances surpluses and deficiencies, and optimizes resource allocation. The Internet is a comprehensive information integration system. English teaching information resources exist in the form of hypertext, and their hypermedia interface can be linked through the Internet to a number of subject websites related to English courses [2]. Therefore, English classes can make full use of the latest English teaching syllabi and ideas on the Internet, English teaching materials, many kinds of English teaching software, online courses, abundant course reference documents, course development tools and image materials, and the teaching experience of front-line English teachers. Moreover, they can draw on overviews of schools around the world, educational policies and measures, research projects, online journals, printed materials, and dynamic information such as daily news, reports, and meeting notices to expand knowledge. In addition, they can use pictures, music, animation, and video to enrich the information content and to comment on and supplement the original knowledge.
This continuously enriches the content and presentation of English teaching resources, gradually optimizes and improves the structure of the resource system, and gradually forms secondary, tertiary, and quaternary resources, realizing the dynamic development of network information resources. Moreover, it enables co-construction and sharing within a region and sharing and exchange between regions, which promotes interaction within the English teaching system to the greatest extent and minimizes duplicated construction of English teaching resources. Finally, it maximizes the utilization of limited English teaching information resources while the total input of educational resources remains unchanged [3].
The rapid popularization of interactive communication tools such as mobile communication tools, videoconference systems, and instant messaging software has made diverse learning methods and ubiquitous learning a reality. With a videoconferencing English teaching system, synchronous video interactive teaching can be carried out from one terminal to one terminal, from one terminal to multiple terminals, and from multiple terminals to multiple terminals. Students can ask the teacher questions through the message board, and the teacher answers each student's questions individually. In short, whether the interaction is one-to-one, one-to-many, or many-to-many, and whether it is synchronous, asynchronous, or mixed, various forms of network-interactive media support it and provide a variety of means for English teaching.
This article combines intelligent algorithms to construct an interactive English online teaching system. The constructed system can not only realize the interaction between teachers and students but also realize the interaction between students and intelligent systems and improve the effect of English online teaching.
2. Related Work
Scholars have relatively clear definitions of the concepts of network, interaction, and teaching mode [3]. The characteristics of the network imply interaction. “Interaction” refers to the phenomenon that the difference in response between the levels of one factor changes with the levels of other factors, and teaching interaction is the exchange of teaching information between teachers and learners as well as among learners. Literature [4] believes that the interactive teaching mode should highlight the student-centered and teacher-led teaching concept: both teachers and students should give full play to their initiative, create a variety of interactive situations, promote the active participation and investment of both sides, realize mutual promotion among teaching subjects, and jointly do a good job in teaching and learning. Reference [5] regards the teaching mode as a comprehensive model and a dynamic process. A teaching mode makes teaching theory concrete: under the guidance of certain teaching ideas or concepts, a relatively stable program and structure of teaching activities is constructed in a specific environment to solve the problems of college students’ English education. It is a comprehensive theoretical model and practical method covering educational goals, content, methods, means, and mechanisms, and a dynamic process composed of connected teaching activity units. Some scholars also regard the teaching mode as a plan or paradigm. Literature [6] believes that the teaching mode is a teacher-led plan for teaching activities, which includes not only the activity process but also the selection of teaching materials and the setting of homework, highlighting a teacher-led, fixed teaching mode.
Literature [7] believes that a model refers to a relatively stable teaching interaction framework system established under the guidance of a certain teaching theory and is a methodology for carrying out teaching activities. Although scholars have different understandings of teaching mode, they all believe that it is an intermediary for applying teaching theory to actual teaching activities or practice, with practical and operable functions, and is used by teachers and students. The Internet has opened up new channels and new positions for English teaching. The openness of the network enriches the teaching content, the interactivity of the network improves the teaching method, and the immediacy of the network enhances the teaching effect. Literature [8] mentioned that the network teaching mode has incomparable advantages over the traditional single teaching mode. The huge treasure trove of network resources can enrich teaching resources, network courseware can optimize teaching content, network interactivity can improve teaching methods, and network teaching can enhance the effectiveness of English teaching. Literature [9] believes that the interactivity and openness of the network expand the coverage of education for English teaching and enhance the teaching effect: the huge information storage of the network provides splendid materials for English teaching and greatly enriches the teaching activities. The powerful communication advantages of the network provide an open communication method for teachers and students to interact with English class teaching activities and become an important platform for college students to acquire knowledge and exchange ideas. 
The virtuality of the Internet alienates some college students from real-world contact, which has a negative impact on their communication and interpersonal relationships; the disorder of the Internet challenges college students’ ideological and moral concepts and legal awareness and affects the behavior of some of them. This poses a huge challenge to the teaching objectives of English courses, which are responsible for cultivating qualified builders and reliable successors of the cause of socialism with Chinese characteristics. Reference [10] discusses the challenges that the Internet poses to teaching and holds that the richness of the Internet has enriched the content of English teaching, the interactivity of the Internet has challenged one-way teaching methods, and the equality of the Internet has weakened the authoritative status of teachers. In short, the Internet also poses challenges to the teaching objectives, teaching contents, and teaching methods of the English teaching mode.
With the gradual modernization and humanization of teaching concepts, the teaching effect of English courses in colleges and universities has improved significantly, but problems remain: students lack interest in learning, teaching methods are outdated, teaching skills are insufficient, the traditional teaching mode still dominates, and pedagogical interaction is minimal. Literature [11] mentioned that the English teaching effect is not good, the frequency and effect of teacher-student interactive teaching are unsatisfactory, and teacher-student interaction is uniform and lacks innovation. Reference [12] mentioned that, first, in terms of teaching skills, many English teachers in colleges and universities have not mastered computer technology well; second, in terms of teaching mode, traditional one-way theoretical lectures still dominate, with teachers leading and students remaining passive, and this has not changed; finally, the effect of online teaching of English courses in colleges and universities is still unsatisfactory, and the ideal teaching effect has not been achieved. The reasons are that traditional teaching concepts are deeply rooted and modern teaching models are difficult to accept; that insufficient investment in educational infrastructure and a lack of monitoring mechanisms have made online teaching superficial and left it without comprehensive and in-depth research; and that the reform of English teaching in colleges and universities focuses on form and lacks content, which is an important reason why good teaching effects are difficult to achieve [13].
3. Improved Dialogue Generation Algorithm Based on GAN Network
This paper analyzes online interactive English teaching data with a generative adversarial network (GAN) and simulates the human interaction process.
A generative adversarial network consists of two parts: a generator and a discriminator. The two parts are trained against each other in a two-player game, which is an unsupervised learning method. In the dialogue generation model, the generator captures the internal distribution of the training samples and maps questions to answers, and the discriminator estimates the probability that a sentence is a human answer. Specifically, the generator tries to produce sentences that resemble human answers in order to fool the discriminator, while the discriminator tries to tell whether an answer comes from a human or from the generator.
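We assume that the data set is $x$, the data distribution to be learned by the generator is $p_g$, and the question $z$ sent to the generator obeys a certain distribution $p_z(z)$. The generator model and the discriminator model can be expressed as $G_\theta(z)$ and $D_\phi(x)$, respectively. Among them, $\theta$ and $\phi$ represent the network parameters of the generator and discriminator, respectively. The purpose of the GAN generator is to make the discriminator misjudge, which can be expressed as [14]
$$\min_{G} \; \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D_\phi(G_\theta(z))\right)\right]. \tag{1}$$
At the same time, the discriminator must correctly identify the source of the data, so the entire model is
$$\min_{G}\max_{D} V(D, G) = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\log D_\phi(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D_\phi(G_\theta(z))\right)\right]. \tag{2}$$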
The framework of the model is shown in Figure 1 [15].
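As a small numerical illustration of the adversarial objective, the sketch below estimates the standard GAN value function V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))] from discriminator scores. The function name `gan_value` and the sample scores are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))].

    d_real: discriminator outputs D(x) on human answers, each in (0, 1).
    d_fake: discriminator outputs D(G(z)) on generated answers, each in (0, 1).
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical scores: a discriminator that separates the two sources well,
# and one that has been fooled and outputs 0.5 for everything.
sharp = gan_value(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled = gan_value(d_real=[0.5, 0.5], d_fake=[0.5, 0.5])
```

The discriminator tries to push this value up, while the generator tries to push it down; a fully fooled discriminator yields 2 log 0.5, which is lower than the value reached by the sharper discriminator.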

Because words are discrete, the dialogue generation task cannot use formula (2) directly to train the model. To overcome this problem, this section improves the generative adversarial network by using the policy gradient of reinforcement learning to compute the network error, so that the network can be used for dialogue generation.
Reinforcement learning based on the value function continuously iterates the process of policy evaluation and policy update: it estimates the state-value function or the state-action value function and then finds the optimal policy. However, this table-based reinforcement learning has certain limitations. For example, when the state space and action space are too large, the number of values to be stored increases exponentially, making the computation infeasible. In addition, table-based reinforcement learning cannot handle situations where states and actions are continuous.
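Unlike reinforcement learning based on the value function, reinforcement learning based on the policy function constructs a policy network and learns the policy directly, instead of obtaining the optimal policy through an estimated value function. That is, the next action is obtained directly by feeding the agent’s state into the network. This section uses the Seq2Seq model as the policy network; the state is a question, and the action is the answer to the question. If we assume that the network parameters of the Seq2Seq model are $\theta$, the policy is $\pi$, the state is $s$, and the action is $a$, then the policy network can be expressed as [16]
$$\pi_\theta(a \mid s) = P(a \mid s; \theta). \tag{3}$$
Since the parameters of the policy function are already included in the network, the network parameters can be updated iteratively according to the gradient:
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta). \tag{4}$$
Among them, $J(\theta)$ represents the optimization objective, $\nabla_\theta J(\theta)$ represents the gradient, and $\alpha$ is the step size.
For the policy function, if it is assumed that the probability of a trajectory $\tau$ is $P(\tau; \theta)$ and its return is $R(\tau)$, then the objective function is expressed as [17]
$$J(\theta) = \mathbb{E}_{\tau \sim P(\tau;\theta)}\left[R(\tau)\right] = \sum_{\tau} P(\tau;\theta)\, R(\tau). \tag{5}$$
The derivative of the objective function is
$$\nabla_\theta J(\theta) = \sum_{\tau} \nabla_\theta P(\tau;\theta)\, R(\tau) = \sum_{\tau} P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau). \tag{6}$$
$\nabla_\theta \log P(\tau;\theta)$ can continue to decompose. Since the initial state distribution and the state transition probabilities do not depend on $\theta$,
$$\nabla_\theta \log P(\tau;\theta) = \nabla_\theta \left[\log p(s_0) + \sum_{t} \log \pi_\theta(a_t \mid s_t) + \sum_{t} \log p(s_{t+1} \mid s_t, a_t)\right] = \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t). \tag{7}$$
Based on the above formula, the final objective function gradient is
$$\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim P(\tau;\theta)}\left[\sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right]. \tag{8}$$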
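To make the policy gradient concrete, the following minimal sketch computes it exactly for a one-step episode with a softmax policy over three candidate actions, using the likelihood-ratio form E[∇ log π(a) R(a)]. The logits, the reward vector, and the function names are hypothetical; a real dialogue model would replace the logits with the output of the Seq2Seq network.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(logits, action):
    """Gradient of log pi(action) w.r.t. the logits: one_hot(action) - pi."""
    probs = softmax(logits)
    return [(1.0 if i == action else 0.0) - p for i, p in enumerate(probs)]

def exact_policy_gradient(logits, rewards):
    """E_a[grad log pi(a) * R(a)], summed exactly over all actions."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, (p, r) in enumerate(zip(probs, rewards)):
        g = grad_log_prob(logits, a)
        for i in range(len(grad)):
            grad[i] += p * r * g[i]
    return grad

# Hypothetical setup: uniform policy, only action 2 is rewarded.
logits = [0.0, 0.0, 0.0]
rewards = [0.0, 0.0, 1.0]
grad = exact_policy_gradient(logits, rewards)
```

With a uniform policy and a reward only on the third action, the gradient raises the logit of that action and lowers the other two, which is the behavior the derivation predicts.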
Compared with reinforcement learning based on value functions, reinforcement learning based on policy networks can handle not only discrete state and action spaces but also high-dimensional or continuous ones, which makes it suitable as a dialogue generation model.
The reinforcement learning model is composed of the five-tuple (s, a, r, π, p), corresponding to state, action, reward, policy, and state transition probability. For the problem in this section, these five elements are defined as follows [18]:
(1) The state s is the question x of the dialogue.
(2) The action a is to generate the answer y given the dialogue x.
(3) The reward r is the probability, given by the discriminator, that the answer y was produced by a human.
(4) The policy π specifies the action to take given the dialogue history x.
(5) The state transition probability p is the probability of generating a reply y given x and a.
This article uses a Seq2Seq model with a bidirectional GRU as the text generation model. In this model, the dialogue question x is fed into the encoder of the Seq2Seq model, and the sentence is generated word by word by the softmax function at the decoder. For this problem, the Seq2Seq model is the policy network: the encoder takes the question x as input, and the decoder outputs the action a. Since the vocabulary of the text is discrete, the generator parameters cannot be updated directly with the ordinary GAN training method, so this article uses the policy gradient method to update them.
The role of the discriminator is to identify whether a sentence was generated by the machine or written by a human, so it is a binary classifier. For the problem in this section, logistic regression is used as the binary classifier. In this research, the skip-thought vectors method is used to generate sentence vectors: the dialogue is encoded into a vector representation, which is then used as the input of the binary classifier.
The skip-thought vectors model applies an encoder-decoder framework to continuous text and borrows the skip-gram idea of word2vec to encode a sentence into a distributed representation based on its context. Since similar sentences have similar semantics and grammar, their encoded sentence vectors are also similar. The model consists of three parts: one encoder and two decoders. The encoder is an RNN that uses GRU units instead of LSTM units. If it is assumed that the current sentence is $s_i$ and its contexts are $s_{i-1}$ and $s_{i+1}$, then the text can be represented by triples $(s_{i-1}, s_i, s_{i+1})$. The purpose of the skip-thought vectors model is to use these triples to train a general encoder.
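We assume that the last hidden state of the encoder for the sentence $s_i$ is $h_i$, and the $t$-th word of the sentence $s_{i-1}$ is $w_{i-1}^{t}$. According to the encoder-decoder framework, the decoder uses maximum likelihood estimation to maximize the probability of generating the preceding sentence $s_{i-1}$. The optimization objective can be expressed as [19]
$$\sum_{t} \log P\left(w_{i-1}^{t} \mid w_{i-1}^{<t}, h_i\right). \tag{9}$$
Similarly, for the following sentence $s_{i+1}$, there are
$$\sum_{t} \log P\left(w_{i+1}^{t} \mid w_{i+1}^{<t}, h_i\right). \tag{10}$$
Combining the above two formulas, the optimization goal of the entire model is
$$\sum_{t} \log P\left(w_{i-1}^{t} \mid w_{i-1}^{<t}, h_i\right) + \sum_{t} \log P\left(w_{i+1}^{t} \mid w_{i+1}^{<t}, h_i\right). \tag{11}$$
The skip-thought model is shown in Figure 2.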

After model training is finished, the output of the last hidden layer of the encoder is the sentence vector representation. The sentence vector is then used as the input of the binary classifier.
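A logistic-regression discriminator over sentence vectors can be sketched as follows. This is a minimal illustration under stated assumptions: the two-dimensional "sentence vector", the learning rate, and the function names are hypothetical, and a real system would use the skip-thought encoder output as the input vector.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def discriminator_prob(weights, bias, sentence_vec):
    """Probability that the encoded sentence was written by a human."""
    score = sum(w * v for w, v in zip(weights, sentence_vec)) + bias
    return sigmoid(score)

def sgd_step(weights, bias, sentence_vec, label, lr=0.1):
    """One gradient step on binary cross-entropy (label 1 = human, 0 = machine)."""
    err = discriminator_prob(weights, bias, sentence_vec) - label
    new_weights = [w - lr * err * v for w, v in zip(weights, sentence_vec)]
    return new_weights, bias - lr * err

# Hypothetical 2-dimensional sentence vector labeled as a human answer.
vec = [1.0, -1.0]
w, b = [0.0, 0.0], 0.0
p_before = discriminator_prob(w, b, vec)
w, b = sgd_step(w, b, vec, label=1)
p_after = discriminator_prob(w, b, vec)
```

After one update on a human-labeled example, the predicted probability for that example moves from 0.5 toward 1, which is the signal later returned to the generator as a reward.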
Since image pixel values are continuous and differentiable, images can be used directly to train a generative adversarial network. However, a dialogue is composed of discrete words and cannot be used for training in the same way. To solve this problem, the policy gradient algorithm is introduced to update the network parameters. According to the reinforcement learning settings in this section, if the return of generating a complete sentence of length $T$ is $R_T$ and the initial state (the question) is $s_0$, then the objective function is
$$J(\theta) = \mathbb{E}\left[R_T \mid s_0, \theta\right] = \sum_{y_1} G_\theta(y_1 \mid s_0)\, Q_{D_\phi}^{G_\theta}(s_0, y_1). \tag{12}$$
Among them, $\theta$ is the generator network parameter, and in this research, it is the Seq2Seq network parameter. $G_\theta(y_1 \mid s_0)$ represents the probability that the generator generates the word $y_1$ under the condition $s_0$, which is the policy in the reinforcement learning model. $Q_{D_\phi}^{G_\theta}(s_0, y_1)$ represents the action-value function; that is, the value of generating the word $y_1$ in the state $s_0$.
The premise of using formula (12) to calculate the objective function is to calculate the value function $Q_{D_\phi}^{G_\theta}$. If the generator has generated a complete sentence $Y_{1:T} = (y_1, \ldots, y_T)$, the function can be expressed as [20]
$$Q_{D_\phi}^{G_\theta}\left(s = Y_{1:T-1},\, a = y_T\right) = D_\phi(Y_{1:T}). \tag{13}$$
According to formula (12), the value function depends on each action $y_t$, but the discriminator can only judge whether a sentence was generated by the machine or written by a human after the sentence is complete. Therefore, formula (13) cannot be used directly to calculate the action-value function of an incomplete sentence.
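In order to calculate each state-action value function $Q_{D_\phi}^{G_\theta}(Y_{1:t-1}, y_t)$, this research uses an additional rollout policy $G_\beta$ to search for possible words through the Monte Carlo search algorithm. This article uses the softmax method to search the vocabulary space, and the formula is as follows:
$$P\left(y_t = w_k \mid Y_{1:t-1}\right) = \frac{\exp(o_{t,k})}{\sum_{j} \exp(o_{t,j})}, \tag{14}$$
where $o_{t,k}$ denotes the decoder score of the $k$-th vocabulary word $w_k$ at step $t$.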
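If it is assumed that the complete sentence has $T$ words and the generator has generated $t$ words, the complete sentences obtained by Monte Carlo search are
$$\left\{Y_{1:T}^{1}, \ldots, Y_{1:T}^{N}\right\} = \mathrm{MC}^{G_\beta}\left(Y_{1:t}; N\right). \tag{15}$$
Among them, $N$ represents the number of Monte Carlo searches. For this research, the value of $N$ is 5. In summary, the value function can be expressed as
$$Q_{D_\phi}^{G_\theta}\left(s = Y_{1:t-1},\, a = y_t\right) = \begin{cases} \dfrac{1}{N} \sum_{n=1}^{N} D_\phi\left(Y_{1:T}^{n}\right), & t < T, \\[2ex] D_\phi\left(Y_{1:t}\right), & t = T. \end{cases} \tag{16}$$
Among them, $Y_{1:T}^{n} \in \mathrm{MC}^{G_\beta}\left(Y_{1:t}; N\right)$.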
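The Monte Carlo estimate of the action value described above can be sketched as follows: a complete sentence is scored directly by the discriminator, and an incomplete one is completed several times by a rollout policy and the scores are averaged. The two-word vocabulary, the uniform rollout policy, and the toy discriminator are hypothetical stand-ins for the trained models; only the number of rollouts (5) comes from the text.

```python
import random

def rollout_value(prefix, sentence_len, rollout_policy, discriminator,
                  n_rollouts=5, rng=None):
    """Estimate the action value of the last word in `prefix`.

    A complete sentence is scored directly; an incomplete one is completed
    `n_rollouts` times and the discriminator scores are averaged.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    if len(prefix) >= sentence_len:
        return discriminator(prefix)
    total = 0.0
    for _ in range(n_rollouts):
        sentence = list(prefix)
        while len(sentence) < sentence_len:
            sentence.append(rollout_policy(sentence, rng))
        total += discriminator(sentence)
    return total / n_rollouts

# Toy stand-ins (assumptions, not the trained models).
VOCAB = ["good", "bad"]

def uniform_policy(sentence, rng):
    """Rollout policy that picks the next word uniformly at random."""
    return rng.choice(VOCAB)

def toy_discriminator(sentence):
    """Scores a sentence by the share of the word 'good' it contains."""
    return sentence.count("good") / len(sentence)

q_partial = rollout_value(["good"], 3, uniform_policy, toy_discriminator)
q_complete = rollout_value(["good", "good", "good"], 3,
                           uniform_policy, toy_discriminator)
```

Averaging over several completions gives every generated word an intermediate reward even though the discriminator can only score finished sentences.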
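In the adversarial network, every time the discriminator makes a judgment, the error must be passed to the generator to update the generator’s network parameters. According to formula (12) and the policy gradient update algorithm, there are
$$\nabla_\theta J(\theta) = \sum_{t=1}^{T} \mathbb{E}_{Y_{1:t-1} \sim G_\theta}\left[\sum_{y_t} \nabla_\theta G_\theta\left(y_t \mid Y_{1:t-1}\right) Q_{D_\phi}^{G_\theta}\left(Y_{1:t-1}, y_t\right)\right]. \tag{17}$$
According to formula (17), after performing Monte Carlo search, we can get
$$\nabla_\theta J(\theta) \simeq \sum_{t=1}^{T} \mathbb{E}_{y_t \sim G_\theta\left(y_t \mid Y_{1:t-1}\right)}\left[\nabla_\theta \log G_\theta\left(y_t \mid Y_{1:t-1}\right) Q_{D_\phi}^{G_\theta}\left(Y_{1:t-1}, y_t\right)\right]. \tag{18}$$
The role of the discriminator is to identify as accurately as possible whether the answer was produced by a person, so in this research, the return $R_T$ can be defined as
$$R_T = D_\phi\left(Y_{1:T}\right). \tag{19}$$
Among them, $D_\phi(Y_{1:T})$ represents the probability that the discriminator judges that the dialogue is human-generated.
Combining formulas (16), (18), and (19), the final target gradient is obtained as
$$\nabla_\theta J(\theta) \approx \sum_{t=1}^{T} \nabla_\theta \log G_\theta\left(y_t \mid Y_{1:t-1}\right) Q_{D_\phi}^{G_\theta}\left(Y_{1:t-1}, y_t\right), \tag{20}$$
where the action value is estimated from the discriminator outputs by formula (16).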
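In order to maximize the return, the network parameters of the generator model are updated with the gradient ascent algorithm, and the update method is
$$\theta \leftarrow \theta + \alpha \nabla_\theta J(\theta), \tag{21}$$
where $\alpha$ is the learning rate.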
Since a generative adversarial network is not easy to train, if formula (20) is used directly to update the network parameters of the generator, the perplexity does not converge easily after the model has been trained for some time. This means that, as training proceeds, the generator has more and more candidate words to choose from, and its performance degrades. The reason is that the generator updates its parameters directly according to the reward returned by the discriminator. This direct method brings a series of problems: once the generator starts to perform poorly, the discriminator easily becomes very accurate, so the generator loses its update direction; eventually the discriminator and the generator cannot converge together, and the model cannot be trained.
To alleviate this problem, this research introduces supervised teacher guidance. Specifically, human responses are fed to the generator to update its network parameters. The most intuitive way is to set the reward returned by the discriminator for these responses to 1.
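The effect of teacher guidance can be illustrated with a toy softmax policy over three candidate replies. An adversarial step uses a low discriminator reward for a sampled reply, and a teacher-guided step then uses the human reply with the reward fixed to 1. The logits, the step size, the reward of 0.1, and the index of the human reply are hypothetical values chosen for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def policy_gradient_step(logits, action, reward, lr=0.5):
    """Gradient ascent on reward * log pi(action) for a softmax policy."""
    probs = softmax(logits)
    return [
        l + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (l, p) in enumerate(zip(logits, probs))
    ]

HUMAN_REPLY = 2  # index of the human-written reply (assumption)

logits = [0.0, 0.0, 0.0]
# Adversarial step: the sampled reply 0 receives a low discriminator reward.
logits_adv = policy_gradient_step(logits, action=0, reward=0.1)
# Teacher-guided step: the human reply is used with the reward set to 1.
logits_tf = policy_gradient_step(logits_adv, action=HUMAN_REPLY, reward=1.0)

p_adv = softmax(logits_adv)[HUMAN_REPLY]
p_tf = softmax(logits_tf)[HUMAN_REPLY]
```

The teacher-guided step gives the generator a reliable update direction toward the human reply even when the discriminator rewards for sampled replies are uninformative.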
4. Interactive English Online Teaching System Based on B/S Model
Based on the demand analysis and design goals of the network-interactive English teaching platform, if the traditional B/S system design method is adopted, the amount of data exchanged between the client and the server will be relatively large. This not only puts pressure on the server but also slows the response seen by the user and lowers the utilization of network resources. RIA (rich Internet application) technology can effectively reduce the amount of data transmitted between the client and the server and make full use of the client’s resources, alleviating the load on the server. The presentation layer of the platform is implemented with Flex, which provides a rich set of UI components, model components for defining data, and control components for communicating with the server. Flex also provides data validation, and all of these functions are completed locally on the client without the intervention of the server. The business logic layer of the platform is managed by Spring, whose Inversion of Control (IoC) container creates, configures, and manages Beans through a BeanFactory. Developers do not need to write lengthy code to wire Beans together, nor code that has little to do with business logic, such as database transaction handling and log output. Spring’s AOP (aspect-oriented programming) separates this type of code from the application, reducing coupling and increasing reuse. Hibernate is used as the persistence layer of the platform; it is a lightweight encapsulation of JDBC. With Hibernate, tedious JDBC code is no longer needed: data are added, deleted, modified, and queried in an object-oriented way through the Session interface.
The network-interactive English teaching platform is integrated with Flex, Spring, and Hibernate to realize the hierarchical development of Web applications. Figure 3 shows the overall architecture of the platform.

The functional structure of the network-interactive English teaching platform is shown in Figure 4.

The process of the network-interactive English teaching platform is shown in Figure 5.

A teacher can teach multiple courses, and a course can have multiple teachers. At the same time, a student can choose multiple courses, and a course corresponds to multiple different students. Moreover, a student can ask multiple questions and reply to multiple questions; a teacher can also discuss and solve the same problem multiple times. In addition, teachers can upload multiple multimedia video resources, and students can also download and upload multiple multimedia video resources. Finally, a major has multiple courses and a series of relationships. The specific relationship is shown in Figure 6.
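The many-to-many relationships described above can be sketched with plain data classes. This is only an illustrative model: the class names, fields, and sample records are hypothetical and do not reproduce the platform’s actual database schema, which is implemented with Hibernate.

```python
from dataclasses import dataclass, field

@dataclass
class Course:
    name: str
    teachers: set = field(default_factory=set)   # many-to-many with teachers
    students: set = field(default_factory=set)   # many-to-many with students

@dataclass
class Question:
    asker: str
    text: str
    replies: list = field(default_factory=list)  # (author, reply text) pairs

def courses_taught_by(courses, teacher):
    """Names of all courses a teacher is assigned to."""
    return sorted(c.name for c in courses if teacher in c.teachers)

def courses_taken_by(courses, student):
    """Names of all courses a student has chosen."""
    return sorted(c.name for c in courses if student in c.students)

# Hypothetical sample data.
reading = Course("Reading", teachers={"t1", "t2"}, students={"s1"})
writing = Course("Writing", teachers={"t1"}, students={"s1", "s2"})
catalog = [reading, writing]

q = Question(asker="s1", text="When is 'whom' used?")
q.replies.append(("t1", "After a preposition or as an object."))
q.replies.append(("s2", "Thanks, that helps."))
```

Keeping teachers and students as sets on each course mirrors the join tables that a relational schema would use for these relationships.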

The overall design of the platform mainly includes the framework of the interactive English teaching platform, the framework of the server and client systems, and the division of modules. First, this paper introduces the framework of the entire interactive English teaching platform and then introduces the module architecture of the interactive English teaching platform, including the server, classroom management client, and student client. Finally, this paper designs the various modules of the Android client in detail, and the system hardware deployment diagram is shown in Figure 7.

The client functional structure diagram is shown in Figure 8.

The interactive feature is mainly manifested in the ability of online teaching to provide flexible and content-rich teaching resources tailored to the learners. The online teaching environment requires not only advanced, complete, and easily controlled hardware facilities but also smoothly running supporting software and the integration of the interactive teaching system into classroom teaching. This article takes the Seewo software and hardware system deployed by the school as the core and constructs a smart teaching environment from the three aspects of situation, task, and reality integration, so as to broaden the channels of teaching interaction and enhance its effect. The English online interactive teaching system is shown in Figure 9.

After the English online interactive teaching system is built, its effect is verified. The system performance is analyzed through experimental teaching and verified with teaching statistics. Moreover, the teaching method proposed in this paper is compared with the traditional online teaching method, and the results are shown in Table 1 and Figure 10.

From the above analysis, it can be seen that the interactive English online teaching system based on the B/S model proposed in this paper has greater advantages than traditional online English teaching methods and can effectively improve the effect of students’ online learning.
5. Conclusion
The interactivity of the network refers to users having more choice and autonomy when selecting information, and also to the two-way relationship between information providers and recipients. The essence of English teaching is interactive activity between teachers and students, and English teaching in class is largely an exchange of ideas and feelings between educators and learners in pursuit of shared values. Therefore, the interactive features of the Internet meet the essential needs of English teaching and extend classroom English teaching. This article combines intelligent algorithms to construct an English online interactive teaching system. The constructed system can realize not only the interaction between teachers and students but also the interaction between students and intelligent systems, and it improves the effect of English online teaching. The experimental results show that the interactive English online teaching system based on the B/S model proposed in this paper has advantages over traditional online English teaching methods and can effectively improve students’ online learning.
Data Availability
The labeled data set used to support the findings of this study is available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This study was sponsored by Hankou University.