Abstract
Deep generative technology has achieved considerable success in the speech and image fields, owing to its rapid development and widespread application. The goal of this paper is to use improved image restoration and deep generative algorithms to improve an English intelligent translation teaching system, so that intelligent translation technology can play a greater role in English classroom teaching. To address the problem of data noise, this paper proposes a data augmentation method that can efficiently exploit large-scale monolingual data in semi-supervised scenarios, as well as a data augmentation method that exploits the robustness of statistical machine translation in unsupervised scenarios. On the WMT14 English-French dataset, the CNN-Dueling-DQN-NMT model improves the BLEU score of the CNN machine translation baseline model by 1.92, and the Transformer-Dueling-DQN-NMT model improves the BLEU score of the Transformer machine translation baseline model by 1.63. With a BLEU score of 44.63, the Transformer-Dueling-DQN-NMT model performs best.
1. Introduction
Neural network (NN)-based translation methods model the translation process using continuous vector representations in neural networks. Because NNs have strong fitting ability, the translation model can automatically learn knowledge from a bilingual parallel corpus. It is worth pointing out that NN translation methods also face new problems and challenges that require further in-depth research, for example, the interpretability and robustness of the network structure, repeated and missing translations, and fluency issues. These problems require researchers to continue in-depth research.
Neural machine translation (NMT) utilizes currently emerging deep learning techniques. It extracts the features of text vocabulary by building a deep and complex NN and uses end-to-end NN technology to achieve intelligent conversion from one natural language to another. In terms of theoretical value, machine translation research plays a benchmarking role in natural language processing and can drive the development of other fields. Progress in NMT technology has strongly promoted tasks in other areas of natural language processing, such as sentiment analysis, text classification, and dialogue generation. Unlike other research topics that use deep learning technology, machine translation can be regarded as a subject at the level of human cognition, so in-depth research on it also promotes the combination of cognitive science and AI science.
The innovation of this article is as follows: (1) This paper proposes a scene-universal data augmentation method. It combines the low-frequency word replacement method and the back-translation method and adds a grammar error correction module, achieving effective data augmentation in both resource-rich and low-resource scenarios. (2) On the basis of multi-granularity feature fusion as input, this paper adds dynamic word vector embedding and conducts a comparative experiment with static word vector embedding. (3) It proposes a data augmentation method for unsupervised neural machine translation that utilizes the robustness of statistical machine translation to alleviate the problem of data noise.
2. Related Work
Starting from the influence of cultural context on Chinese-English translation, Zhang discussed the context of Chinese-English translation and the understanding and practice of translation activities from the perspective of cultural translation and practical experience [1]. Teachers play an important role in the educational process. Balla attempted to highlight some of the most important roles that English teachers play in the challenging teaching process. In his opinion, the concept of the ideal teacher does not fit into a single mold because many factors must be considered. English teachers must take ownership of the subject and encourage students to participate voluntarily; they should not only be knowledgeable about the subject but also be able to interpret it [2]. Embedded software development and early testing are greatly aided by virtual platforms. Lora proposed a “middle meat” approach to virtualizing heterogeneous systems [3]. According to Choi’s research, the NMT encoder-decoder network iterates on a word multi-sense encoding network. He disambiguated source and target words according to the context defined in the source sentence, although constructing this context is costly. In addition, special tokens (numbers, proper nouns, acronyms, and so on) should be handled separately to facilitate the translation of words that are not suitable for continuous vector translation [4]. According to Wu’s recent research, grammatical knowledge can significantly improve NMT performance. In his approach, the target translation and its dependency tree are built and modeled jointly, and the tree structure is used as a framework in decoding to make word generation easier. Finally, a grammar encoder is used to extend the dependency sequence, creating a dependency-based NMT model implemented in a Transformer framework [5]. Miura’s triangulation method combines source-intensive and centralized target translation models into a single target-source model and is known for its high translation accuracy [6]. According to Choi’s research, neural machine translation (NMT) has emerged as a new type of machine translation, with the attention mechanism serving as the primary method [7]. Kim proposed a neural network (NN) architecture for statistical detection that combines surface windows and syntactic context in a monolingual syntactic word representation. The method is tested on two language pairs and two tasks: detecting grammatical errors and predicting the entire task after processing. His proposed NN architecture [8–10] is forward-looking, but it still has a lot of room for improvement.
3. Improved Image Restoration Algorithm and Deep Generative Algorithm
3.1. The Teaching Environment of English Intelligent Translation
The intelligent translation teaching environment constructed in this study consists of elements such as students, learning communities, teachers, and educational resources. The smart classroom is an educational system that integrates educational software and hardware, diagnosis, analysis, and other services by means of software, hardware, networks, and other technologies. The intelligent translation classroom includes computers, the HiTeach interactive teaching system, a short-focus projector, the Haboard interactive whiteboard, a physical presenter, the HiTA intelligent teaching assistant, and IRS instant feedback devices. The English intelligent translation teaching environment is shown in Figure 1.

As shown in Figure 1, the HiTeach interactive teaching system can be installed on the teacher’s computer to realize functions such as selection, synchronization, and response. The Haboard interactive whiteboard supports touch-controlled teaching assistance. The physical presenter can display student learning outcomes, such as homework. The HiTA intelligent teaching assistant makes it easy for teachers to take pictures, upload them, and synchronize courses. The IRS instant feedback system includes a remote control for each student, with which students can submit answers to multiple-choice questions that are displayed on the computer and screen, so teachers can receive information from students in a timely manner. To sum up, the intelligent teaching environment constructed in this research provides a variety of technical conditions for English listening and speaking teaching, including a voice environment, differentiated interaction, and timely feedback [11]. The interactive mode of English translation listening and speaking teaching is shown in Figure 2.

The relevant research on English translation education is summarized in Figure 2. In English listening and speaking education, the importance of a multimedia environment is emphasized, as is a standard environment for pronunciation, feedback, reflection, and differentiation. To improve students’ listening and speaking abilities, English translation instruction should focus on the pronunciation environment, feedback, reflection, and differentiated evaluation [12]. Supporting and regulating students’ pronunciation, providing feedback data, providing a basis for reflection and evaluation, and providing technical conditions for optimizing English listening and speaking education are all functions of the intelligent teaching environment.
3.2. NN Machine Translation
In recent years, self-attention networks [13] have attracted much attention due to their flexibility in parallel computing and modeling. Current neural machine translation models use stacked self-attention and fully connected layers for both the encoder and the decoder. The output matrix calculation formula is as follows:
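This corresponds to the standard scaled dot-product attention of the Transformer; assuming the usual notation, with query, key, and value matrices Q, K, and V and key dimension d_k, it can be written as

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V.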
Among them, √d_k refers to the scaling factor of the dot product, where d_k is the dimension of the key vectors. The self-attention mechanism and multi-head attention mechanism (MHAM) are shown in Figure 3.

As shown in Figure 3, the Transformer model has three attention mechanisms. It includes an encoder MHAM, a decoder masked MHAM, and an encoder-decoder MHAM [14]. The MHAM can be calculated by the following formula:
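Assuming the standard multi-head formulation, with learned projection matrices W_i^Q, W_i^K, W_i^V, and W^O (notation assumed here, not given explicitly in the original), the MHAM is

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^{O}, \quad \mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q}, K W_i^{K}, V W_i^{V}).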
From a probabilistic point of view, given an input sentence to be translated X = (x_1, ..., x_m), the goal of the Transformer is to generate the target translation Y = (y_1, ..., y_n) according to the conditional probability defined by the NN:
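Under the usual autoregressive factorization (notation assumed here), this conditional probability is

P(Y \mid X; \theta) = \prod_{i=1}^{n} P(y_i \mid y_{<i}, X; \theta).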
where y_{<i} = (y_1, ..., y_{i-1}) consists of the first i-1 words of the sequence Y, and n represents the length of the sequence. The standard decoding algorithm adopted by the Transformer is beam search [15]. That is, at each time step i, the following formula is used to obtain the translation probability, and finally the k best translation candidates are retained:
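A common formulation of this step, under the same notation and with beam size k (assumed here), scores each partial hypothesis by its cumulative log-probability and keeps the k highest-scoring candidates:

\mathrm{score}(y_{\le i}) = \sum_{j=1}^{i} \log P(y_j \mid y_{<j}, X; \theta).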
In order to scale the similarity value into the [0, 1] interval, it uses the edit distance to calculate the fuzzy match (FM) coefficient between two sentences. Its formula is as follows:
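A widely used edit-distance-based definition consistent with this description (assumed here) is

\mathrm{FM}(s, t) = 1 - \frac{\mathrm{ED}(s, t)}{\max(|s|, |t|)}.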
Among them, ED(s, t) represents the edit distance between strings s and t, and |·| represents the number of elements (words) in a string. A larger fuzzy match coefficient indicates a greater degree of similarity between two sentences, and the coefficient lies between 0 and 1 [16]. According to the FM value, the bilingual sentence pair with the highest degree of similarity can be selected from the translation memory:
where |v| represents the size (norm) of the vector v. The larger the EM value, the greater the semantic similarity between the two strings [17]. In practice, two sentences may share few basic units yet semantically describe the same thing. A semantic similarity calculation can pick out such similar sentences, whereas a string-based method cannot. Therefore, in practical applications, the similarity calculation method needs to be selected flexibly.
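One common instantiation consistent with this description (assumed here, since the original formula is not reproduced) is the cosine similarity of the two sentence vectors:

\mathrm{EM}(s, t) = \frac{v_s \cdot v_t}{|v_s|\,|v_t|},

where v_s and v_t are the sentence vectors of s and t.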
For the source language sentence X to be translated, it first retrieves a set of source language sentences and their corresponding target language translations from the translation memory using an off-the-shelf search engine, obtaining a translation memory list. Then, it calculates the similarity between X and each retrieved source sentence according to the following formula:
Second, translation fragments are collected from translation memory lists [18]. It collects translation fragments (accumulating up to 4-grams) from the retrieved target sentences as possible translation fragments of X [19, 20]. Translated fragments from translation memory are represented as
where u represents an n-gram collected from the retrieved target sentences (with n accumulating up to 4).
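A minimal sketch of this fragment-collection step (illustrative only; the function name and data format are assumptions, not the paper's implementation):

def collect_fragments(tm_target_sentences, max_n=4):
    # Collect all n-grams up to length max_n from the retrieved translation
    # memory target sentences; these serve as candidate translation fragments.
    fragments = set()
    for sentence in tm_target_sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for start in range(len(tokens) - n + 1):
                fragments.add(tuple(tokens[start:start + n]))
    return fragments

# Example: fragments collected from two retrieved target sentences.
print(collect_fragments(["the cat sat on the mat", "a cat sat here"]))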
Then, a weighted score is calculated for each collected segment u. The weighted score of a segment is based on the similarity between the translation memory source sentence and the input source sentence. It measures the likelihood that the segment belongs to the translation of the source language sentence to be translated. The larger the value, the more likely the segment is a correct translation fragment [21]. Specifically, the final weighted score of each u is calculated by the following formula:
Therefore, the translation segment reward will be calculated according to the following formula. It is then added to the output layer of the NNMT model:
Among them, the interpolation weight is obtained by tuning on the development set, and the reward term is calculated by the following formula:
Finally, in the output layer of the NNMT model, the updated translation probability of the words in the translation vocabulary is
In summary, in the method of using translation fragments to guide NNMT decoding, it gives an additional reward when decoding the output for the words contained in the translation fragments collected from translation memory.
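As a rough illustration of this decoding-time reward (a minimal sketch under assumed names, not the paper's exact formula), the output-layer scores can be updated as follows:

def rescore_vocabulary(log_probs, fragment_reward, lam):
    # log_probs: word -> log P(word | history, source) from the NMT model.
    # fragment_reward: word -> weighted fragment score, as described above.
    # lam: interpolation weight, tuned on the development set (assumed).
    return {w: lp + lam * fragment_reward.get(w, 0.0) for w, lp in log_probs.items()}

# Example: "bank" gets a boost because it occurs in a matched translation fragment.
updated = rescore_vocabulary({"bank": -1.2, "shore": -0.9}, {"bank": 0.8}, lam=0.5)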
3.3. Position-Sensitive Translation Memory Fusion Methods
To capture contextual information or long-range knowledge, a normal distribution is employed to represent the relationship between positions [22]. In this paper, the most similar translation memory instance (a source sentence and its target translation) is used to learn the word position distribution parameters at the sentence level. Specifically, for a target word and its translation position during decoding, the sentence-level position score is calculated by the following formula:
In the formula, the position term refers to the position of the word in the translation memory target sentence, and the FM term represents the fuzzy matching value between the input sentence and the retrieved translation memory source sentence. Then, the following formula is used to calculate the sentence-level position reward value:
where the indicator term (cond, val) is calculated according to the following rule: if cond is true, it takes the value val; otherwise, it takes the value 0. In this way, the NNMT model captures sentence-level positional information, which allows the source language sentence to obtain more contextual information about the translation segments at each decoding step.
The fragment-level position information helps the NNMT model further capture local information. Similar to the sentence-level position reward above, for each word in a collected translation segment u, the reward value for its segment-level position is calculated using a simple standard normal distribution:
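One plausible instantiation of this standard normal reward (notation assumed here, since the original formula is not reproduced) is

r_{\mathrm{frag}}(j) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{(j - \hat{j})^{2}}{2}\right),

where j is the current decoding position and ĵ is the expected position of the word within the matched segment.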
Therefore, it uses the following formula to calculate the additional fragment-level position reward value:
In summary, at each decoding instant i, the translation probabilities of the vocabulary in the output layer are updated. This increases the output probability of those words that match the expected position:
The translation probability of each word is represented explicitly, and the word-to-word transition probability (e.g., from one word of a segment to the next) is calculated by the following formula:
where the notation denotes all cases in which the word pair belongs to the segment u.
Therefore, the reward value that the word should receive can be calculated by the following formula:
Second, for the position chain in the double-chain graph, the reward value is calculated at each decoding step according to the above algorithm. Then, the updated reward value is calculated according to the following formula:
where the two terms represent the position of the word in the translation memory target sentence and the current decoding time step, respectively.
4. Design and Implementation of Translation Teaching System
The continuous development of NMT meets the needs of social progress. The only way to make NMT technology better serve human beings, provide convenience, and create value in people's lives is to put it into practice and deploy it in real applications.
4.1. System Architecture
In order to put the theoretical methods proposed in this paper into practice, and to ensure that the system can be easily updated and maintained after completion, the design stage strictly complies with the requirements of functional modularity. To achieve maximum decoupling between functions, the modules are organized hierarchically to facilitate functional collaboration. The overall architecture of the translation teaching system is shown in Figure 4.

As shown in Figure 4, the core service layer is in the middle. It contains the core service logic of the entire machine translation system. The core service layer saves the most recent machine translation model as well as model-related configuration files, such as the trained word vector model and vocabulary, and can use the model to generate translation results for the lower layer. The service layer can parse out the sentences to be translated and process requests from the interaction layer. At this stage, it also filters the request content: it returns a corresponding response for invalid requests, such as empty requests, or illegal requests, such as input in languages other than Chinese. The service layer then schedules tasks in a reasonable manner based on the server's resources, completing the translation of the requested content as quickly as possible.
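A minimal sketch of the request-filtering step described above (the field names and the Chinese-only check are assumptions for illustration, not the system's actual interface):

def filter_request(request):
    # Return an error response for invalid requests before they reach the
    # translation model; return None when the request is valid.
    text = (request or {}).get("text", "").strip()
    if not text:
        return {"status": "error", "message": "empty request"}
    # Reject input that contains no Chinese characters, since the service
    # layer described above only accepts Chinese source sentences.
    if not any("\u4e00" <= ch <= "\u9fff" for ch in text):
        return {"status": "error", "message": "unsupported input language"}
    return None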
4.2. Functional Modules
The overall architecture design of the system mainly provides conceptual guidance for the realization of the translation system, while functional modularity plays a crucial role in its specific implementation details. This system divides the different levels of the system architecture into specific functional modules from the perspective of user convenience and ease of maintenance and upgrading by system administrators. The clear division of functional modules makes the translation system more convenient in both the early development stage and the later maintenance stage. The basic principle of the task buffer queue and the first-come-first-served strategy in the task scheduling module is shown in Figure 5.

As shown in Figure 5, such a mechanism prevents server crashes caused by resource exhaustion when too many simultaneous requests arrive at the server. Setting the size of the task buffer queue reasonably helps improve the resource utilization of the server and reduce the average request response time for clients. A sketch of this queueing scheme is given below, and the cooperation logic between the modules is shown in Figure 6.
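A minimal sketch of the bounded task buffer queue with first-come-first-served scheduling described above (the queue capacity and worker count are assumptions, not values from the paper):

import queue
import threading

# The bounded queue keeps the server from being overwhelmed when too many
# requests arrive at once.
task_queue = queue.Queue(maxsize=64)

def worker(translate_fn):
    while True:
        request = task_queue.get()            # FIFO: the earliest request is served first
        request["result"] = translate_fn(request["text"])
        task_queue.task_done()

def start_workers(translate_fn, n_workers=2):
    for _ in range(n_workers):
        threading.Thread(target=worker, args=(translate_fn,), daemon=True).start()

def submit(request):
    try:
        task_queue.put_nowait(request)        # reject new tasks when the buffer is full
        return True
    except queue.Full:
        return False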

As shown in Figure 6, this module is responsible for uploading a copy of the latest model trained by the model training module to the core service layer for the translation service. The internal functional modules are clearly divided and organized hierarchically for interaction and collaboration between modules.
Due to the lack of supervision, pseudo-training data produced by unsupervised NMT models suffer from considerable noise and low-frequency translation errors, and these errors are continuously amplified and reinforced during unsupervised training. To solve this problem, statistical machine translation is introduced as a posterior regularizer to denoise the pseudo-training data, so that these errors can be eliminated in time and the performance of the translation model improved. The initialization process of the translation teaching model is shown in Figure 7.

As shown in Figure 7, the whole training process is divided into two stages: model initialization and training with statistical machine translation as the posterior regularizer. In the first stage, given the language pair X and Y, a bidirectional initial statistical machine translation model is built using language models trained on monolingual data and a translation table inferred from cross-lingual word vectors. This statistical machine translation model is then used to translate the monolingual data, generating pseudo-training data to initialize a bidirectional NMT model. In the second stage, the statistical machine translation and NMT models are iteratively updated in a unified EM training framework. In this iterative process, the NMT model is trained not only with the pseudo data generated by the statistical machine translation model but also with the pseudo data generated by the reverse NMT model translating the monolingual data.
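A high-level sketch of this two-stage procedure (all function names below are placeholders, not the paper's implementation):

def train_unsupervised(mono_x, mono_y, n_iterations=3):
    # Stage 1: initialize bidirectional SMT models from language models trained
    # on monolingual data and a translation table inferred from cross-lingual
    # word embeddings, then bootstrap the NMT models from SMT pseudo data.
    smt_xy, smt_yx = build_initial_smt(mono_x, mono_y)
    pseudo = translate_monolingual(smt_xy, smt_yx, mono_x, mono_y)
    nmt_xy, nmt_yx = train_nmt_on_pseudo_data(pseudo)

    # Stage 2: iteratively update SMT and NMT in a unified EM-style loop; the
    # SMT models act as a posterior regularizer that denoises the pseudo data.
    for _ in range(n_iterations):
        smt_xy, smt_yx = retrain_smt_on_nmt_output(nmt_xy, nmt_yx, mono_x, mono_y)
        pseudo_smt = translate_monolingual(smt_xy, smt_yx, mono_x, mono_y)
        pseudo_bt = back_translate(nmt_xy, nmt_yx, mono_x, mono_y)
        nmt_xy, nmt_yx = train_nmt_on_pseudo_data(pseudo_smt + pseudo_bt)
    return nmt_xy, nmt_yx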
5. Experimental Analysis of Intelligent Translation Teaching System
5.1. Transformer Model Parameter Settings
It applies reinforcement learning algorithms in an end-to-end neural network machine translation system based on the Transformer model. The parameter settings of the Transformer model are shown in Table 1:
As shown in Table 1, the Adam optimizer is used, and the β coefficients are set to (0.9, 0.98). Training batches contain 3000 tokens each. Each sentence has a maximum length of 1024 words, and longer sentences are discarded by the encoder and decoder. To preserve the temporal information of the sentence, positional embeddings are added to the encoder input.
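A minimal PyTorch sketch of this optimizer setting (only the β coefficients come from Table 1; the model, learning rate, and epsilon below are illustrative placeholders):

import torch

model = torch.nn.Transformer()   # stand-in for the actual translation model
# Adam with beta coefficients (0.9, 0.98) as stated above; lr and eps are assumed.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.98), eps=1e-9)
# Batches are built by accumulating sentences until roughly 3000 tokens are
# reached (token-based batching), as described above.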
5.2. Performance Comparison of Reinforcement Learning Algorithms
This chapter introduces reinforcement learning and deep reinforcement learning algorithms into the end-to-end NNMT architecture. Ten end-to-end NNMT systems based on reinforcement learning algorithms are built on the CNN and Transformer models, respectively. The performance of the different reinforcement learning methods in the machine translation models is shown in Table 2.
Machine translation systems that use reinforcement learning outperform baseline systems using CNN and Transformer models, as shown in Table 2. The performance of a reinforcement learning machine translation system based on the Transformer model outperforms that of a CNN-based reinforcement learning machine translation system. The Transformer model-based deep reinforcement learning machine translation system has the best performance. The CNN-Dueling-DQN-NMT model improves the BLEU value of the CNN machine translation baseline model by 2.28 on the IWSLT2014 German-English dataset. The BLEU value of the Transformer machine translation baseline model is improved by 1.31 using the Transformer-Dueling-DQN-NMT model. With a BLEU value of 34.68, the Transformer-Dueling-DQN-NMT model has the best performance. The CNN-DDQN-NMT model improves the BLEU value by 1.57 over the CNN machine translation baseline model on the WMT14 English-German dataset. Over the Transformer machine translation baseline model, the Transformer-Dueling-DQN-NMT model improves the BLEU value by 1.04. With a BLEU value of 30.46, the Transformer-Dueling-DQN-NMT model has the best performance. The CNN-Dueling-DQN-NMT model improves the BLEU value by 1.92 over the CNN machine translation baseline model on the WMT14 English-French dataset. The BLEU value of the Transformer machine translation baseline model is improved by 1.63 using the Transformer-Dueling-DQN-NMT model.
5.3. Deviation Analysis of the System
It can be seen that the reinforcement learning machine translation model based on the Transformer also has a negative reward value in the early stage of training. Compared with the CNN-based reinforcement learning machine translation model, the Transformer-based reinforcement learning machine translation model has a smaller change in the loss value and better convergence. The reinforcement learning model based on CNN and Transformer is shown in Figure 8.

(a) Analysis of loss value of CNN-based reinforcement learning model

(b) Loss value analysis diagram of Transformer-based reinforcement learning model
It can be seen from Figure 8 that, in the early stage of training, the CNN-based reinforcement learning machine translation model produces negative reward values, causing a large deviation and making the overall convergence of the model worse. The model stabilizes over time. During the experiment, the decrease of the model's loss function with the training batches was also recorded. The Transformer model and the Transformer + part-of-speech information model are shown in Figure 9.

(a) Loss function diagram of Transformer model

(b) Loss function diagram of Transformer + part of speech information model
As shown in Figure 9, on the training set, the loss function of the Transformer model with the part-of-speech information vector decreases slightly faster than that of the Transformer model without the part-of-speech vector, and its final convergence is also slightly better. This shows that the Transformer model trains better after adding the part-of-speech vector than the original model.
6. Conclusions
Because AI has a significant impact on both modern education informatization and the balanced development of education in the information society, its applications should begin with theoretical and practical education. In light of the current challenges, this paper is oriented to the field of NMT, focusing on data sparseness and model improvement. This paper proposes a combined method to address the limitations of existing data augmentation methods. The experimental results show that the combined method can effectively enlarge the training corpus, thereby improving translation performance. This paper also builds an NMT model with multi-granularity features and dynamic word vector embedding to improve model performance. According to ablation experiments, both multi-granularity feature input and dynamic word vector embedding can improve the performance of the translation model, and their combination has the best effect. However, the number of training iterations and the size of the training data are insufficient. Due to hardware limitations, training data cannot be selected from a very large corpus, and overly large numbers of training iterations, data dimensions, or other parameters cannot be set. This may limit further improvement of the model's capability.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The author declares that there are no conflicts of interest.
Acknowledgments
This study was supported by the Science and Technology Agency of Henan Province, China, 2020: A Study on the Effectiveness Evaluation and Promotion of Foreign Talents Introduction of Henan Province (No. 202400410367).