Abstract
The embedded English speech teaching recognition system writes the English speech recognition control program into a chip and embeds that chip into the device, so that the chip controls the English speech device and completes the speech recognition operation. Applying embedded technology to an English speech recognition system can raise the recognition accuracy for speaker-specific English speech. The purpose of this paper is to research and design an automatic error detection method for an embedded English speech teaching recognition system in the context of artificial intelligence. The paper first gives a general overview of artificial intelligence, then analyzes the speech recognition algorithm, uses MatLab software to obtain the number of correctly recognized words and the corresponding accuracy rate, and then tests the embedded English teaching recognition system in different environments. The comparative results of multiple tests show that in a quiet environment the error rate of the embedded English speech teaching recognition system is very low, with a correct recognition rate above 90%, while in an environment affected by various noises the correct recognition rate is basically above 60%.
1. Introduction
With the rapid development of computer and artificial intelligence technology, intelligent mechanical equipment is being applied more and more widely. As one of the key technologies of intelligent control, voice recognition directly determines how intelligent such equipment can be. The rapid development of artificial intelligence has also driven the development of China’s education industry and made China’s English phonetic teaching more mature. In order to improve teaching quality and teaching level, the English education industry has actively introduced the embedded English pronunciation teaching system. Because this system offers high English pronunciation recognition accuracy and strong control ability, it is well suited for popularization and application in the field of intelligent control.
Today, with the rapid development of artificial intelligence technology, implementing the embedded English phonetic teaching system in the field of education can improve overall teaching quality and teaching level, which is of far-reaching significance to the education field. The application range of artificial intelligence is very wide, and in recent years scholars have studied English teaching in the context of artificial intelligence. However, there is relatively little research on automatic error detection methods for embedded English speech teaching recognition systems. Therefore, this paper researches and designs an automatic detection method for recognition errors of the embedded English speech teaching recognition system against the background of artificial intelligence, which has both theoretical and practical significance.
With the advancement of science and technology in China, more and more researchers have studied artificial intelligence in English teaching. Misirov proposed that, in the context of artificial intelligence, English pronunciation teaching in primary and secondary schools has changed greatly [1]; however, his analysis of the data was not careful enough, so the article is less rigorous. Vong and Kaewurai pointed out that the educational community can develop, implement, and evaluate a teaching model based on a cognitive approach, which enhanced students’ critical thinking and their ability to teach critical thinking to learners [2]; but the research was not based on real-world conditions, so its applicability is limited. Yin proposed studying an English embedded grammar-assisted teaching method and applied it to the design of a micro-classroom model; the experimental results showed that embedded oral English teaching was in line with students’ academic conditions and teaching rules in senior high school English teaching [3]. However, the investigation was relatively broad and cannot yet be specific to teaching in each region. Gashaw conducted an acoustic analysis of speech recordings of four Amharic learners and two native English-speaking Canadians and concluded that the Amharic speakers’ samples showed peaks on almost all words and required longer pronunciation times [4]; however, the sample was too small for the individuals to be representative of the group. Chika et al. pointed out that using multiple indicators for systematic phonetic evaluation is conducive not only to a deeper and broader understanding of Japanese and English but also to the development of phonetics teaching [5]; however, their research did not elaborate on the speech system itself, so the work is incomplete. Isaacs and Harding noted that the number of postgraduate students studying second language pronunciation and doing academic work in international universities had reached a record high, which offers broad prospects for second language pronunciation research and teacher training [6]; however, they did not cite strong data to support the claim. Thompson showed, through the analysis of key events in English classroom activities, that English dialogues can help improve English pronunciation [7]; but many similar articles exist on the subject, so the research is not novel enough.
The innovations of this paper are as follows. In the context of artificial intelligence, it applies the embedded English pronunciation teaching system to English education research; in other reports the embedded English phonetic teaching system is usually described only as a new type of educational method. This paper instead digs into the research and design of an automatic error detection method for the embedded English speech teaching recognition system and uses MatLab software to conduct multiple sets of comparative tests on the recognition system, thereby testing the speech recognition and processing ability of intelligent mechanical equipment. This has a certain reference value for research on artificial intelligence-related technologies.
2. The Method of Automatic Detection of Errors in the Embedded English Teaching Recognition System
2.1. Overview of Artificial Intelligence
Artificial Intelligence is abbreviated as AI. Artificial intelligence studies how to use computers to imitate human brain activities such as reasoning, proving, identifying, understanding, learning, thinking, planning, and problem solving, and to apply them to complex problems that previously only people could solve. These special brain activities were once attainable only by humans, but AI now shows that a knowledge-based technical level can basically be reached, allowing machines to imitate human brain activity. These technologies are derived from human intelligence but can, in some tasks, do better than human intelligence; this is also the standard that AI needs to achieve in its future development [8].
AI is a very popular subject that breaks with people’s traditional concepts. It has changed common ways of thinking and promoted the absorption of human knowledge and the development of human education. In the field of education, emerging educational technology has always provided a strong force for educational reform, and teaching work has become more convenient and efficient because of artificial intelligence. AI makes education fairer, more accessible, and more democratized. Teaching software with artificial intelligence can see, hear, speak, learn, and understand, and can respond to users’ emotions or moods in a human-like way. It allows users to communicate with computers naturally and smoothly through language, text, gestures, expressions, and other methods to achieve human–computer interaction. Artificial intelligence, space technology, and atomic energy technology are known as the three major scientific and technological achievements of the twentieth century. Because of the changes brought about by the continuous deepening of technology in modern education and teaching, artificial intelligence will be widely used in the field of education and will have a profound impact on educational philosophy, the teaching process, and teaching management.
2.2. Embedded Speech Recognition Algorithms
2.2.1. Algorithm Structure Model
By consulting the relevant literature, the mathematical model corresponding to the speech recognition algorithm can be obtained [9]; its structure is shown in Figure 1. It can be seen that the sound enters in impulse form together with the noise. In order to ensure the successful conversion of the speech signal, the model must solve the output problem in the communication channel.

It can be seen from the figure that the key function of this model is to guarantee the output function K(M) required when the speech signal is converted and solved, as follows:
Then, the analog speech signal is converted into a digital signal that can be used for communication, which first requires preprocessing, as shown in the following formula:
Assuming that Y(i) is the value compared with a certain time node m, then formula (3) can be obtained after preprocessing.
In the model, a set of state quantities is first defined at time point M, and the state of the model can only be one of these states. At the starting time point M = 0, the initialization probability π is
This is followed by the transition probability matrix, so we get
However, all time points other than M = 0 are hidden, so the formula can be obtained as
Fitting it again, we get
Assuming that there are u English words, a model is established for each word, and we can get
The final formula is
Here, the weighted combination of several Gaussian components of the speech signal is
The probability density function of a Gaussian mixture distribution containing n components is a weighted combination of n Gaussian probability density distribution functions, which is defined as
For the output probability of the input signal E in the SCHMM state K, the formula is
The probability of a sentence appearing in the model can also be expressed by the following formula:
Through the model, the optimal starting point formula for the pronunciation error collection of the embedded English speech recognition system can be obtained:
Through processing, the transfer formula can be obtained as
Decomposing the model and estimating the pronunciation vector gives the following formula:
Then, it can be obtained by eigen decomposition:
The pronunciation feature spectrum A of the embedded English recognition system can be located using the time-frequency analysis method:
Finally, by analyzing the tone of the speech recognition system, it can be concluded that the output feature quantity is
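For reference, the quantities named above (the initialization probability, the transition probability matrix, and the Gaussian mixture output density) follow the standard HMM-GMM formulation. The block below is a generic sketch of that formulation, not necessarily the exact parameterization used by the system in this paper.

```latex
% Generic HMM-GMM quantities (illustrative sketch; the system's exact
% parameterization may differ).
\begin{align}
  \pi_i &= P(q_0 = s_i), \qquad \textstyle\sum_i \pi_i = 1
     && \text{initial state probabilities at } M = 0,\\
  a_{ij} &= P(q_{m+1} = s_j \mid q_m = s_i)
     && \text{transition probability matrix},\\
  b_j(\mathbf{o}) &= \sum_{k=1}^{n} c_{jk}\,
     \mathcal{N}\!\left(\mathbf{o};\boldsymbol{\mu}_{jk},\boldsymbol{\Sigma}_{jk}\right)
     && \text{mixture of } n \text{ Gaussian components},\\
  P(O \mid \lambda) &= \sum_{q_0,\dots,q_M} \pi_{q_0}\, b_{q_0}(\mathbf{o}_0)
     \prod_{m=1}^{M} a_{q_{m-1} q_m}\, b_{q_m}(\mathbf{o}_m)
     && \text{probability of an observation sequence}.
\end{align}
```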
2.2.2. Two-Stage Recognition Algorithm
In an embedded English speech recognition system, a whole-word template model is rarely used: it is sensitive to the vocabulary, and as the number of candidate words grows, the detection time and the space occupied grow with it, so the waiting time increases [10]. The monophone model is significantly different: its templates are convenient, it has fewer states, high reading efficiency, and a small memory footprint, and these properties are independent of the number of words to be recognized. Even so, it does not describe pronunciation context accurately enough, which leads to a low recognition rate. Triphone and syllable models describe pronunciation context relatively accurately, but there are many models and many states, the recognition speed is slow, and they occupy a large amount of space [11]. In order to balance space occupancy and reading speed, a two-stage search algorithm is used in this paper; the basic process is shown in Figure 2.

It can be seen that in the first stage the model and a static recognition network are used to obtain a large number of candidate word entries. In the second stage, another recognition network is built from the candidate entries produced by the previous stage, and another model is used to perform accurate decoding and obtain the final recognition result [12]. Because the number of entries processed in the second stage is relatively small, decoding is much faster than in the first stage. At the same time, the second stage can reuse the space resources of the first stage, which also reduces the memory footprint of the recognition system.
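As a concrete illustration of this flow, the sketch below shows a two-pass search in simplified form: a cheap first-stage score shortlists candidate entries, and a more detailed second-stage score rescores only that shortlist. The scoring callables and the candidate-list size are illustrative assumptions, not the system's actual models.

```python
# Illustrative two-stage (two-pass) recognition sketch. The scoring
# functions are supplied by the caller and stand in for the first-stage
# and second-stage acoustic models (hypothetical interface).

def two_stage_recognize(features, vocabulary, coarse_score, detailed_score,
                        n_candidates=10):
    # Stage 1: score every vocabulary entry with the cheap model and keep
    # only the best n_candidates as the candidate word list.
    shortlist = sorted(vocabulary,
                       key=lambda word: coarse_score(features, word),
                       reverse=True)[:n_candidates]

    # Stage 2: rescore only the shortlisted entries with the detailed
    # model; because the list is short, this pass is fast and can reuse
    # the working memory of stage 1.
    return max(shortlist, key=lambda word: detailed_score(features, word))


# Toy usage with dummy scores (for demonstration only).
if __name__ == "__main__":
    vocab = ["listen", "repeat", "answer"]
    result = two_stage_recognize("dummy-features", vocab,
                                 coarse_score=lambda f, w: len(w),
                                 detailed_score=lambda f, w: -len(w))
    print(result)
```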
2.3. Design Requirements of Embedded English Speech Recognition System
2.3.1. System Hardware Part Design
By consulting the relevant literature, the hardware structure of the embedded English speech recognition control system can be obtained as shown in Figure 3.

According to the relevant literature, the chip used as the system’s core processor is very efficient at processing data. The instruction set of its architecture improves the execution efficiency of issued instructions and the running performance of the chip, which is why it is widely used in various industries [13].
2.3.2. The Principle of Embedded Speech Recognition Technology
According to previous research, there are four approaches to embedded speech recognition: methods based on the vocal tract model and speech knowledge, pattern matching, statistical models, and artificial neural networks [14]. Pattern matching involves four steps in embedded speech recognition: feature extraction, template training, template classification, and decision. Figure 4 is a block diagram of the pattern matching method:

In Figure 4, the speech passes through the sound pickup device and is transformed into a signal stream, which is fed to the input of the recognition system. After preprocessing is completed, the features of the speech signal are extracted and the required model is created on top of them; this process is called the training process [15]. The system then compares the prepared speech templates with the features of the incoming speech signal across all the models of embedded speech recognition, and finds the model that best fits the incoming speech through a search and matching strategy. The whole process of matching the newly obtained features to the standard templates is called the recognition process. Finally, through this template, the computer recognition result can be obtained by using a table look-up method.
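A minimal sketch of this pattern-matching path is given below, assuming that the extracted features are sequences of MFCC-like vectors and that dynamic time warping (DTW) is used as the search and matching strategy; the actual feature extractor and matching strategy of the system may differ.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two feature sequences,
    each an array of shape (frames, coefficients)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # frame-level distance
            cost[i, j] = d + min(cost[i - 1, j],       # insertion
                                 cost[i, j - 1],       # deletion
                                 cost[i - 1, j - 1])   # match
    return cost[n, m]

def recognize(features, templates):
    """templates: dict mapping each word to its trained reference
    feature sequence; the word with the smallest DTW distance wins."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))
```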
2.3.3. The Importance of English Pronunciation Teaching Recognition System in English Pronunciation Teaching
The embedded English speech recognition system is one of the important technical projects of intelligent speech technology. It is an artificial speech recognition technology realized through specific technical means, which allows computers to recognize human speech and convert externally input speech into corresponding text information. In the classroom teaching of English listening, teachers can use the embedded English speech recognition system to select appropriate and targeted teaching resources beyond the textbook and to complete the input of the material to be converted, after which the English speech can be converted into the corresponding text information. The standard audio pronunciation recognized by the system can accurately present English pronunciation and let learners hear pure English pronunciation. This creates a good English listening learning environment for learners and alleviates the problems of non-standard and insufficient voice input in current listening teaching, including some teachers’ “dialect English.”
In addition, the multi-timbre synthesis provided by the embedded English speech recognition system can meet the needs of various scenarios and create realistic situational teaching scenes for learners. This can stimulate learners’ motivation to learn English listening, improve their learning initiative, help teachers carry out English listening teaching better, and help improve the quality of English listening teaching. However, whether the quality of the English audio synthesized by this technology really reaches the level of a teaching demonstration sound in English listening teaching has not been confirmed by data in previous scholars’ research, and how to properly combine the functional advantages of the technology with English listening teaching needs further research and analysis. Based on this, this research uses scientific research methods and means to analyze the practical application mode of embedded English speech recognition system technology, which not only stems from realistic demands but also has practical feasibility.
3. Simulation Experiment Analysis of Embedded English Teaching Recognition System Error Automatic Inspection
3.1. Embedded English Speech Recognition System Test
According to the actual demand, in order to test the sensitivity of the embedded English recognition system and its automatic error checking function, we used MatLab software to conduct a simulation test to verify the accuracy and efficiency of the embedded English teaching recognition system [16]. Two experiments were carried out in this paper. In the speech recognition accuracy test, 6 words were each tested 30 times, and the results are given in Table 1. From the first experiment in Table 1, it can be seen that the recognition rate of the embedded speech recognition system is relatively high, basically above 80%, which meets the design requirements of an intelligent recognition system, and within the 30 tests each word was correctly recognized more than 25 times. This shows that the performance of the system is very good, the error value is low, and it is well suited to English phonetic teaching research.
Since a single experiment may lack rigor, MatLab software was used to carry out a second simulation experiment, whose results are presented in Table 2.
It can be seen that the results of the second experiment differ little from those of the first, and the accuracy for basically all words is above 80%. This illustrates the feasibility of the embedded English speech recognition system and confirms that the two experiments are scientifically sound.
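The per-word accuracy in Tables 1 and 2 is simply the number of correct recognitions divided by the 30 trials. A small calculation with made-up counts (the words and counts below are illustrative, not the measured data) shows how rates above 80% arise.

```python
# Hypothetical per-word counts of correct recognitions out of 30 trials;
# the real values are those reported in Tables 1 and 2.
trials = 30
correct_counts = {"hello": 28, "teacher": 27, "listen": 26,
                  "repeat": 29, "speak": 25, "answer": 27}

for word, n_correct in correct_counts.items():
    rate = n_correct / trials
    print(f"{word:8s} {n_correct}/{trials} correct -> {rate:.1%}")
```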
3.2. Test of Embedded English Speech Recognition System in Different Environments
In order to take into account as fully as possible the influence of various factors on the deviation value of the embedded English recognition system, two experiments were carried out in quiet and noisy environments. Each of the above six words was tested 30 times to ensure the scientific accuracy and authenticity of the experiment [17]. The tool used was again MatLab software. Figure 5 shows the results of the first experiment.
It can be seen from the figure that in a quiet environment the embedded English speech teaching recognition system recognizes words with a high accuracy rate, basically above 80%. In a noisy environment the word recognition accuracy is lower, but it still basically reaches more than 66%. This shows that surrounding environmental factors increase the error rate of the embedded English teaching recognition system; nevertheless, the accuracy of the recognition system remains high, and future design and research may further reduce the system’s error value.
A second experiment followed. In order to add more complexity and possibilities, different noise factors were added to the noisy environment, and the experiment was run with both the previous fixed energy threshold method and the two-stage detection method [18]. The results are given in Table 3.
It can be seen from the table that, whether in a quiet environment or under the influence of different noise factors, the two-stage detection method is clearly better than the endpoint detection method with a fixed energy threshold and achieves a higher recognition rate [19]. The two-stage detection method is also more scientific and accurate and is well suited for similar research.
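To make the comparison concrete, the sketch below contrasts the two endpoint-detection strategies in simplified form: a single fixed energy threshold versus a two-stage variant that first locates the speech core with a high threshold and then extends the endpoints with a lower one. The frame length and thresholds are illustrative assumptions, not the system's tuned values.

```python
import numpy as np

def frame_energy(signal, frame_len=256):
    """Short-time energy per frame of a 1-D speech signal."""
    n_frames = len(signal) // frame_len
    frames = np.asarray(signal[:n_frames * frame_len], dtype=float)
    return (frames.reshape(n_frames, frame_len) ** 2).sum(axis=1)

def fixed_threshold_endpoints(energy, threshold):
    """Fixed energy threshold: first and last frame above the threshold."""
    above = np.where(energy > threshold)[0]
    return None if above.size == 0 else (above[0], above[-1])

def two_stage_endpoints(energy, high, low):
    """Stage 1: find the speech core with a high threshold.
    Stage 2: extend outwards while energy stays above a lower threshold,
    so weak onsets and offsets are not cut off."""
    core = fixed_threshold_endpoints(energy, high)
    if core is None:
        return None
    start, end = core
    while start > 0 and energy[start - 1] > low:
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low:
        end += 1
    return start, end
```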
Because searching English speech takes a long time, the system cannot wait until all of the speech information has been received before decoding it [20]. Since real-time decoding is required, the system must extract features and perform first-stage recognition while acquiring the voice signal. Because the vocabulary search is not yet complete at that point, the final score for each vocabulary entry cannot be obtained, so the fitness score of each node must be saved for each vocabulary entry during retrieval, which brings an additional memory footprint [21]. For example, taking the three-column angle Gaussian model as an example, the voice database can be set as command words from 10 female speakers, and these command words can be used to compare window width against recognition rate. The comparison is depicted in Table 4. It is obvious that at a window width of 15 the recognition rate is not reduced; therefore, it can be concluded that the recognition rate is related to the number of candidate entries.
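The window width in Table 4 behaves like a beam width: at each step of the first-stage search only the best partial hypotheses inside the window are kept, which bounds the memory needed for the saved per-node fitness scores. A simplified pruning step might look like the following; the data structure is an illustrative assumption.

```python
def prune_to_window(partial_scores, window_width=15):
    """Keep only the best `window_width` partial hypotheses.

    partial_scores: dict mapping a vocabulary entry to its current
    fitness score (higher is assumed to be better here).
    """
    ranked = sorted(partial_scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:window_width])

# Example: prune_to_window({"go": 3.1, "stop": 2.7, "left": 1.4}, window_width=2)
# keeps only {"go": 3.1, "stop": 2.7}.
```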
These data can also be compared with the first-stage and second-stage experiments, as given in Table 5. Compared with first-stage decoding, the second-stage recognition uses the two-stage endpoint detection method, a feature selection algorithm, and a beam search algorithm, which greatly increases the recognition rate while reducing space occupation and decoding time [22].
Then, new data are obtained by comparing the recognition system errors under different noise factors with those in a quiet environment, as shown in Figure 6.
It can be seen from the experimental data that the error value of the recognition system is lowest in a quiet environment, that is, the recognition rate is highest there. Under the influence of different noise factors, the error value of the recognition system also differs [23]. This shows that noise still has a considerable impact on the embedded recognition system. Even so, under noise of different intensities the recognition system can still recognize words with an accuracy above 55%, so the feasibility of the embedded English speech recognition system in different environments remains promising.
In order to increase the difficulty and reliability of the experiment, and because testing only word recognition is not comprehensive, the error value of the embedded English speech recognition system when recognizing sentences was also added to this experiment. As with the words, the sentence experiment was run twice to ensure accuracy. The results of the first test are shown in Figure 7.
It can be seen from Figure 7 that the recognition rate of the embedded English speech recognition system for sentences is still very high. The accuracy rate in the quiet environment also reaches more than 80%, and each sentence was recognized correctly at least 25 times in 30 tests. This clearly demonstrates the strength of the system’s performance. Although the rate differs slightly from the word recognition rate, the difference is not fundamental, which shows that the system’s automatic error detection is also highly feasible for sentences.
The second experiment was then carried out with different noise factors added, and the comparison is shown in Figure 8.
As can be seen from Figure 8, under the influence of different noise factors the embedded English speech recognition system is less accurate for sentence recognition and does not reach a high recognition rate for every sentence, but it still basically reaches more than 53%; in 30 tests, each sentence was recognized correctly more than 15 times, which corresponds to the error value of the system under test. In this experiment we also tested the recognition rate of the system in a quiet environment, where the recognition rate remains very high. The repeated tests reflect the very good performance of the system in automatic error detection.
3.3. Automatic Error Detection Test Analysis of Different Identification Systems
Because of the error-detection method designed for the recognition system studied in this paper, it can be observed whether the system can accurately recognize English speech in practical applications [24]. We used six English voice samples to compare the previous (traditional) speech recognition system and the embedded system studied here in two environments, a quiet environment and a noisy environment, testing each system on each voice in each environment. Each test was conducted 20 times, and the number of times the system successfully recognized the English speech was recorded. The test results are shown in Figures 9 and 10.


From the test results in Figures 9 and 10, it can be seen that the recognition rate of the embedded English teaching recognition system is significantly higher than that of the traditional recognition system. The calculation shows that the correct recognition rate of the embedded system can reach 90%, while the traditional system only reaches about 60%, which shows that the automatic error detection performance of the embedded English teaching recognition system is very good.
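The rates quoted above follow directly from the success counts over 20 trials per voice; the short calculation below uses hypothetical counts (not the measured values plotted in Figures 9 and 10) chosen to illustrate how the 90% and 60% figures are obtained.

```python
trials = 20
# Hypothetical success counts for six test voices (illustrative only).
embedded_counts    = [17, 18, 19, 18, 17, 19]
traditional_counts = [11, 13, 12, 12, 11, 13]

for name, counts in (("embedded", embedded_counts),
                     ("traditional", traditional_counts)):
    rate = sum(counts) / (trials * len(counts))
    print(f"{name:11s} overall recognition rate: {rate:.0%}")
```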
4. Discussion
This paper researches and designs an automatic error detection method for the embedded English speech teaching recognition system against the background of artificial intelligence and applies it to analysis and processing. This not only expands the application scope of embedded English pronunciation teaching but is also a new attempt at automatic error detection for the embedded English pronunciation teaching recognition system. In different environments (a quiet environment and environments affected by different noise factors), it analyzes the number of correct recognitions and the correct recognition rates for multiple groups of words and sentences, and it compares the recognition rates of different recognition systems, so as to explore and study the automatic error detection method of the embedded English speech teaching recognition system. This research method has a certain potential for studying the complexity of intelligent systems. The paper starts from a basic introduction to artificial intelligence, analyzes the embedded speech recognition algorithm, the two-stage recognition algorithm, and the design requirements of the embedded speech system, and clearly outlines the main structure of the embedded English speech teaching recognition system. In the experimental analysis stage, it then uses MatLab software to obtain multiple sets of speech recognition rate comparisons and analyzes them from two aspects, different environments and different systems. The results show that the obtained conclusions are in line with the actual situation.
Through the analysis of this case, it is shown that the automatic error detection function of the embedded English speech teaching recognition system is very effective and the feasibility is very high. In the specific practice test, it can be found that the error value of the embedded English speech teaching recognition system in a quiet environment is very low, and the accuracy rate is very high.
This paper takes the recognition rate of the embedded English speech teaching recognition system in different environments as a case study. First, through the introduction of the speech recognition algorithm and by consulting the literature, the structure of the embedded English speech recognition system is obtained. By comparing the recognition rates in different environments and analyzing the data, it is concluded that in a quiet environment the feasibility of automatic error detection of the embedded English speech teaching recognition system is very high.
5. Conclusions
Under the influence of different environmental factors, the error values of the embedded English speech teaching recognition system differ. In a quiet environment, whether in the word or the sentence recognition test, the system achieves the highest recognition rate. Under the influence of noise, the result depends on the type of noise and its decibel level; basically, the recognition rate under noise is somewhat lower than in a quiet environment, although this is not absolute. With the rapid development of artificial intelligence technology, the embedded English speech teaching recognition system may well be further improved in the future and achieve better performance in identifying errors.
Data Availability
No data were used to support this study.
Conflicts of Interest
The author declares that there are no conflicts of interest in this study.
Acknowledgments
This work was supported by the Scientific Research Fund of Hunan Provincial Education Department (Project Number: 20C0294).