Abstract

Because English is not a native language in China, Chinese learners of English generally have limited ability to use English in practice. This paper therefore combines virtual reality (VR) with English learning to build an immersive English learning system. The system offers students a new way to practice everyday English, relieve foreign language anxiety, and improve their practical English ability. Based on the identified research needs, this research developed a VR English learning system for scene-based teaching. The system gives learners of everyday English visual, auditory, and tactile multisensory experiences and fully immerses them in the virtual scene, and its feasibility is supported by data from actual teaching experiments. The experimental results show that 90% of the learners are satisfied with the learning effect in the virtual simulation learning situation. More than 80% believe that learning English through avatars in this situation improves their abilities in all aspects as well as the teacher-student relationship. In addition, 85% of the learners believe that the virtual simulation learning situation increases their interest in everyday English learning. These results indicate that the system is highly feasible.

1. Introduction

In China, most English learners read and write well, but their speaking skills are poor. Because they lack opportunities to practice, English learning in China has long remained in a state of “dumb English,” in which English has lost its most important function as a language: oral communication. This situation has attracted the attention of educators, who hope to improve it by reforming English teaching and strengthening oral testing. However, because the reform is progressing slowly, the situation is difficult to improve in the short term. With the growing popularity of virtual reality devices, more and more studies try to use virtual reality (VR) to create immersive English learning environments, which makes situational teaching of everyday English for students increasingly important.

Theoretically, this study verifies the feasibility and effectiveness of “VR + language learning” and helps fill a gap in current research on VR English education. At present, most research on the effect of VR on everyday English teaching focuses on language ability outcomes (such as vocabulary and sentence patterns), and only a few studies analyze learning experience and emotion. This study examines the foreign language learning experience and oral anxiety and uses a controlled experiment to demonstrate the effectiveness of VR, which also provides new ideas for situational teaching of everyday English. Practically, this research provides guidance for the design and application of “VR + language learning.”

The innovations of this paper are as follows: (1) This research explores “VR + English” product design in practice: it studies English education theory and the technical characteristics of VR, and then designs and develops a product according to the research conclusions. (2) This study explores the learning experience and learning effect of immersive VR English through empirical experiments, drawing conclusions through quantitative analysis and exploring the reasons behind them through qualitative analysis. (3) This study explores the influence of scene factors in VR teaching on oral anxiety and demonstrates this effect through a controlled experiment.

Studies evaluating the impact of hypermedia on second language acquisition have mainly examined interfaces where user input is limited to clicking, entering text, and speech. Vieiramonteiro and Ribeiro explored the potential of virtual reality for teaching foreign language vocabulary in an exploratory study that reports the preparation and results of a study with 16 undergraduate English students and 9 students in a private course [1]. There are few studies on guidelines for designing computer-based scaffolding for immersive virtual reality (IVR). Bacca-Acosta et al. offered advice on how to design scaffolds in IVR environments for teaching English as a foreign language (EFL) [2]. Although participants in their study used both an environment where they could move in the virtual space and one where they remained seated, the study did not determine which scaffolds were better suited to learning in IVR. Despite the many benefits of VR in education, several challenges and limitations have led to the underuse or misuse of this technology. Baniasadi et al. examined the development and use of VR for educational and therapeutic goals and divided its main challenges into general and specific ones; the general challenges include reduced face-to-face communication, educational issues, cost, and user attitudes [3]. Although VR technology is widely used in other fields, its contribution to education remains limited.

Because they lack an authentic English practice environment, EFL students generally have little opportunity to communicate in English with others, let alone receive feedback from others for reflection. Chien et al. developed a spherical video-based virtual reality (SVVR) environment that places students in a realistic English-speaking context. Their experimental results show that it has more positive effects on learners’ English speaking, learning motivation, and critical thinking, while reducing their learning anxiety [4]. Traditional textbooks and English teaching often fail to engage learners. Yang et al. developed a three-dimensional learning system, virtual reality English for living (VRLE), which provides learners with a realistic environment and promotes the development of communicative competence. Although their study collected multiple data sources for quantitative and qualitative analysis of VRLE, the experimental results were not significant [5]. Digital learning has become an inevitable trend, with many benefits for student performance and motivation. In Chen and Liao’s study, panoramic image VR (PIVR) served as a low-cost alternative to traditional VR in classroom settings, and the experimental group using PIVR showed higher performance, motivation, and satisfaction than the control group [6]. However, students in the experimental group did not perform significantly better than control-group students at the same level. Shijian studied stratified higher vocational English education in a virtual reality environment and established a graded teaching model for comprehensive English teaching based on the virtual environment, aiming to improve students’ English achievement and intercultural communication ability [7]. Although students’ achievement improved, the model lacked interpersonal communication.

3. Application Method of Virtual Reality Technology in English Learning

3.1. Problems Existing in English Learning

It is widely believed that weak oral expression results from poor oral skills. Learners need to master grammar, vocabulary, usage, and communication skills to improve oral fluency and contextual coherence [8]. However, many English teachers find that even after they spend a lot of time teaching speaking skills, some students still make the same mistakes when speaking. With the development of experiential teaching, researchers have gradually recognized that foreign language anxiety is a cause of this problem. Most research shows that anxiety negatively affects foreign language learning and performance. If oral language anxiety is not addressed in time, learners easily fall into a vicious circle: “the more anxious, the worse the performance; the worse the performance, the more anxious” [9]. Foreign language teaching should therefore attend not only to language skills but also to students’ language anxiety. Unfortunately, although most teachers are aware of the harm of language anxiety, they have not adopted effective strategies to address it, and foreign language classrooms still focus on teaching language knowledge rather than guiding students with cognitive or emotional strategies [10].

3.2. Virtual Reality Technology

The concept of VR technology has several aspects. First, in terms of the simulated environment, VR produces three-dimensional computer images generated in real time as the user’s viewpoint changes. Besides 3D vision, it also includes other perceptions such as hearing, smell, and touch [11, 12]. The application fields of virtual reality technology are shown in Figure 1.

Second, in terms of perception, as shown in Figure 1, VR technology should in theory simulate all human senses so that users perceive the virtual world more realistically [13]. Finally, in terms of interaction, the VR system must provide real-time feedback. Because the system responds in real time to human actions such as head rotation and gestures, the user can interact naturally with the environment in the system and with other users [14].

3.3. Importance of Virtual Reality Applications in English Learning

Differences among learners and learning content lead to diverse forms of autonomous learning. Students should not only study individually according to their own characteristics but also engage in various forms of teacher-student or student-student cooperative learning, solving problems encountered in learning through negotiation and conversation [15]. In the virtual environment, under the guidance of teachers, students can enter a virtual language community that matches their goals, interests, and abilities to study the corresponding language materials. They can also complete the multiperson communication tasks set by the community together with learners of the same level and with the same interests and goals. When learners encounter difficult tasks in the virtual environment, they can coordinate their behavior with other users through dialog and solve problems together, which provides opportunities for various real-time collaborative learning activities [16]. In addition, learners can follow the teacher in the virtual environment and learn the foreign language through collaborative practice. In such a learning process, students observe and communicate with each other, learn collaboratively, and make progress together. Each learner’s verbal expression in the virtual language community is reviewed and evaluated by the other partners at the same time, which creates a positive collective atmosphere. This can strongly stimulate learners’ curiosity and desire to perform and help them actively construct knowledge.

Virtual reality creates a virtual environment that is presented to the senses, giving users an immersive experience. Through VR headsets, learners can be completely isolated from the external physical world and fully immersed in a virtual world [17]. This suggested the possibility of building an English learning system on VR equipment that provides learners with a fully immersive English environment. Entering this virtual world is like arriving in an English-speaking country, which can raise learners’ interest and improve their learning experience. The classification of the 3I features (immersion, interaction, and imagination) and of immersion in virtual reality is shown in Figure 2.

As shown in Figure 2, the VR learning environment engages multiple human intelligences: oral/linguistic, logical/mathematical, auditory, spatial, bodily movement, interpersonal, and self-cognition. Learners forget that they are in a learning environment and produce language naturally when stimulated by the environment [18]. The simulated environment provided by VR is more secure, private, and neutral, so learners need not worry about making mistakes or losing face and are more willing to speak. The immersive educational capability of VR has brought significant changes to the way we teach, think, and act, and because of its advantages this new form of education is likely to occupy an important position in future education [19].

3.4. GJK Algorithm

The GJK algorithm is an iterative, simplex-based algorithm. Given the vertex sets of two objects, it outputs the Euclidean distance between their convex hulls, and whether a collision has occurred is determined by comparing this distance with zero. The key feature of GJK is that, instead of computing distances between the input vertex sets directly, it computes the distance from the origin to the Minkowski difference of the two objects. The problem of computing the distance between two convex bodies is thus transformed into the problem of finding the minimum distance from a convex set to the origin [20].

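Assume two convex bodies A and B, and let d(A, B) denote the distance between them. Then d(A, B) can be expressed by the following formula:

$$d(A, B) = \min\{\lVert a - b \rVert : a \in A,\ b \in B\} \tag{1}$$

The GJK algorithm can also return the two closest points a and b between the two objects, which satisfy:

$$a \in A,\quad b \in B,\quad \lVert a - b \rVert = d(A, B) \tag{2}$$

Define v(C) as the point of the convex set C with the smallest distance from the origin, that is, v(C) ∈ C and

$$\lVert v(C) \rVert = \min\{\lVert x \rVert : x \in C\} \tag{3}$$

Then the distance between A and B can be expressed through the Minkowski difference A − B = {a − b : a ∈ A, b ∈ B}:

$$d(A, B) = \lVert v(A - B) \rVert \tag{4}$$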
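To make the distance computation concrete (the distance between two convex bodies equals the distance from the origin to their Minkowski difference), the following is a minimal 2D GJK distance sketch in Python. It assumes convex polygons given as vertex lists and replaces Johnson's subalgorithm with a simple brute-force closest-point search over the current simplex. The paper works in 3D and does not show its implementation, so every function name here is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def dot(p: Vec, q: Vec) -> float:
    return p[0] * q[0] + p[1] * q[1]


def sub(p: Vec, q: Vec) -> Vec:
    return (p[0] - q[0], p[1] - q[1])


def support(shape: Sequence[Vec], d: Vec) -> Vec:
    """Vertex of `shape` that lies farthest along direction d."""
    return max(shape, key=lambda p: dot(p, d))


def support_cso(a: Sequence[Vec], b: Sequence[Vec], d: Vec) -> Vec:
    """Support point of the Minkowski difference A - B in direction d."""
    return sub(support(a, d), support(b, (-d[0], -d[1])))


def closest_on_simplex(simplex: List[Vec]) -> Tuple[Vec, List[Vec]]:
    """Point of a 1-, 2- or 3-point simplex closest to the origin, together
    with the smallest sub-simplex (vertex, edge or triangle) containing it."""
    if len(simplex) == 3:
        p, q, r = simplex
        area = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
        if abs(area) > 1e-12:
            # The origin is inside if it lies on the same side of every edge.
            sides = [(b[0] - a[0]) * -a[1] - (b[1] - a[1]) * -a[0]
                     for a, b in ((p, q), (q, r), (r, p))]
            if all(s * area >= 0 for s in sides):
                return (0.0, 0.0), simplex
    best_pt, best_set = simplex[0], [simplex[0]]
    for v in simplex[1:]:
        if dot(v, v) < dot(best_pt, best_pt):
            best_pt, best_set = v, [v]
    for i in range(len(simplex)):
        for j in range(i + 1, len(simplex)):
            a, b = simplex[i], simplex[j]
            ab = sub(b, a)
            denom = dot(ab, ab)
            if denom < 1e-12:
                continue
            t = -dot(a, ab) / denom  # projection of the origin onto line ab
            if 0.0 < t < 1.0:
                pt = (a[0] + t * ab[0], a[1] + t * ab[1])
                if dot(pt, pt) < dot(best_pt, best_pt):
                    best_pt, best_set = pt, [a, b]
    return best_pt, best_set


def gjk_distance(a: Sequence[Vec], b: Sequence[Vec],
                 eps: float = 1e-9, max_iter: int = 64) -> float:
    """Distance between the convex hulls of point sets a and b (0 if they touch)."""
    v = support_cso(a, b, (1.0, 0.0))  # any point of the CSO to start
    simplex = [v]
    for _ in range(max_iter):
        if dot(v, v) < eps:
            return 0.0  # origin in the CSO: the objects collide
        w = support_cso(a, b, (-v[0], -v[1]))
        if dot(v, v) - dot(v, w) <= eps:
            return math.sqrt(dot(v, v))  # no further progress toward the origin
        simplex.append(w)
        v, simplex = closest_on_simplex(simplex)
    return math.sqrt(dot(v, v))


unit = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(gjk_distance(unit, [(3.0, 0.0), (4.0, 0.0), (4.0, 1.0), (3.0, 1.0)]))  # 2.0
```

A returned distance of zero corresponds to the "compare with zero" collision test described above; the `max_iter` cap is a safety guard against numerical cycling rather than part of the textbook algorithm.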

3.5. Fast Continuous Collision Detection Algorithm

This section proposes a fast continuous collision detection (FCCD) algorithm based on GJK. First, the shortest distance between two objects over a time interval is calculated and compared with zero; if it is not greater than zero, the objects collide [21]. Next, a ray and convex body intersection method is used to calculate where the two objects collide, that is, the specific collision position. The FCCD algorithm thus consists of two steps. The first step, collision detection, determines whether the objects collide. Only when a collision is detected is the second step, collision response, performed to calculate the specific collision position.

3.5.1. Detect Collision Calculation

The traditional algorithm computes the distance between objects moving linearly in two-dimensional space. This section improves it and extends it to collision detection between two convex bodies in three-dimensional space.

3.5.2. Relevant Description of the Collision Detection Problem

FCCD targets scenes containing multiple moving objects. Suppose that A and B are two objects in the scene moving in straight lines at constant velocities $\mathbf{v}_A$ and $\mathbf{v}_B$, respectively, during a time interval $[t_0, t_1]$.

In the time interval $[t_0, t_1]$, the vertex positions of CH(A) and CH(B) are given by formulas (5) and (6):

$$a(t) = a(t_0) + (t - t_0)\,\mathbf{v}_A, \qquad a \in \mathrm{CH}(A),\ t \in [t_0, t_1] \tag{5}$$

$$b(t) = b(t_0) + (t - t_0)\,\mathbf{v}_B, \qquad b \in \mathrm{CH}(B),\ t \in [t_0, t_1] \tag{6}$$

Suppose that CH(C) is the CSO (configuration space obstacle) of the moving convex body CH(A) relative to the convex body CH(B), which is treated as stationary, within the time interval $[t_0, t_1]$. From formulas (5) and (6), the vertex positions of CH(C) are given by formula (7):

$$c(t) = a(t) - b(t) = c(t_0) + (t - t_0)(\mathbf{v}_A - \mathbf{v}_B), \qquad c \in \mathrm{CH}(C) \tag{7}$$

According to formula (7), CH(C) moves in a straight line within the time interval $[t_0, t_1]$; its motion trajectory is shown in Figure 3.
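The claim that the CSO moves rigidly in a straight line can be checked numerically. This sketch assumes 2D vertex lists and constant velocities `va` and `vb` (hypothetical names): it forms the Minkowski difference after moving each body separately and compares it with the CSO formed once at t0 and then translated by (t − t0)(va − vb).

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, float]


def translate(shape: Sequence[Vec], disp: Vec) -> List[Vec]:
    """Shift every vertex of a shape by the displacement `disp`."""
    return [(x + disp[0], y + disp[1]) for x, y in shape]


def minkowski_difference(a: Sequence[Vec], b: Sequence[Vec]) -> List[Vec]:
    """All pairwise differences a - b; their convex hull is the CSO CH(C)."""
    return [(p[0] - q[0], p[1] - q[1]) for p in a for q in b]


def cso_moving_each(a0, b0, va: Vec, vb: Vec, t0: float, t: float) -> List[Vec]:
    """Move A and B separately to time t, then form the CSO."""
    dt = t - t0
    a_t = translate(a0, (va[0] * dt, va[1] * dt))
    b_t = translate(b0, (vb[0] * dt, vb[1] * dt))
    return minkowski_difference(a_t, b_t)


def cso_translated(a0, b0, va: Vec, vb: Vec, t0: float, t: float) -> List[Vec]:
    """Form the CSO at t0 once, then translate it by (t - t0)(va - vb)."""
    dt = t - t0
    rel = ((va[0] - vb[0]) * dt, (va[1] - vb[1]) * dt)
    return translate(minkowski_difference(a0, b0), rel)


# Hypothetical shapes and velocities; both constructions agree at every t.
a0 = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
b0 = [(3.0, 0.0), (4.0, 0.0), (4.0, 1.0), (3.0, 1.0)]
va, vb = (2.0, 0.5), (-1.0, 0.0)
for t in (0.0, 0.25, 0.8):
    lhs = cso_moving_each(a0, b0, va, vb, 0.0, t)
    rhs = cso_translated(a0, b0, va, vb, 0.0, t)
    assert all(abs(p[0] - q[0]) < 1e-9 and abs(p[1] - q[1]) < 1e-9
               for p, q in zip(lhs, rhs))
```

Because the CSO only translates, its shape never has to be rebuilt during the interval, which is what lets the later steps work with a single CH(C) and a displacement vector.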

As shown in Figure 3, if the two objects do not intersect, that is, the distance between them is greater than zero, then their CSO does not contain the origin, and the distance between the objects equals the shortest distance from the origin to the trajectory swept by the CSO boundary. If the two objects intersect, that is, the distance is less than or equal to zero, then the origin is contained in the CSO of the two objects and therefore in the region swept by the CSO boundary.

3.5.3. Detection Process

The purpose of collision detection is to calculate the minimum distance between two objects. With GJK, this becomes the problem of finding the minimum distance between the origin and the CSO of the two objects. For moving objects, the distance is further converted into the distance between the origin and the trajectory swept by the CSO boundary, calculated as shown in formulas (8), (9), and (10).

3.5.4. Calculation of Collision Response

The second step of the FCCD algorithm, collision response, is performed only when the first step reports that the two objects collide. The collision response determines the specific position where the two objects collide, in order to avoid penetration.

During the time interval $[t_0, t_1]$, the displacement vectors of A and B are T1 and T2, respectively. From formulas (5) and (6), where $\mathbf{v}_A$ and $\mathbf{v}_B$ are the constant velocities of A and B, T1 and T2 are given by formulas (11) and (12):

$$T_1 = a(t_1) - a(t_0) = (t_1 - t_0)\,\mathbf{v}_A \tag{11}$$

$$T_2 = b(t_1) - b(t_0) = (t_1 - t_0)\,\mathbf{v}_B \tag{12}$$

To simplify and speed up the calculation, object B is treated as stationary and object A moves relative to it, so the relative displacement vector of A is T = T1 − T2. From formulas (11) and (12),

$$T = T_1 - T_2 = (t_1 - t_0)(\mathbf{v}_A - \mathbf{v}_B) \tag{13}$$

By definition, C = A − B. Therefore, at time $t_0$, $C(t_0) = A(t_0) - B(t_0)$; at time $t_1$, $C(t_1) = A(t_1) - B(t_1)$. Because $A(t_1) = A(t_0) + T_1$ and $B(t_1) = B(t_0) + T_2$, it follows that

$$C(t_1) = A(t_0) - B(t_0) + T_1 - T_2 = C(t_0) + T \tag{14}$$

From formula (14), within the time interval $[t_0, t_1]$, CH(C) also moves, and its displacement equals the relative displacement vector T of object A.

The above analysis shows that CH(C) moves at a constant velocity, while the origin is stationary. To make the calculation tractable, CH(C) is instead treated as stationary; relative to CH(C), the origin then moves, with displacement −T. The state transition diagram of the origin and CH(C) is shown in Figure 4.

According to the transformation in Figure 4, the collision time is the first moment at which the origin, moving along −T, reaches CH(C); the position of A at that moment gives the position where the collision occurs.
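The collision-response step (keep CH(C) fixed, move the origin by −T, and find where that ray first enters CH(C)) can be sketched in 2D as follows. The paper does not name its ray and convex body intersection routine, so this sketch swaps in Cyrus-Beck clipping against the CSO hull, which is built with Andrew's monotone chain. All names are hypothetical, and the polygons are assumed to have nonzero area.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Tuple[float, float]


def cross(o: Vec, a: Vec, b: Vec) -> float:
    """z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Vec]) -> List[Vec]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Vec] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Vec] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def ray_hits_convex(start: Vec, d: Vec, hull: Sequence[Vec]) -> Optional[float]:
    """Cyrus-Beck clipping: smallest lam >= 0 with start + lam*d inside the
    counter-clockwise polygon `hull`, or None if the ray misses it."""
    lam_in, lam_out = 0.0, math.inf
    for i in range(len(hull)):
        p, q = hull[i], hull[(i + 1) % len(hull)]
        n = (q[1] - p[1], p[0] - q[0])  # outward normal of edge p -> q
        num = n[0] * (p[0] - start[0]) + n[1] * (p[1] - start[1])
        den = n[0] * d[0] + n[1] * d[1]
        # Inside this edge's half-plane  <=>  lam * den <= num.
        if abs(den) < 1e-12:
            if num < 0:
                return None  # ray parallel to the edge and outside it
            continue
        if den < 0:
            lam_in = max(lam_in, num / den)  # entering the half-plane
        else:
            lam_out = min(lam_out, num / den)  # leaving the half-plane
        if lam_in > lam_out:
            return None
    return lam_in


def time_of_impact(a0: Sequence[Vec], b0: Sequence[Vec],
                   disp_a: Vec, disp_b: Vec) -> Optional[float]:
    """Fraction lam in [0, 1] of the interval at which two translating convex
    polygons first touch, or None if they stay apart."""
    cso = convex_hull([(p[0] - q[0], p[1] - q[1]) for p in a0 for q in b0])
    t = (disp_a[0] - disp_b[0], disp_a[1] - disp_b[1])  # T = T1 - T2
    # Keep CH(C) fixed and move the origin by -T instead.
    lam = ray_hits_convex((0.0, 0.0), (-t[0], -t[1]), cso)
    return lam if lam is not None and lam <= 1.0 else None


a0 = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
b0 = [(3.0, 0.0), (4.0, 0.0), (4.0, 1.0), (3.0, 1.0)]
print(time_of_impact(a0, b0, (4.0, 0.0), (0.0, 0.0)))  # 0.5
```

The returned fraction λ gives the collision time t0 + λ(t1 − t0), and A's position at contact is its initial position shifted by λ·T1. In the example, A closes the 2-unit gap halfway through a 4-unit move.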

3.6. Implementation of Phase Recovery Algorithm

The light intensity distribution received by the camera can be expressed as follows:

In the formula, b = 0, 1, 2, …, M − 1 indexes the phase-shifted patterns; Lb(m, n) is the light intensity distribution of the target object captured by the camera; c(m, n) is the background light intensity distribution; (m, n) is the local contrast of the fringes; x0 is the carrier frequency; and φ(m, n) is the phase factor containing the depth information of the object. The surface height of the measured object can be obtained by solving for the phase φ(m, n), that is,

Using the four-step phase shift method, the phase shifts are 0, π/2, π, and 3π/2 in turn. From formula (15), φ(m, n) can be obtained as follows:

The phase computed by formula (17) is wrapped: it cannot distinguish points in different periods of the fringe pattern, and its monotonic increase cannot be guaranteed. The wrapped phase must therefore be unwrapped to recover the continuously varying phase, a process called phase unwrapping or phase reconstruction. The generalized temporal phase unwrapping algorithm is used to obtain the unwrapped phase map of the measured object [22].
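The four-step phase computation, and the wrapping that makes unwrapping necessary, can be illustrated per pixel. This sketch assumes the fringe model L_b = c + d·cos(ψ + bπ/2) with d > 0, so that L3 − L1 = 2d·sin ψ and L0 − L2 = 2d·cos ψ; the exact form of formula (15), including how the carrier term enters ψ, is not reproduced here, so the arctangent's sign convention is an assumption and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Image = Sequence[Sequence[float]]


def wrapped_phase(l0: float, l1: float, l2: float, l3: float) -> float:
    """Wrapped phase in (-pi, pi] from four samples shifted by 0, pi/2, pi, 3pi/2.

    Assumes L_b = c + d*cos(psi + b*pi/2), so L3 - L1 = 2d*sin(psi)
    and L0 - L2 = 2d*cos(psi); the background c cancels out.
    """
    return math.atan2(l3 - l1, l0 - l2)


def wrapped_phase_map(frames: Sequence[Image]) -> List[List[float]]:
    """Apply wrapped_phase pixel by pixel to four equally sized images."""
    l0, l1, l2, l3 = frames
    return [
        [wrapped_phase(l0[m][n], l1[m][n], l2[m][n], l3[m][n])
         for n in range(len(l0[m]))]
        for m in range(len(l0))
    ]


# Synthetic pixel: a true phase of 4.0 rad comes back wrapped as 4.0 - 2*pi.
c, d, psi = 0.5, 0.3, 4.0
samples = [c + d * math.cos(psi + b * math.pi / 2) for b in range(4)]
print(wrapped_phase(*samples))  # about -2.283
```

The example shows the folding problem directly: phases that differ by 2π produce identical samples, so the single-frequency result alone cannot recover the continuous phase.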

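According to the relationship between the phase value and the fringe frequency, the unwrapped phase is proportional to the fringe frequency:

$$\frac{\Phi_2(m, n)}{\Phi_1(m, n)} = \frac{f_2}{f_1} \tag{18}$$

In the formula, $f_1$ and $f_2$ represent the two fringe frequencies, and $\Phi_1$ and $\Phi_2$ represent the unwrapped phases at these two frequencies, respectively. From this linear relationship,

$$\Phi_2(m, n) = \frac{f_2}{f_1}\,\Phi_1(m, n) \tag{19}$$

Therefore, the unwrapped phase $\Phi_1$ obtained with $f_1$ fringes, combined with the above linear relationship and the wrapped phase $\varphi_2$ obtained with $f_2$ fringes, yields the unwrapped phase at the higher frequency:

$$\Phi_2(m, n) = \varphi_2(m, n) + 2\pi\,\mathrm{round}\!\left[\frac{(f_2/f_1)\,\Phi_1(m, n) - \varphi_2(m, n)}{2\pi}\right] \tag{20}$$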
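A common two-frequency form of temporal phase unwrapping, consistent with the linear phase/frequency relationship described above, is sketched below for a single pixel. Whether the generalized algorithm of [22] uses exactly this rounding rule is an assumption, and the function names are hypothetical. With a single fringe across the field (f_low = 1), the low-frequency phase spans less than one period and needs no unwrapping of its own.

```python
import math


def wrap(phase: float) -> float:
    """Map any phase into (-pi, pi], as a single-frequency measurement would."""
    return math.atan2(math.sin(phase), math.cos(phase))


def unwrap_two_frequency(phi_high: float, phi_low_unwrapped: float,
                         f_high: float, f_low: float) -> float:
    """Two-frequency temporal unwrapping of one pixel.

    phi_high:          wrapped phase measured with f_high fringes
    phi_low_unwrapped: unwrapped phase measured with f_low fringes
    """
    predicted = phi_low_unwrapped * f_high / f_low  # linear phase/frequency relation
    k = round((predicted - phi_high) / (2 * math.pi))  # integer fringe order
    return phi_high + 2 * math.pi * k


f_low, f_high = 1.0, 8.0
for x in (0.05, 0.37, 0.91):  # normalized positions across the field
    true_high = 2 * math.pi * f_high * x
    recovered = unwrap_two_frequency(wrap(true_high), 2 * math.pi * f_low * x,
                                     f_high, f_low)
    assert abs(recovered - true_high) < 1e-9
```

The rounding step tolerates noise in the scaled low-frequency phase of up to about half a high-frequency period, which is why the frequency ratio cannot be made arbitrarily large in practice.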

4. English Learning System Based on Virtual Reality

This section briefly introduces the overall system architecture and implementation. The system is built on Unity and can be connected to various VR devices, including HTC Vive and Samsung Gear VR. It also supports traditional devices such as PCs and mobile phones.

4.1. System Architecture

The architecture of the VR-based English learning system consists of three layers: the external presentation layer, the intermediate logic layer, and the underlying data layer. This three-tier architecture helps build a flexible and scalable system. The three layers are described in detail in Figure 5.

As shown in Figure 5, the system also provides rich teaching resources such as documents and videos, and learners can show their own videos and documents to others. The social module manages social interaction between learners, including friend lists and friend information. Other modules, such as VR device management, sound control, and synchronization control, are not shown in the figure.

4.2. Objects Used by Virtual Reality English Learning System

The target learners of the system are Chinese learners of English over 16 years old who have poor oral and communication skills and hope to further improve their speaking, including current students, job seekers, and students planning to study abroad. According to the Common European Framework of Reference for Languages (CEFR), the European standard for language testing, most learners in China have an English level between A1 and B1. For most current students in China, research shows that their English fluency is far below what their English textbooks require. Their textbooks are rich and complex in content, but students’ oral communication and practical application skills are generally poor, and they need more oral training to reach the desired level, which is precisely what their current learning environment lacks. This system provides such learners with an immersive English-speaking training environment that simulates various real-life scenarios, such as job hunting, giving speeches, and visa applications, to improve their oral communication and practical application.

4.3. Immersive Learning Experience

The immersive learning experience is the main feature of this system. An interesting, intelligent, interactive, and encouraging environment is intended to raise learners’ interest and enthusiasm and thus achieve better results. This section introduces several key factors used to increase immersion in the system.

4.3.1. Simulation Virtual Environment

The main scene of the system is a virtual town in which many learners can appear at the same time and communicate with each other. There are several AI characters in the scene, and when a learner approaches one, the AI character actively greets the learner. The scene contains various places and buildings, such as schools, banks, and squares, and different AI characters in these places provide dialogs suited to each scenario. Learners can interact with the AI characters in the corresponding scenes according to their own needs. In addition, all learners in the scene are synchronized, so each learner can see what the other learners are doing.
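The proximity-triggered greeting can be expressed as a per-frame distance check. The system itself is written in C# on Unity; this language-neutral Python sketch assumes a spherical trigger radius and a single greeting per visit, and the class, radius, and greeting text are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AICharacter:
    name: str
    position: Vec3
    greet_radius: float = 3.0  # hypothetical trigger distance in scene units
    greeting: str = "Hello! Can I help you?"  # hypothetical greeting line
    _greeted: Set[str] = field(default_factory=set)

    def update(self, learner_id: str, learner_pos: Vec3) -> Optional[str]:
        """Call once per frame; returns a greeting when a learner first comes near."""
        near = math.dist(self.position, learner_pos) <= self.greet_radius
        if near and learner_id not in self._greeted:
            self._greeted.add(learner_id)
            return f"{self.name}: {self.greeting}"
        if not near:
            self._greeted.discard(learner_id)  # re-arm once the learner walks away
        return None


clerk = AICharacter("Bank clerk", (0.0, 0.0, 0.0))
print(clerk.update("learner-1", (1.0, 0.0, 0.0)))  # Bank clerk: Hello! Can I help you?
```

Tracking greeted learners per character keeps the AI from repeating its greeting every frame while a learner stands nearby, yet lets it greet again on the next visit.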

4.3.2. Specific Scene Training

The system provides two specific training scenarios for learners to study:

Lecture: The scene is set in a lecture hall. When the learner enters the scene, a screen in the scene automatically plays a demonstration video of the lecture, and the learner can play, pause, and replay it to become familiar with the lecture content in advance. The lecture scene has two modes: training mode and fluency mode. In training mode, after the learner has studied the demonstration video, a subtitle board appears in the scene, showing the subtitles of each sentence of the speech, and the example audio of each sentence can be played so that the learner can practice sentence by sentence. Fluency mode provides a more immersive speaking experience: the learner cannot practice sentence by sentence but must deliver the entire speech fluently. In fluency mode, many AI audience members in the scene give feedback on the learner’s speech by clapping, nodding, and smiling.

Interview: In this scenario, the learner is going to study in the United States and goes to the embassy to apply for a student visa and attend an interview. The whole training takes place in the interview room, where the learner faces an AI interviewer who asks about the visa and studying abroad, and the learner must respond appropriately. This scenario also offers two modes for the learner to choose from. During training, the AI interviewer responds differently depending on the learner’s answers.
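The paper does not say how the AI interviewer classifies answers. One simple possibility, assuming the learner's speech has already been recognized as text, is a keyword-driven branch table like the hypothetical one sketched here; a real system might use a more robust intent classifier instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# (required keywords, interviewer reply, id of the next question or None to end)
Branch = Tuple[Tuple[str, ...], str, Optional[str]]


@dataclass
class Question:
    prompt: str
    branches: List[Branch]
    fallback: str  # reply when no branch matches; the same question stays active


def interviewer_turn(script: Dict[str, Question], qid: str,
                     answer: str) -> Tuple[str, Optional[str]]:
    """Pick the interviewer's reply and the next question from the learner's answer."""
    text = answer.lower()
    question = script[qid]
    for keywords, reply, next_id in question.branches:
        if all(k in text for k in keywords):
            return reply, next_id
    return question.fallback, qid


# Hypothetical two-question visa interview script.
script = {
    "purpose": Question(
        "Why do you want to study in the United States?",
        [(("master",), "Good. Which university admitted you?", "school")],
        "What degree will you study for?",
    ),
    "school": Question(
        "Which university admitted you?",
        [(("university",), "Thank you. Your interview is complete.", None)],
        "Please tell me the name of the university.",
    ),
}
print(interviewer_turn(script, "purpose", "I will do a Master's degree in chemistry."))
```

Different answers to the same question thus lead to different replies and different next questions, which is the behavior the scenario describes.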

4.4. System Implementation

The whole system is implemented in Unity3D on Windows 10, and the functions and control scripts are written in C#. A modular design and layered framework make the system more flexible and reliable. Because a learning system should offer tailor-made courses for each learner’s situation, coding every customized course from scratch would be time-consuming and would generate a lot of redundant code. The system therefore separates specific functions from program logic: function modules and interfaces are encapsulated and can be invoked through corresponding scripts and configuration files. This scripting mechanism makes the system scalable and easy to maintain. Under this mechanism, each training course in the system is an XML script file, so new training scenarios can be created quickly by writing new XML configuration files instead of coding from scratch. The script logic module consists of a parser and an executor: the parser reads the XML script configuration file and converts it into an executable data format, which the executor then runs.

The system creates a virtual environment for multiple learners, enabling real-time communication through speech, text, and body movements. Learners can share documents, watch videos simultaneously, and draw on whiteboards. The system synchronizes multiple clients so that they share the same data, such as speech, environment, position, body movements, and facial/lip animations. Each environment is treated as a separate channel, and synchronization is performed only within the same channel. Multiple learners using different VR devices can be online at the same time through the network, as shown in Figure 6.
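The parser/executor split for XML course scripts can be sketched as follows. The XML schema (tag names such as `playVideo`, `say`, and `waitForSpeech`) and the handler registry are hypothetical; the real system implements this mechanism in C# inside Unity, and its script format is not given in the paper.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    action: str             # XML tag name, e.g. "playVideo"
    params: Dict[str, str]  # XML attributes of that tag


def parse_course(xml_text: str) -> List[Step]:
    """Parser: turn an XML course script into a list of executable steps."""
    root = ET.fromstring(xml_text.strip())
    return [Step(child.tag, dict(child.attrib)) for child in root]


class Executor:
    """Executor: dispatch each step, in order, to a registered handler."""

    def __init__(self) -> None:
        self.handlers: Dict[str, Callable[[Dict[str, str]], str]] = {}

    def register(self, action: str, handler: Callable[[Dict[str, str]], str]) -> None:
        self.handlers[action] = handler

    def run(self, steps: List[Step]) -> List[str]:
        log = []
        for step in steps:
            if step.action not in self.handlers:
                raise ValueError(f"unknown action in course script: {step.action}")
            log.append(self.handlers[step.action](step.params))
        return log


COURSE = """
<course name="visa-interview" scene="InterviewRoom">
  <playVideo src="interview_demo.mp4"/>
  <say speaker="interviewer" text="Why do you want to study in the United States?"/>
  <waitForSpeech seconds="20"/>
</course>
"""

executor = Executor()
executor.register("playVideo", lambda p: f"play {p['src']}")
executor.register("say", lambda p: f"{p['speaker']}: {p['text']}")
executor.register("waitForSpeech", lambda p: f"listen for {p['seconds']} s")
print(executor.run(parse_course(COURSE)))
```

Under this design, adding a new training scenario only means writing a new XML file, as long as every tag it uses has a registered handler.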

As shown in Figure 6, the system supports multiple platforms, including desktop and mobile. On the desktop, the HTC Vive was used in the experiments; because of its high cost, it is available mainly to high-end users, so desktop VR devices are better suited to deployment in English training institutions as teaching and practice platforms. On the mobile side, Samsung Gear VR with Samsung mobile phones was tested. For most English learners, mobile VR headsets are affordable and easy to use, and with mobile devices learners can study English anytime, anywhere.
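The per-channel synchronization rule used for multiple online learners (state updates are relayed only among clients in the same environment) can be sketched as an in-memory relay. Networking, serialization, and the payload fields are omitted or assumed, and the class and method names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Update = Tuple[str, dict]  # (sender id, state payload)


class ChannelSync:
    """Relay state updates only between clients in the same channel."""

    def __init__(self) -> None:
        self.channel_of: Dict[str, str] = {}
        self.members: Dict[str, Set[str]] = defaultdict(set)
        self.outbox: Dict[str, List[Update]] = defaultdict(list)

    def join(self, client: str, channel: str) -> None:
        self.leave(client)  # a client is in at most one environment at a time
        self.channel_of[client] = channel
        self.members[channel].add(client)

    def leave(self, client: str) -> None:
        channel = self.channel_of.pop(client, None)
        if channel is not None:
            self.members[channel].discard(client)

    def publish(self, sender: str, state: dict) -> None:
        """Queue `state` (position, gesture, lip animation, ...) for channel peers."""
        for peer in self.members[self.channel_of[sender]]:
            if peer != sender:
                self.outbox[peer].append((sender, state))


sync = ChannelSync()
sync.join("alice", "bank")
sync.join("bob", "bank")
sync.join("carol", "school")
sync.publish("alice", {"position": (1.0, 0.0, 2.5)})  # only bob receives this
```

Restricting fan-out to one channel keeps the traffic per update proportional to the number of learners in that scene rather than in the whole town.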

5. Experiment of Virtual Reality English Learning System

5.1. Effect of Virtual Reality English Learning System on Relieving Learning Anxiety
5.1.1. Subject Information

The subjects were required to be first- to third-year undergraduates (still taking college English courses) who had passed CET-4. Subjects were recruited through online registration and came to the experimental site at an appointed time for a 30-minute paid experiment. All participants met the requirements, shared the same educational and cultural background, and were native Chinese speakers.

5.1.2. Quantitative Analysis

The questionnaires in this experiment were checked for completeness and authenticity, and questionnaires with incomplete answers or obvious response tendencies were excluded. The data were analyzed statistically with SPSS, and the reliability of each questionnaire is shown in Table 1.

As shown in Table 1, the overall Cronbach’s alpha reliability coefficient is 0.954 for the Foreign Language Classroom Anxiety Scale (FLCAS), 0.921 for the state scale of the State-Trait Anxiety Inventory (STAI-S), and 0.932 for the experience analysis scale, indicating that all three questionnaires had high reliability in this experiment and could be used for further analysis. In addition, the experience factor scale is divided into three factors: perceived effectiveness (items 1-4), perceived ease of use (items 5-7), and behavioral intention (items 8-11). This paper therefore also reports the reliability of these three factors, as shown in Table 2.
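For reference, Cronbach's alpha as reported in Tables 1 and 2 is α = k/(k − 1)·(1 − Σσᵢ²/σ_T²) for k items, where σᵢ² is the variance of item i and σ_T² the variance of respondents' total scores. A minimal sketch using Python's `statistics` module follows; the sample data are hypothetical, not the study's responses.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; scores[r][i] is respondent r's answer to item i."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 3 respondents x 2 items (not the study's data).
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))  # about 0.667
```

Using the sample variance for both the items and the totals is harmless here, because the n − 1 denominators cancel in the ratio.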

As shown in Table 2, the three factors tested have high reliability in this experiment and can be used for further analysis.

5.1.3. Difference Analysis Based on Experience Factor Scores

To identify which experience factors differed significantly between the groups, and thus summarize the differences between VR and computer desktop learning, a difference analysis of experience factor scores was carried out. Given the nature of the experience factor questionnaire, this paper compares the average score of each item across groups. The experience factor scores are shown in Figure 7.

As shown in Figure 7, the experimental group scored significantly higher than the desktop group on both “perceived effectiveness” and “behavioral intention,” while there was no significant difference between the experimental and control groups in perceived ease of use. In short, the groups evaluated the ease of operation of the two learning systems similarly, but students in the experimental group rated the VR system significantly higher on “perceived effectiveness” and “behavioral intention” than students in the control group rated the computer desktop system. This suggests that the VR English learning system gives students a better perceived learning effect and a stronger sense of self-efficacy and encourages continued use. The STAI-S scores of the experimental group (N = 28) and the control group (N = 27) were analyzed and compared, as shown in Table 3.

As shown in Table 3, the mean STAI-S score was 35.04 (SD = 7.22) in the experimental group and 41.04 (SD = 10.42) in the control group. Neither group’s data deviated markedly from the norm, and the mean STAI-S score of the experimental group was 14.6% lower than that of the control group. A two-sample equal-variance t test on the two groups gave t = −2.49 and p = 0.0159, which is below the 0.05 significance level. The STAI-S score of the experimental group can therefore be considered significantly lower than that of the control group; that is, the null hypothesis is rejected. Table 4 shows the t-test analysis of STAI-S scores for the high anxiety group.

As shown in Table 4, in the high anxiety group, the mean STAI-S score was 36.00 (SD = 8.52) for the experimental group (N = 16) and 44.93 (SD = 10.05) for the control group (N = 14), so the experimental group’s mean was 19.9% lower. A two-sample equal-variance t test gave t = −2.63 and p = 0.0136, which is below the 0.05 significance level, so the difference is significant. Table 5 shows the t-test analysis of STAI-S scores for the low anxiety group.

As shown in Table 5, in the low anxiety group, the mean STAI-S score was 33.75 (SD = 5.07) for the experimental group (N = 12) and 36.85 (SD = 9.43) for the control group (N = 13). Thus, for subjects with lower FLCAS scores, anxiety during the experiment was lower in the experimental group than in the control group, but the difference was not significant.
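The equal-variance (Student’s) t tests behind Tables 3-5 can be recomputed from the published summary statistics alone. The sketch below pools the two group variances, forms t = (m1 - m2) / (sp × sqrt(1/n1 + 1/n2)) with n1 + n2 - 2 degrees of freedom, and obtains the two-sided p value by integrating the Student t density numerically with Simpson’s rule, so it needs only the Python standard library. The function names are hypothetical, and because the means and standard deviations in the tables are rounded, recomputed values may differ slightly in the last digit.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """P(|T| >= |t|) = 1 - 2 * integral of the density over [0, |t|] (composite Simpson's rule)."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * total * h / 3


def pooled_t_test(m1, s1, n1, m2, s2, n2):
    """Two-sample equal-variance t test from (mean, SD, N) of each group; returns (t, df, p)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of m1 - m2
    t = (m1 - m2) / se
    return t, df, two_sided_p(t, df)


# (mean, SD, N) of the experimental group, then of the control group, as given in Tables 3-5.
tables = {
    "Table 3, all subjects": (35.04, 7.22, 28, 41.04, 10.42, 27),
    "Table 4, high anxiety": (36.00, 8.52, 16, 44.93, 10.05, 14),
    "Table 5, low anxiety": (33.75, 5.07, 12, 36.85, 9.43, 13),
}
results = {name: pooled_t_test(*stats) for name, stats in tables.items()}
for name, (t, df, p) in results.items():
    print(f"{name}: t({df}) = {t:.2f}, p = {p:.4f}")
```

Run as written, it gives t ≈ −2.49 and t ≈ −2.63 for Tables 3 and 4, both with p below 0.05, and for Table 5 a p value well above 0.05, consistent with the non-significant difference in the low anxiety group. Pooling the variances is the equal-variance assumption stated in the text; a Welch test would drop that assumption.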

5.1.4. Anxiety Analysis and Summary

Based on the research questions and the above analysis, the VR scene-based learning system in this study can help subjects reduce speaking anxiety compared with traditional computer desktop learning. This reduction is more pronounced for people who usually feel high anxiety about learning English. For people who usually feel low anxiety about learning English, the experimental group also showed lower anxiety than the control group, but the difference was not significant.

People with high FLCAS anxiety may respond more strongly to scene-based learning because their anxiety during oral English tests (both in everyday study and in this experiment) may stem from a sense of disconnection between learning English and actually using it. The VR system provides a realistic scene, which narrows this gap between everyday learning and actual use and thereby reduces their anxiety. People with low FLCAS anxiety adapt better: for this group, desktop computer teaching and VR scene-based learning are almost equivalent, and VR scene-based learning does not bring a significantly greater effect. The following qualitative analysis of the questionnaire interview records examines this conclusion further.

5.2. Effect of Virtual Reality English Learning System

A total of 20 questionnaires were distributed and all 20 were returned, of which 19 were valid, giving an effective recovery rate of 95%. The survey results are discussed below.

5.2.1. Learning Attitude

The learning attitude section of the survey contains six subquestions; the results are shown in Figure 8.

As shown in Figure 8, for questions 1 to 5 of the learning attitude survey, options A (completely possible) and B (mostly possible) together accounted for more than 75% of the valid responses. This shows that most learners have a largely positive attitude toward learning English with the avatar learning method in the virtual simulation learning situation. For question 6, 60% of the respondents chose A or B. This indicates that, in the virtual simulation learning situation, learners are more willing to set their own pace and learn autonomously with prompts and help from teachers, which is also more consistent with the characteristics of learning in a network environment.
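The percentages in this subsection are shares of the 19 valid questionnaires that chose a given option. A minimal sketch of that tally follows; the function name `top_two_share` and the answer counts are hypothetical illustrations, not the study’s data.

```python
from collections import Counter


def top_two_share(answers: list[str]) -> float:
    """Percentage of answers that are A (completely possible) or B (mostly possible)."""
    counts = Counter(answers)  # missing options count as 0
    return 100 * (counts["A"] + counts["B"]) / len(answers)


# Hypothetical answers to one question from 19 valid questionnaires (not the study's data).
question_answers = ["A"] * 4 + ["B"] * 7 + ["C"] * 6 + ["D"] * 2
print(f"{top_two_share(question_answers):.1f}%")  # prints 57.9%
```

With 19 valid questionnaires, each respondent accounts for about 5.3 percentage points (100 / 19).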

5.2.2. Learning Effect

The learning effect section of the survey contains five subquestions; the results are shown in Figure 9.

As shown in Figure 9, 90% of the learners are satisfied with their learning effect in the virtual simulation learning situation. More than 80% believe that learning English with the avatar learning method in this situation improves their writing and oral expression skills and strengthens their relationships with classmates and teachers. In addition, 85% believe that the virtual simulation learning situation increases their interest in learning. These results suggest that the design of the virtual simulation learning scenarios fits the learners’ cognitive characteristics and is broadly accepted by them, and that the overall design is relatively successful.

5.2.3. Learning Situation Design

The learning situation design section of the survey contains six subquestions; the results are shown in Figure 10.

As shown in Figure 10, most learners (85%) believe that a well-designed, reasonable virtual learning situation has a considerable impact on the acquisition of English knowledge. When the avatar learning method is used for teaching in the virtual simulation learning situation, the presentation of teaching content and the organization of meaningful teaching activities strongly affect how effectively learners learn. In addition, the choice of communication tools in the virtual simulation learning situation, especially voice communication tools, cannot be ignored, because these tools support information exchange and practice among learners.

6. Conclusions

This research explores theories of learning experience in language teaching and uses VR technology to realize these learning experiences, seeking a design approach that combines “VR + English.” Through practical teaching experiments, this study shows that VR English teaching gives students a good experience of using everyday English in context, and it analyzes the advantages and possible disadvantages of VR technology in English teaching. These analyses offer a reference for the design of future situational teaching systems for everyday English. In addition, this study shows that VR scenes can reduce anxiety in speaking a foreign language, particularly for learners with high anxiety, which further reflects the advantages of the learning experience that VR provides. VR transforms “transmission of textual knowledge” into “transmission of experience” and lets students keep applying knowledge in everyday English situations. This is the unique advantage of VR experiential teaching. In the future, more teaching researchers may shift their focus from “learning achievement” to “learning experience.” Future research can extend from a single scene to multiple scenes and conduct more detailed experiments on the design elements within each scene to explore how different factors affect the learning experience and learning effect.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the 2020 Annual Program of Philosophy and Social Science of Henan Province “Online Class Discourse in the Context of epidemic prevention and control” (Project No: 2020BJY001) and the 2021 Academic Degrees & Graduate Education Reform Project of Henan Province “Research on Cultivation of Professional Practice Ability of MTCSOL-Taking Anyang Normal University for Example” (Project No: 2021SJGLX223Y).