Abstract
To develop more reasonable and scientific teaching equipment and software and to diversify the teaching process, LOGO image recognition technology is used to build an experimental development environment, with Unity 3D as the development platform and the Vuforia AR SDK as the development tool. The technology is applied to a case study of high school inquiry teaching. Through classroom teaching practice and teacher-student interviews, the application effect of LOGO image recognition technology in teaching is evaluated, and its effectiveness in inquiry teaching practice is verified with practical data. The results show that, compared with the YOLO (You Only Look Once), R-CNN (Region-CNN), and Faster R-CNN image recognition algorithms, the LOGO image recognition method based on deep learning theory has stronger recognition ability and gives more accurate results. This teaching mode can significantly improve students’ academic performance, and the method is correct, reasonable, and scientific. The application of deep learning-based LOGO image recognition technology in teaching can provide research ideas for combining AI and education.
1. Introduction
In recent years, more and more image recognition techniques have been developed. The most common approach matches different algorithms to hand-crafted feature extraction (SIFT, HOG, and LBP features): image recognition is realized from the gradient of each pixel in each region or from the characteristics of the edge histogram [1–3]. Neural network methods form a second approach; the VGGNet network structure, which selects smaller convolution kernels, can significantly improve the expressive ability of the network [4]. Microsoft proposed the ResNet algorithm, whose residual learning structure can be used to solve the problem of gradient dispersion in the shallow layers of deep neural networks, so that deep learning can be more widely used in image recognition [5]. LOGO image recognition technology has also been reported in many articles: Tang and Peng proposed that better matching results could be achieved by using unique topological constraint algorithms and feature selection methods [6]; Yuxin and Peifeng used a convolutional neural network (CNN) to design a highway entrance vehicle sign recognition system for real-time detection and classification of vehicle signs [7]; Shou-xian and Fang used a LOGO system that could better identify logos in daily life and guarantee real-time performance at a given level of accuracy [8]; and a CNN method based on transfer learning was proposed that can effectively identify vehicle information [9]. This image recognition technology has mostly been applied in intelligent transportation systems, but not in teaching research.
In the field of education, knowledge is mostly taught by direct instruction, and students are indoctrinated through language [10, 11]. In foreign teaching models, the inquiry learning method is often used: associations are drawn from real life and brought into teaching, and students’ ability and interest in autonomous learning are improved by creating situations [12–14]. The inquiry teaching method has also been reported in many articles: Aktamiş et al. found that, compared with traditional teaching methods, inquiry learning methods could significantly improve students’ academic performance in science education [15]; Andrini found that inquiry teaching methods not only cultivated students’ learning intelligence and ability but also improved students’ potential emotions and skills [16]; and Zhai found that an inquiry-based educational philosophy and teaching model had obvious advantages in college English teaching and could significantly improve students’ comprehensive ability and professional quality [17]. Current teaching methods have changed from the original indoctrination to diversified learning methods [18]. Applying new technology in teaching to improve students’ cognitive ability is a hot topic in modern teaching.
Data on LOGO image recognition technology are collected through literature research, and the feasibility of applying the technology in teaching is analyzed. Its advantages are studied by reviewing the requirements, development process, and module training of the technology in teaching applications and by comparing it with other algorithms. The purpose is to apply LOGO image recognition technology supported by machine learning (ML) to teaching and to conduct a field investigation of the application effect, which will provide theoretical support and a practical basis for the further integration of image recognition technology and educational measures.
2. Methods
2.1. Design of LOGO Image Recognition Teaching Mode Based on Deep Learning
(1) Demand analysis: traditional shallow learning (which focused on the interpretation of knowledge points and theoretical derivation, with students often completing relatively independent and discrete action skills through imitation) no longer satisfied the requirements of the times, so the teaching design focused on cultivating learners’ deep learning ability of “adaption, innovation.”
(2) Flow chart: as shown in Figure 1, the flow chart of the LOGO image recognition teaching pattern based on deep learning was divided into three parts. In the first part, students could take and upload pictures of interest. The second part was the training of brand LOGO recognition, which was carried out by network computing and divided into three levels: the shared network algorithm module, the long short-term memory neural network module, and the biasing algorithm module. The third part was the output of the results; both the teacher and the student could get the final results from the client.
(3) Module training: the basis of deep learning was the continuous training of each module, and training in the experiment was also carried out through improved algorithms. Firstly, the network information was shared through the information sharing module. The image data was compressed into a tensor and then input to the student client, and the recognition result was obtained according to the network curve. Next, the images were processed by LSTM, and protobuf was used to serialize the acquired ckpt file, so as to realize the fine classification of images. Finally, the bias algorithm was introduced to improve the ability of the network model to recognize the target image.

2.2. Framework of Long Short-Term Memory Neural Networks
Long short-term memory (LSTM) was a kind of neural network based on deep learning [19] whose architecture was well suited to processing and predicting important events separated by long intervals and delays in a time series. LSTM, whose structure was similar to that of the RNN, processed images more efficiently than ordinary models such as the hidden Markov model [20, 21]. The main frame structure is shown in Figure 2.

LSTM could be divided into two parts: inside and outside. The internal part contained an input gate (under f-cell), a forgetting gate (g/h), and an output gate (above f-cell). Each gate was represented by a vector whose elements ranged from 0 to 1; a value approaching 0 indicated that the gate was closed, and a value approaching 1 indicated that the gate was open. The input gate determined the information that needed to be entered at a given time, and in the standard LSTM formulation it was calculated as follows:
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right)$$
The forgetting gate determined the information that needed to be forgotten at a given time, and it was calculated as follows:
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
The output gate determined the information that needed to be output at a given time, and it was calculated as follows:
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right)$$
where $i_t$, $f_t$, and $o_t$ represented the vector values of the different gates at time $t$; $x_t$ and $h_{t-1}$ represented the input vector at time $t$ and the output state at time $t-1$; $W_i$, $W_f$, and $W_o$ represented the parameters of the different gates; $b_i$, $b_f$, and $b_o$ represented the bias parameters of the different gates; and $\sigma$ was the sigmoid function. On the basis of the above expressions, the cell state of each time node could be calculated, and the cell state and hidden layer state equations were as follows:
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$h_t = o_t \odot \tanh\left(C_t\right)$$
where $\tilde{C}_t$ represented the candidate variable of the cell state at time $t$ in the LSTM model, $C_t$ represented the cell state at time $t$, $h_t$ represented the state of the hidden layer at time $t$, and $\odot$ denoted element-wise multiplication.
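The gate computations described in this subsection can be sketched as a single LSTM time step. This is a minimal illustration of the standard LSTM formulation in plain Python, not the network used in the experiment; the function names (`lstm_step`, `affine`, `zero_params`) and the dictionary layout of the parameters are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step. `params` maps 'i', 'f', 'o', 'c' to a (W, b) pair (hypothetical layout)."""
    z = h_prev + x_t  # concatenation [h_{t-1}, x_t]

    def gate(name, activation):
        W, b = params[name]
        return [activation(a) for a in affine(W, z, b)]

    i_t = gate("i", sigmoid)      # input gate
    f_t = gate("f", sigmoid)      # forgetting gate
    o_t = gate("o", sigmoid)      # output gate
    c_hat = gate("c", math.tanh)  # candidate cell state
    # New cell state: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Hidden state: output gate applied to the squashed cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def zero_params(hidden_size, input_size):
    """All-zero weights and biases, useful only for checking the arithmetic."""
    width = hidden_size + input_size
    return {
        name: ([[0.0] * width for _ in range(hidden_size)], [0.0] * hidden_size)
        for name in ("i", "f", "o", "c")
    }


if __name__ == "__main__":
    h, c = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -2.0], zero_params(2, 3))
    print(h, c)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the new cell state is simply half the previous one, which makes the step easy to check by hand.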
The dataset used by LSTM is the MNIST dataset, which contains images of handwritten digits and their corresponding labels, split into training data, test data, and validation data. In the network training process, the necessary dependencies and datasets are imported first, and some constants are declared. Next, the placeholders, weights, and bias variables are set, and the network is defined to generate predictions. Then, after the loss function, optimizer, and accuracy are defined, the network is run.
2.3. Shared Network Algorithm Module and Biasing Algorithm Module
The shared network algorithm module was adopted in this experiment, with specific reference to Jiang et al.’s method [22]. The ResNet-t and ResNet-s of teacher network and student network both used residual shared network algorithm module to improve the depth of neural network. After introducing the residual module into the deep neural network, the gradient generated by the loss function would not be rapidly dispersed during the back propagation process, so the shallow part of the deep neural network could also adjust the parameters of this layer through the generated loss, and then the shallow part could be well trained. With the increase of network depth, the recognition ability of neural network would be stronger and stronger.
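The effect of the residual module on back propagation can be illustrated numerically. The sketch below is a toy scalar model, not the ResNet-t or ResNet-s networks: each "layer" is assumed to be `tanh(w * x)`, and the residual variant adds the identity shortcut `x + tanh(w * x)`. The weight, depth, and function names are hypothetical values chosen for illustration.

```python
import math


def plain_layer_grad(x, w):
    """Derivative of tanh(w * x) with respect to x."""
    return w * (1.0 - math.tanh(w * x) ** 2)


def residual_layer_grad(x, w):
    """Derivative of x + tanh(w * x): the identity shortcut contributes the extra 1."""
    return 1.0 + plain_layer_grad(x, w)


def input_gradient(x, w, depth, residual):
    """Chain-rule product of the per-layer derivatives through `depth` stacked layers."""
    grad = 1.0
    for _ in range(depth):
        if residual:
            grad *= residual_layer_grad(x, w)
            x = x + math.tanh(w * x)
        else:
            grad *= plain_layer_grad(x, w)
            x = math.tanh(w * x)
    return grad


if __name__ == "__main__":
    print("plain   :", input_gradient(0.3, 0.5, 20, residual=False))
    print("residual:", input_gradient(0.3, 0.5, 20, residual=True))
```

In the plain stack every factor is at most `w`, so with `w = 0.5` the gradient reaching the first layer shrinks geometrically with depth. In the residual stack every factor is at least 1, so the gradient reaching the shallow layers does not vanish.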
The biasing algorithm module was adopted in this experiment, specifically referring to Fu et al.’s method [23]. The biased CNN was introduced to improve the recognition accuracy of target LOGO, and it was applied to the teacher and student network. Furthermore, it could make the recognition accuracy of the student network close to the level of the teacher’s network on the bias class and make up for the disadvantage of reducing the recognition accuracy caused by the network compression on some target classes.
2.4. Laboratory Development Environment Setting
The development environment is shown in Table 1. The implementation framework was the Unity platform, and the Vuforia AR SDK was used in Unity 3D 2017.2.0f3.
2.5. Comparative Study of Image Recognition Technology
To show the difference between the LOGO image recognition technology based on deep learning and network teaching modes built on other algorithms, commonly used image recognition algorithms were selected: the YOLO (You Only Look Once) algorithm, which is based on the Darknet network and does not use candidate frames [24]; the R-CNN (Region-CNN) algorithm based on CaffeNet [25], which uses Selective Search to obtain target candidate frames; the Faster R-CNN algorithm based on the VGG16 network [26]; and the ContextNet algorithm based on the LSTM model. They were named Q1, Q2, Q3, and Q4, respectively.
(1) YOLO algorithm: it was a CNN operation with an end-to-end prediction framework, whose running speed was greatly improved by adding a real-time module. The basic goal was to transform the detection problem into an image classification problem. The basic principle was to slide windows of different sizes and proportions (aspect ratios) over the whole picture with a certain step size and then classify the image areas corresponding to these windows, so that the entire picture could be detected.
(2) R-CNN algorithm: it was an image recognition algorithm that extracted feature frames from the image and used a neural network to process them. It first selected fixed regions on each image by searching and then constructed training and test samples based on these regions. During testing, nonmaximum suppression (NMS) was applied to the scored outputs to remove duplicate boxes. At the same time, a regressor was trained for each category to achieve high-precision image recognition. However, R-CNN involved many processes, including region selection, training of the CNN, and training of support vector machines (SVM), which made the training time very long and took up a large amount of space.
(3) Faster R-CNN algorithm: it integrated feature extraction, feature frame extraction, bounding box regression, and image classification into one network, which greatly improved the overall performance, especially the detection speed.
(4) ContextNet algorithm: it was a target context detection network based on the LSTM model. ResNet was used as the image feature extraction network, and the improved multiscale target candidate region extraction network was used to extract the target candidate frames. The LSTM model and the target context information were then used to classify the candidate frames. By using context information of different scales as the input of the LSTM model and using the output of the last LSTM layer as the classification result for the entire candidate frame, context information of multiple scales was effectively combined to classify the candidate frame.
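The nonmaximum suppression step mentioned for R-CNN can be sketched as follows. This is a minimal greedy version in plain Python under stated assumptions: boxes are `(x1, y1, x2, y2)` tuples, and the 0.5 overlap threshold is a hypothetical default, not a value taken from the experiment.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In the example, the second box overlaps the first with an IoU of about 0.68 and is removed as a duplicate, while the distant third box is kept.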
Pictures were taken from two different datasets. The LOGO dataset contained more than 50 logos, and each logo had 2,000 pictures of different scenes. The other dataset was Pascal VOC, an internationally authoritative dataset in the field of computer vision; in total, it contained 15,000 different pictures, including multiple pictures with different semantic features. The two datasets were named A1 and A2, respectively. Datasets related to education were mainly studied in the experiment, such as pictures of chemical equations, pictures of biological experiments, and pictures of physical space. The purpose was to find an appropriate image recognition algorithm that could be applied to practical teaching activities and improve the efficiency of teaching and students’ enthusiasm.
2.6. Algorithm Performance Evaluation Indexes
Generally, the performance of a target detection algorithm was measured in terms of time complexity and detection accuracy [27]. Time complexity was evaluated by the detection frequency, that is, the number of images that the algorithm could detect per second. Accuracy was measured by precision, recall, and mean average precision (mAP), which were important reference indicators widely used in the performance evaluation of target detection algorithms. Precision referred to the proportion of correctly detected targets among all detected targets. Recall referred to the ratio of the number of correctly detected targets to the number of all relevant targets in the image database. The mAP represented the average of the average precision (AP) values of all categories. The PR curve of each category was drawn by changing the detection intersection over union (IoU) threshold. The AP value of each category was calculated from its PR curve as the area enclosed by the PR curve and the coordinate axes, and the mAP value of the target detection algorithm was then obtained by averaging the AP values of all categories. In target detection, the calculation equations of precision and recall were the same as in the original definitions, but the positive examples were determined by calculating the IoU of the detection frame and the standard frame. If the IoU was greater than a certain threshold, the detection frame was considered a positive sample; otherwise, it was considered a negative sample.
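The indexes described above can be sketched in a few lines of Python. This is an illustrative implementation, not the evaluation code used in the experiment. It assumes each detection comes with a confidence score and its IoU against a ground-truth box that no higher-ranked detection has already matched, that the PR curve is traced by walking down the detections ranked by score, and that AP is the uninterpolated area under that curve. The function names, the 0.5 IoU threshold, and the sample numbers are hypothetical.

```python
def precision_recall_points(scores, ious, n_ground_truth, iou_threshold=0.5):
    """Return (precision, recall) after each detection, ranked by descending score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, true_pos = [], 0
    for rank, i in enumerate(order, start=1):
        if ious[i] > iou_threshold:  # detection counts as a positive sample
            true_pos += 1
        points.append((true_pos / rank, true_pos / n_ground_truth))
    return points


def average_precision(points):
    """Area under the PR curve: sum of precision times the increase in recall."""
    ap, previous_recall = 0.0, 0.0
    for precision, recall in points:
        ap += precision * (recall - previous_recall)
        previous_recall = recall
    return ap


def mean_average_precision(ap_per_category):
    """mAP: plain average of the per-category AP values."""
    return sum(ap_per_category) / len(ap_per_category)


if __name__ == "__main__":
    pts = precision_recall_points([0.9, 0.8, 0.7, 0.6], [0.8, 0.3, 0.6, 0.55], n_ground_truth=4)
    print(pts[-1], average_precision(pts))
```

In the example, three of the four detections exceed the IoU threshold and there are four ground-truth targets, so the final precision and recall are both 0.75.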
2.7. Teaching Effect Evaluation of LOGO Image Recognition Based on Deep Learning
The effect of the developed system was evaluated by field investigation. Eight high school classes in a certain city, with 240 students in total from different schools and different regions, were randomly selected. Four science and technology classes (K1, K2, K3, and K4) and four regular classes (C1, C2, C3, and C4) were designed, with 30 students in each class. There was no significant difference between the students. Without informing the students of the content, the two groups of classes were taught independently by the same teacher. Apart from the teaching resources, external factors such as the learning environment and teaching time were required to be the same, which effectively controlled the variables. For the students in the science and technology classes, the LOGO image recognition technology based on deep learning was used in teaching, while for the students in the regular classes, the common teaching method was adopted. Excel 2019 was used for data statistics, SPSS 20.0 was used for data significance analysis (multifactor analysis of variance), and Origin 9.1 and Visio 2013 were used for drawing.
3. Results and Discussion
3.1. Performance Comparison of Different Image Recognition Algorithms
The results of the experimental example of the image recognition through the system are shown in Figure 3, and it could be clearly observed that the used ContextNet algorithm was superior to the YOLO algorithm, R-CNN algorithm, and Faster R-CNN algorithm in LOGO image detection. Especially in the case of small LOGO detection, for example, when detecting DELL and GUCCI, two relatively small logos, ContextNet algorithm detected the targets successfully, while the other three algorithms did not detect the targets. At the same time, in terms of the accuracy of the frame detection, the ContextNet algorithm was better than the other three algorithms. For the detection of Starbucks and Apple targets, the target frame obtained by the ContextNet algorithm was more accurate than the other three algorithms.

The recall rates of the different image recognition algorithms are compared in Figure 4. The YOLO network extraction algorithm had no deep learning or other special network structures, so its recall rate was the worst compared with the R-CNN, Faster R-CNN, and ContextNet algorithms, with an average recall rate of 35.56%. Since the R-CNN network and the Faster R-CNN network had a neural network learning process, the recall of these two algorithms was greatly improved, and their average recall rates were 54.03% and 62.82%, respectively. Moreover, owing to its target context detection network with the LSTM model, the ContextNet algorithm was able to extract image feature frames well, with an average recall rate of 68.01%. Based on the above results, the proposed brand LOGO image recognition algorithm based on deep learning had better recall performance.

The comparison of the average detection accuracy of the different image recognition technologies is shown in Table 1 and Figure 5. The part-based target detection framework performed very well on the LOGO image dataset. Although the detection speed of the YOLO algorithm was fast, this Darknet-based algorithm without candidate frames performed poorly in detecting small targets. The detection performance of R-CNN was somewhat worse than that of Faster R-CNN, and the operation time and test time consumed by the R-CNN algorithm were much higher than those of all other methods, which made it impractical for applications with relatively high requirements. The main reason for the lower detection quality of R-CNN was that it used the traditional Selective Search method to extract candidate frames for the target, whereas Faster R-CNN used the candidate region extraction network; Selective Search was much less effective than the candidate region extraction network in acquiring LOGO image target candidate frames. The ContextNet algorithm not only surpassed the other three methods in detection quality but also greatly improved detection efficiency. Therefore, the constructed teaching method based on deep learning LOGO image recognition technology was correct, reasonable, and scientific, as shown in Table 2.

3.2. Teaching Effect Evaluation Results of LOGO Image Recognition Based on Deep Learning
After the teaching experiment, the difference in teaching effect between the experimental group and the control group was analyzed using the classroom test results. The statistical results are shown in Figure 6. The four classes taught with LOGO image recognition technology had higher average scores of 75, 76, 73, and 77, respectively, while the average scores of the four classes taught with traditional methods were 63, 61, 60, and 62, respectively. This showed that inquiry creative experiment teaching could significantly improve students’ achievement compared with traditional teaching, as can be seen in Figure 7.


The above data were used for independent sample test by SPSS, and the results are shown in Table 3. The standard deviations of the grades of the classes taught with LOGO image recognition technology were 12.56, 12.72, 12.06, and 14.37, respectively, while the standard deviations of the traditional teaching methods were 12.01, 13.50, 13.84, and 11.54, respectively.
As shown in Table 3, the equality of variances was tested by the Levene method. When the variances were assumed to be equal, the significance value was 0.032, indicating a significant difference (p < 0.05); when the variances were assumed to be unequal, t was 2.014, df was 78, and p < 0.05, also indicating that the significance level had been reached. The independent-samples test showed that there was a significant difference between the experimental group and the control group (the regular classes), which further indicated that inquiry creative experimental teaching had a certain superiority over traditional teaching.
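For readers who want to see how an independent-samples test statistic is formed from summary statistics, a minimal sketch follows. It is illustrative only: it takes the first reported mean and standard deviation of each group (75 and 12.56 for the experimental classes, 63 and 12.01 for the regular classes) with 30 students per class, whereas the text does not state which groups entered the SPSS comparison, so the output is not expected to reproduce Table 3. The function names are hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom, assuming equal variances."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom, without assuming equal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    print(pooled_t(75, 12.56, 30, 63, 12.01, 30))
    print(welch_t(75, 12.56, 30, 63, 12.01, 30))
```

Because the two classes have the same size, the two t statistics coincide and only the degrees of freedom differ. A p value would then be read from the t distribution with the corresponding degrees of freedom.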
4. Discussion
Zhang et al. indicated that an image target feature extraction and recognition model based on a deep CNN was established by using two deep learning algorithms—mask R-CNN algorithm and fast R-CNN algorithm [28]. First of all, aiming at the problem of identifying multiscale targets in LOGO image detection, an improved multiscale LOGO candidate region extraction network is proposed. Different from the original candidate region extraction network, the multiscale target detection is realized by the feature pyramid method. As to the character that the size of the LOGO image is relatively small compared to the ordinary target, the k-means algorithm is applied to the multiscale clustering of the LOGO image target to obtain the distribution of the LOGO target scale. In this way, the priori parameters required by the multiscale candidate region extraction network to extract the candidate frame of the LOGO image target are obtained. The multiscale candidate region extraction network uses the scale features existing between the layers of the CNN and the composition of the feature pyramid to achieve the purpose of extracting the image feature pyramid using the neural network, which is consistent with the findings of Huang et al. [29].
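The clustering of LOGO target scales can be sketched with a small k-means routine over (width, height) pairs. This is a minimal illustration under stated assumptions: plain Euclidean distance, a deterministic initialization that picks centres spread across the boxes sorted by area, and made-up box sizes. The text does not specify the distance measure or the number of clusters, and the function name is hypothetical.

```python
def kmeans_box_sizes(sizes, k, iterations=50):
    """Cluster (width, height) pairs with Euclidean k-means and return the sorted centres."""
    # Deterministic start: take centres at evenly spaced positions in area order.
    ordered = sorted(sizes, key=lambda s: s[0] * s[1])
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for w, h in sizes:
            nearest = min(
                range(k),
                key=lambda c: (w - centres[c][0]) ** 2 + (h - centres[c][1]) ** 2,
            )
            groups[nearest].append((w, h))
        # Move each centre to the mean of its group; keep it if the group is empty.
        new_centres = [
            (sum(w for w, _ in g) / len(g), sum(h for _, h in g) / len(g)) if g else centres[i]
            for i, g in enumerate(groups)
        ]
        if new_centres == centres:
            break
        centres = new_centres
    return sorted(centres)


if __name__ == "__main__":
    sizes = [(10, 10), (12, 11), (11, 12), (40, 42), (42, 40), (41, 41), (90, 88), (88, 92), (91, 90)]
    print(kmeans_box_sizes(sizes, k=3))
```

The resulting centres play the role of the prior scales that a multiscale candidate region extraction network would use when proposing candidate frames.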
Second, in view of the problem that the target of the LOGO image detection is relatively small and difficult to identify, a target context classification network based on the LSTM model is proposed. The final classification result is obtained by using the target context features of different scales as the input of the LSTM model. This connection method makes full use of the different effects of target context information of different scales on target classification. Furthermore, in order to improve the accuracy of the frame, a suitable method for small target frame regression is used, and the detection accuracy is once again improved. Compared with the traditional target detection methods, the method based on target context features can improve the performance of target detection in image dataset, which is consistent with the findings of Bai et al. [30].
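One way to obtain target context at different scales is to enlarge each candidate frame about its centre and crop the resulting windows. The sketch below shows only this geometric step, as an assumption about how such context regions could be generated; the scale factors and the function name are hypothetical. In a full pipeline, the features of each window would form the input sequence of the LSTM classifier.

```python
def context_windows(box, image_size, scales=(1.0, 1.5, 2.0)):
    """Enlarge a candidate box (x1, y1, x2, y2) about its centre, clipped to the image."""
    x1, y1, x2, y2 = box
    width, height = image_size
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) / 2.0, (y2 - y1) / 2.0
    windows = []
    for s in scales:
        windows.append((
            max(0.0, cx - s * half_w),
            max(0.0, cy - s * half_h),
            min(float(width), cx + s * half_w),
            min(float(height), cy + s * half_h),
        ))
    return windows


if __name__ == "__main__":
    # A 20 x 20 candidate frame in a 100 x 100 image, with three context scales.
    print(context_windows((40, 40, 60, 60), (100, 100), scales=(1.0, 2.0, 4.0)))
```

Ordering the windows from the tightest crop to the widest gives the sequence a natural direction, from the target itself to its surroundings.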
Moreover, the LOGO recognition algorithm is applied to the actual teaching design, and a shared network algorithm module is added. Compared with the traditional knowledge algorithm, it shares some parameter layers of the teacher-student network and transforms the traditional independent training of the teacher and student networks into simultaneous training. This optimization increases the accuracy of the student network and improves the ability of the student network model to identify and express, which is consistent with the results of Nie et al. [31]. Once the network parameters are shared between the teacher and student networks, the bias of the subsequent biased network can be updated frequently, and the parameters of the shared layers do not need to be trained from the beginning, so they can be used directly for the basic feature extraction of the new biased network. This also greatly reduces the training time and is conducive to the continuous iteration and update of the model.
In addition, in view of the unique teaching scene of brand LOGO, the algorithm needs to adopt different strategies according to different classes. In different teaching periods, there are different scene requirements for different target LOGOS. Hence, a biased neural network is designed. By setting the biased vector and the loss function of the student network, without changing the original network size, the recognition accuracy on the target LOGO can be customized and improved, and the recognition accuracy difference between the biased neural network and the CNN with larger network can be limited in the range of 1% to 5%, and its network expression ability has improved significantly. A brand LOGO recognition creative teaching system based on deep learning neural network is designed and implemented, which incorporates the advanced deep learning method to extract image extraction features. Compared with the previous traditional ML algorithms, it realizes end-to-end automatic feature extraction, greatly reduces the consumption of storage space, and has a higher recognition and expression ability, which is consistent with the research results of Zhang et al. [32].
5. Conclusion
By collecting a large body of literature on LOGO image recognition technology, the feasibility of applying the technology in teaching is analyzed. By reviewing the process, development process, and project realization of this technology in teaching applications, its application value is explored with a focus on individual cases of high school teaching. Finally, the application effect is studied through the actual teaching process and a field investigation. The application of LOGO image recognition technology in teaching can significantly improve students’ academic performance and effectively drive their interest in learning. Hence, the teaching method of LOGO image recognition technology based on deep learning is correct, reasonable, and scientific. Compared with YOLO, R-CNN, and Faster R-CNN, the LOGO image recognition algorithm based on deep learning theory has stronger image recognition ability and more accurate results. Primary school English teaching method based on LOGO image recognition can significantly improve students’ learning interest and classroom participation, and inquiry creative experimental teaching based on LOGO image recognition has a certain superiority over traditional teaching. The brand LOGO image recognition and inquiry creative teaching methods based on deep learning theory proposed in the experiment also have some defects: because the LOGO occupies only a small area of the image, only a low input image resolution is used when designing the neural network. The resolution of the input layer directly affects the number of layers that the CNN can contain, thus affecting the training and prediction effect of the network.
Hence, in the future work, in order to make up for the simple structure of CNN caused by low resolution of LOGO samples, more samples with higher resolution need to be collected, so as to design CNN that can better extract target features and distinguish different targets.
Data Availability
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent
Informed consent was obtained from all individual participants included in the study.
Conflicts of Interest
All authors declare that they have no conflict of interest.
Authors’ Contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.