Abstract
At present, artificial intelligence algorithms based on deep learning have achieved good results in image classification, biometric recognition, medical diagnosis, and other fields. In practice, however, researchers are often unable to obtain large numbers of samples because of practical limitations or high sampling costs. Zero-sample (zero-shot) image classification algorithms have therefore become a central engine of intelligent image processing and a hot spot of current research. Because deep learning's predictive capability needs to keep developing while its drawbacks in time and annotation cost are becoming apparent, the advantages of zero-sample and small-sample (few-shot) learning are gradually emerging, so this paper fuses the two learning paradigms for image recognition research. This paper first reviews the current state of zero-sample and small-sample learning and summarizes both. It then introduces the meaning of zero-sample learning and small-sample learning and compares and outlines the main classes of learning methods for each. Finally, the two approaches are fused, the design is introduced and analyzed, and future research directions are discussed in light of the current open problems.
1. Introduction
With the rapid development of big data and computing devices, deep learning has entered a phase of rapid growth. Great progress has been made in recent years, especially in image recognition and in computer vision and speech recognition: deep models, backed by large-scale labeled datasets, have steadily improved classification accuracy on large benchmarks and shown great vigor. However, this progress relies heavily on collected labeled data, and the time and manpower required grow rapidly with the number of images and categories, which severely hampers the further development of deep learning's predictive capability. In this context, zero-sample learning and small-sample learning have attracted increasing attention from scholars [1].
Most existing image classification methods belong to supervised learning. Such models require a large amount of labeled data, and for some classes, training samples are difficult to obtain at all. In the case of endangered species, image data are scarce precisely because the species are close to extinction, which makes such data very valuable. Given the importance of image data, recognizing and protecting endangered wild species without relying on large-scale training samples would have significant commercial and environmental value. In addition, as the number of object types grows, real-world detection systems must incorporate new data and restart training. Zero-sample and small-sample learning methods are therefore among the most important directions if target classification techniques are to be applied more widely. On the one hand, machine learning keeps improving while obtaining large labeled datasets is becoming less and less realistic; on the other hand, computer technology has made significant progress in recent years, with breakthroughs in areas such as transfer learning and domain adaptation, as well as in image recognition. Against these two trends, zero-sample and small-sample learning aim to identify unseen target classes from labeled source data; they are very important and represent an important new stage in the development of artificial intelligence. Just as humans can define categories they have never seen by understanding images of known categories and the semantic relationships between seen and unseen classes, zero-sample learning and small-sample learning share the same basic idea: the labels of seen classes and side information about unseen classes span a shared semantic space, through which the recognition of unseen classes is achieved.
Both zero-sample and small-sample learning have improved considerably but are still developing rapidly. They have a wide range of applications not only in image recognition but also in other computer-vision perception tasks and even in natural language processing. Such extended methods are also called generalized sample learning, while the earliest methods, aimed at image classification, are called narrow-sample learning. This paper focuses on narrow-sample learning, i.e., sample learning for image recognition and classification. A key difficulty in this setting is overfitting: if the deep model is very complex, it easily mistakes the noise in a small training set for a property of the whole distribution, and the trained model therefore performs poorly on the test set [2].
In summary, because deep learning's predictive capability needs to keep developing while its drawbacks in time and annotation cost are becoming apparent, the advantages of zero-sample and small-sample learning are gradually emerging, so this paper fuses the two learning approaches for image recognition research.
2. Research Background
2.1. Review of Zero-Sample Learning Research
In the early stage of zero-sample learning, classification of target categories was usually achieved with a two-stage approach: the attributes of the input image are predicted first, and the category is then inferred by searching among a set of per-class attribute signatures [3]. For example, the direct attribute prediction (DAP) model proposed by Lampert et al. greatly improved category inference. DAP learns a classifier for each attribute and predicts the class by computing the posterior probabilities of the attributes of the input sample. However, most attribute-based methods require a complete attribute description of every unseen class; building these descriptions takes a long time and usually requires domain-specific expertise [4]. Norouzi et al. proposed a method for learning the coupling relationships between classes and their corresponding attributes, so that class attributes can be predicted automatically from unseen class names alone using the learned relationship model [5]. Matthew et al. proposed a simple method to build an image-embedding system from any existing N-way classifier and any existing semantic embedding; its basic idea is to place category labels in the semantic space. In recent years, many zero-sample learning models have emerged that, in essence, classify images in the semantic space [6]. Palatucci et al. defined the concept of the semantic output code classifier, which uses a knowledge base of semantic properties of unseen classes to recognize new classes; they formalized the classifier, analyzed its theoretical properties, and gave conditions under which it can accurately predict new categories.
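To make the DAP idea concrete, the following is a minimal sketch, not Lampert et al.'s actual implementation: given per-attribute posterior probabilities predicted for an image (here supplied directly; in practice they would come from per-attribute classifiers trained on seen classes) and a binary attribute signature for each unseen class, the class whose signature is most probable under the predicted posteriors is selected.

```python
import numpy as np

def dap_predict(attr_probs, class_attr_matrix):
    """Simplified direct attribute prediction.

    attr_probs: (n_attrs,) estimated p(attribute present | image).
    class_attr_matrix: (n_classes, n_attrs) binary attribute
        signatures of the unseen classes.
    Returns the index of the unseen class whose signature is most
    probable under the predicted attribute posteriors.
    """
    eps = 1e-12  # avoid log(0)
    # log-probability of each class signature given the image:
    # sum log p(a=1|x) where the class has attribute a, else log p(a=0|x)
    log_p = (class_attr_matrix * np.log(attr_probs + eps)
             + (1 - class_attr_matrix) * np.log(1 - attr_probs + eps))
    return int(np.argmax(log_p.sum(axis=1)))

# Toy example: 3 unseen classes described by 4 attributes (hypothetical).
signatures = np.array([[1, 0, 1, 0],
                       [0, 1, 0, 1],
                       [1, 1, 0, 0]])
attr_posteriors = np.array([0.9, 0.1, 0.8, 0.2])  # strongly matches class 0
print(dap_predict(attr_posteriors, signatures))  # -> 0
```

The posterior combination above is the simplest independence assumption; richer models weight attributes by their reliability.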
Subsequent work combines multitask learning and multiview learning, interpreting classical and modern multitask and multiview algorithms as different forms of structured semantic description, and proposes a deep visual-semantic embedding model that uses labeled image data and semantic information rather than text alone to identify visual objects [7]. Alexander et al. report the model's performance on the 1000-class ImageNet object recognition challenge and show that it can use semantic information to predict classes never seen during training. Using state-of-the-art image features, they compare supervised attributes, hierarchical structure, and unlabeled text corpora as output embeddings. The results show that purely unsupervised output embeddings can yield convincing results, even surpassing previous supervised state-of-the-art techniques [8]. Romera-Paredes et al. developed a two-layer network based on the relationship between features and categories; recasting such methods as domain adaptation shifts the generalization error bounds and prompts a re-examination of text representations. There are also studies of zero-sample learning via nonlinear cross-modal embedding, which use a class-compatibility model that extends a bilinear network with latent variables; instead of learning a single mapping, two independent recognition models identify and classify unseen classes using differences in the semantic space [9]. Zhang et al. argue that if any source or target instance is regarded as a mixture of seen-class proportions, then two instances belonging to the same unseen category should have similar mixtures. Specifically, the source and target data are embedded into the same semantic space and their distributions aligned there.
Similarity is then measured in this semantic space, and a learning framework based on maximizing the similarity function is established. The study of zero-sample learning from a multiview perspective was proposed by Changpinyo et al. Its core idea is the alignment of the semantic space and the visual feature space. It introduces two spaces; in each, a set of "phantom categories" serves as the basis of query dictionaries. These phantom categories can be optimized using data from a single source domain to obtain better results [10].
2.2. Review of Small-Sample Learning Studies
Wang et al. argue that small-sample learning algorithms are key to the intelligent transformation of traditional industries and have broad technical and theoretical implications. At present, small-sample learning algorithms are widely used in image recognition, spectroscopy, industrial production, text classification, radar detection, agricultural disease detection, and medical examination. Starting from error decomposition in machine learning theory, they analyze the core difficulties of small-sample learning, summarize on this theoretical basis the motivations for algorithm design, and classify existing algorithms by design motivation into representation learning, data augmentation, and learning strategies. They also compare the basic ideas and representative algorithms of each approach, review their advantages and disadvantages together with experimental results, and outline future directions of development; the research idea proposed in this paper builds on that analysis [11]. Lazebnik et al., building on the bag-of-words model, compute the attributes of each block using multiscale blocks; by concatenating all features with a spatial pyramid, the loss of precise spatial information in the bag-of-words representation can be corrected, yielding good results in small-sample image classification. Dictionary-model-based approaches must make full use of expert knowledge to represent complex image structures, similar to deep models trained on large-scale image data.
Dictionary-model-based schemes are still not universal, and as deep learning has become more accurate over a wider range of image classification tasks, many researchers have tried to apply it to small-sample image classification. Many works point out that networks pretrained on large datasets such as ImageNet can be used to classify small samples [12]. Salakhutdinov et al. introduced the HDP-DBM model, which outperforms support-vector baselines on databases such as CIFAR, handwritten characters, and human motion detection. Fan Hu et al. introduced a pretrained deep learning model that, after fine-tuning its parameters, also generalizes well to the classification of small-sample, high-resolution remote sensing images [13]. Hu et al. proposed a similar classification method and validated it experimentally, with better results in most cases but some poor ones. Many researchers in China have likewise applied deep learning to small-sample image classification. These methods pretrain a deep model and then transfer it to small-sample datasets, achieving good results in related fields. Alternatively, a deep model can be trained from scratch on the small samples and its features classified with a support vector machine; although such results are better than those of bag-of-words models, the accuracy is still significantly lower than with a pretrained model. Therefore, the most common approach to small-sample image classification is to pretrain a deep model on large-scale data and then transfer it to the small-sample task [14].
3. Research Methods and Materials
3.1. Zero-Sample Learning
3.1.1. Concept
Zero-sample learning transfers knowledge from seen classes to unseen classes: it establishes relationships between the seen and unseen classes and uses them to classify the unseen classes. This knowledge transfer relies on a shared embedding space. During training, using the labeled data and the semantic space of the seen classes, the model learns the visual features of the seen classes, their semantic representations, and the correlations between the relevant class attributes. Once these relationships have been learned, an unseen class can be classified by extracting its visual features and searching for the corresponding attribute combination in the semantic space [15]. The specific procedure is shown in Figure 1.
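The search step just described can be sketched as follows. The projection matrix `W` and the toy class semantics are hypothetical stand-ins for quantities that would be learned from seen-class data (e.g., `W` fit by ridge regression between seen-class features and their semantic vectors):

```python
import numpy as np

def zero_shot_classify(visual_feat, W, unseen_semantics):
    """Project a visual feature into the semantic space with a linear
    map W, then return the unseen class whose semantic vector is most
    cosine-similar to the projection."""
    proj = W @ visual_feat
    proj = proj / np.linalg.norm(proj)
    best, best_sim = None, -np.inf
    for name, sem in unseen_semantics.items():
        sim = float(proj @ (sem / np.linalg.norm(sem)))
        if sim > best_sim:
            best, best_sim = name, sim
    return best

# Toy 2-D feature/semantic space (hypothetical numbers).
semantics = {"zebra": np.array([1.0, 0.0]), "whale": np.array([0.0, 1.0])}
print(zero_shot_classify(np.array([0.9, 0.1]), np.eye(2), semantics))  # -> zebra
```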

3.1.2. Algorithm Flow
A training set is given consisting of images, their class labels, and a support set. The task of zero-sample learning is to classify test samples into new classes, where the test categories are disjoint from the training categories. In addition, the zero-sample setting provides auxiliary textual information for all categories as feature descriptions. Using these class description vectors, the model transfers the knowledge learned on the training classes to the new test classes. As the bridge for knowledge transfer during zero-sample training, class description vectors usually consist of attribute vectors such as shape, color, size, and material for both training and test sets; some studies instead use text descriptions as the description vectors [16].
3.1.3. Method Classification
A series of methods have been proposed to solve the zero-sample learning problem; they can be classified into metric learning methods, compatibility (similarity) learning methods, manifold structure methods, and generative model methods [17], as shown in Figure 2.

(1) Metric learning methods: these methods aim to learn a space in which the distance between an unseen image's features and the corresponding semantic vector is minimized. The simplest method uses the semantic vector space itself as the metric space: image features are mapped into it and classified by nearest neighbor, with Euclidean or cosine distance used directly as the metric. Several studies have shown that using the image feature space as the metric space instead effectively reduces the hubness and domain shift problems inherent to zero-sample learning. On this basis, a deep model over the image feature space can be built: the original image is encoded by a CNN into the feature space, and the semantic vectors are mapped into the same space by a multilayer perceptron; classification is then nearest neighbor over these embeddings, so the embedding acts as an implicit metric. These methods transform the metric space and the distribution of features within it through linear and nonlinear mappings, such as support vector regression. The key design issue is the objective (loss) function, which shapes the overall behavior of the model. Metric-based approaches classify features in the embedding space with the nearest-neighbor rule; the model is clear and easy to understand, but its performance varies considerably with the choice of metric space [18].

(2) Compatibility learning methods: in contrast to learning a spatial mapping, compatibility learning methods directly relate vectors in the image space and the semantic space. The basic form is bilinear: an image vector and a semantic vector are combined into a scalar compatibility score. This family includes deep visual-semantic models with structured joint embedding. Exploiting nonlinear transformations, such methods are simple and computationally light but require more training data.

(3) Manifold structure methods: some studies explore the manifold structure of the semantic and visual spaces and try to transfer that structure to the new test categories. The model learns the structure around each vector in the semantic space and transfers it into the model space, following manifold-learning ideas. Structure-based zero-sample methods can exploit the relationships between categories; however, the manifold structures of different feature spaces differ and are difficult to transfer.

(4) Generative model methods: some recent studies generate new sample features, or even raw 2D images, for unseen classes by building generative networks, turning the zero-sample problem into ordinary supervised training. This approach usually consists of two parts: sample generation and classifier training. In the generation phase, new samples are produced from the training samples and the description vectors of the new categories; in the classifier phase, a classifier is trained on the generated samples and used to recognize test samples online. Generative approaches are flexible and easy to apply, but they suffer from problems such as low fidelity of the generated samples and difficulty of training.
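As an illustration of the similarity (compatibility) learning family described above, a minimal bilinear scorer might look as follows. The matrix `W` and the toy semantic vectors are hypothetical; in practice `W` would be trained so that matching image-class pairs outscore mismatched ones (e.g., with a ranking loss).

```python
import numpy as np

def compatibility(x, W, y):
    """Bilinear compatibility F(x, y) = x^T W y between an image
    feature x and a class semantic vector y."""
    return float(x @ W @ y)

def predict_label(x, W, semantics):
    # choose the class whose semantic vector is most compatible with x
    return max(semantics, key=lambda c: compatibility(x, W, semantics[c]))

# Toy setup: identity W, 2-D features and semantics (hypothetical).
semantics = {"cat": np.array([1.0, 0.0]), "dog": np.array([0.0, 1.0])}
print(predict_label(np.array([0.2, 0.8]), np.eye(2), semantics))  # -> dog
```

The bilinear form is attractive because, once `W` is fixed, scoring a new class requires only its semantic vector, with no retraining.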

3.1.4. Properties
(1) Transferability. An attribute describes only one aspect of an object, and attributes are usually shared across classes. It is precisely the differences and similarities of attributes between classes that characterize each class's attribute description and build a bridge between seen and unseen classes. The zero-sample learning approach learns, from the visual features of the seen categories, the correspondence between visual features and the various attributes. Then, from the visual features of an unseen category, the corresponding set of attributes is recovered and the corresponding class is found in the semantic space. This relay through semantic attributes completes the knowledge transfer between seen and unseen classes.

(2) Flexibility. Semantic attributes can be configured very flexibly. For specific situations and different detection tasks, the most appropriate types of attribute can be chosen to distinguish seen from unseen categories efficiently.

(3) Interpretability. The interpretability of semantic attributes is a unique advantage. Semantic attributes usually correspond to certain visual features, so by inspecting the semantic attributes we can understand the differences between categories precisely.
3.2. Small-Sample Learning
3.2.1. Concept
Similar to zero-sample learning, small-sample learning addresses the cost of manually labeling images, avoiding excessive time and effort. Unlike zero-sample learning, small-sample learning does not use semantic knowledge shared between the source and target domains; instead, each test category is given one or a few labeled instances, and the test categories are recognized by learning from these very few samples [19].
3.2.2. Algorithm Flow
In this section, small-sample learning is examined by design motivation and divided into a representation learning phase, a data expansion phase, and a learning phase; a schematic of the small-sample learning pipeline is shown in Figure 3 [20].

The motivation of representation learning is to transform raw data into a feature domain. The feature domain is low-dimensional and carries semantic information, which greatly reduces the learning difficulty. The simplest idea is to learn features from a large number of base categories that remain valid despite the limited differences between the base categories and the new categories, and then recognize the new categories on top of them. Although this fine-tuning approach is intuitive and simple, it is difficult to learn general features from small samples and achieve good results. In recent years, representation learning has made great strides with the development of self-supervised techniques, facilitating further progress in small-sample research. The goal of self-supervised learning is to learn reliable representations from the data itself: no class labels are used, only the data's own information and structure. The main problem is to construct pretext tasks that extract knowledge from large amounts of unlabeled data, and then complete downstream tasks on top of the resulting compact semantic representation. In small-sample learning, the quality of the representation is crucial, so constructing reasonable self-supervised objectives is a hot research topic.
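As a small illustration of the pretext-task idea just mentioned, labels can be manufactured from unlabeled images; rotation prediction is one common choice (the function name here is ours, and the task is shown only as an example of label-free supervision):

```python
import numpy as np

def rotation_pretext(images):
    """Build a self-supervised pretext dataset: each unlabeled image is
    rotated by 0/90/180/270 degrees, and the rotation index serves as a
    free 'label' for training a feature extractor."""
    data, labels = [], []
    for img in images:
        for k in range(4):
            data.append(np.rot90(img, k))  # rotate by k * 90 degrees
            labels.append(k)
    return data, labels

imgs = [np.arange(9).reshape(3, 3)]  # one toy "image"
data, labels = rotation_pretext(imgs)
print(len(data), labels)  # 4 rotated copies, labels [0, 1, 2, 3]
```

A network trained to predict these labels must learn orientation-sensitive features, which then transfer to the downstream small-sample task.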
Data expansion for small-sample algorithms is motivated by the desire to generate as many additional samples as possible, increasing the sample size, reducing the upper bound of the generalization error, and improving the reliability of empirical risk minimization. Data expansion can be divided into two parts: source-domain expansion and semantic-space expansion. Expanding in the semantic space means creating samples from label and feature vectors.
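A minimal sketch of source-domain expansion with simple label-preserving transforms (the specific transforms here are illustrative, not taken from any cited method):

```python
import numpy as np

def augment(images):
    """Expand a small labeled set with label-preserving transforms:
    horizontal flip and one-pixel shifts. Each transform keeps the
    class label valid, so the sample count grows at no labeling cost."""
    out = list(images)
    for img in images:
        out.append(img[:, ::-1])             # horizontal flip
        out.append(np.roll(img, 1, axis=0))  # shift down by one pixel
        out.append(np.roll(img, 1, axis=1))  # shift right by one pixel
    return out

imgs = [np.arange(16).reshape(4, 4)]
print(len(augment(imgs)))  # 1 original + 3 augmented = 4
```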
Transfer learning uses previously acquired knowledge to reduce the learning cost of the model and adapt a neural network to small-sample data. Its premise is that similar tasks share some structure. The essence is to classify data from new classes based on the features, connections, and parameters shared between models. If sufficiently similar tasks exist, an off-the-shelf model can serve as a good initialization: by freezing the feature extraction and other components pretrained on big data, the model can respond quickly to the task at hand. If the pretrained parts are not sufficient for the task, additional parameters can be introduced and training continued. Specific transfer methods can be divided into feature transfer, model transfer, and relation transfer, as shown in Figure 4.
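The feature-transfer idea of freezing a pretrained extractor and training only a small head can be sketched as follows; `pretrained_features` is a trivial stand-in for a real frozen backbone (in practice, e.g., a CNN up to its penultimate layer):

```python
import numpy as np

def pretrained_features(x):
    """Stand-in for a frozen backbone trained on a large source
    dataset; its parameters are NOT updated on the small-sample task."""
    return np.tanh(x)

def fit_linear_head(X, y, n_classes, lr=0.5, steps=200):
    """Train only a softmax head on frozen features (feature transfer)
    by gradient ascent on the log-likelihood."""
    F = np.stack([pretrained_features(x) for x in X])
    W = np.zeros((n_classes, F.shape[1]))
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        logits = F @ W.T
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W += lr * (Y - p).T @ F / len(X)  # softmax-regression gradient
    return W

def head_predict(x, W):
    return int(np.argmax(W @ pretrained_features(x)))

# Four labeled samples are enough to fit the tiny head (toy data).
X = [np.array([2.0, 0.0]), np.array([-2.0, 0.0]),
     np.array([2.1, 0.1]), np.array([-2.1, -0.1])]
W = fit_linear_head(X, [0, 1, 0, 1], n_classes=2)
print(head_predict(np.array([3.0, 0.0]), W))  # -> 0
```

Only `W` is learned here, which is why a handful of samples suffices; this is the "fix feature extraction" strategy the text describes.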

4. Results and Discussion
4.1. Fusion of Zero-Sample Learning and Small-Sample Learning
Current research on image classification treats zero-sample and small-sample learning separately, while research on their combination is relatively weak. A fused zero-sample and small-sample learner has access not only to a few support instances but also to textual side information, giving it the unique advantage of learning from multiple modalities. Compared with pure zero-sample or small-sample learning, this is also closer to human perception: humans grasp a new category or concept from a small number of examples combined with other cognitive modalities, the overall understanding being the result of their interaction. Combining zero-sample and small-sample learning thus serves both practical needs and the scientific study of image classification and human perception. In addition, zero-sample learning can be combined with active learning to improve the latter's effectiveness, and it can be integrated into lifelong learning systems, where continuous learning of new tasks given only side information is developing rapidly. Even with weak supervision, such an improved training system will be better able to cope with new challenges, new developments, and even new fields.
4.1.1. Design Theory
Zero-and-small-sample learning is the result of combining small-sample learning with zero-sample learning: new categories are defined both by a small number of support samples and by a general description of the category. To define this fused setting, we first state the problem explicitly and then introduce the experimental design and some basic results.
To further identify new categories that do not appear during training, a category description vector is added to the small-sample support instances, combining instance-based learning with learning from textual information. In other words, this paper studies a combined zero-sample and small-sample learning setting. The combination of zero and small samples is an effective way to improve machine intelligence, and pretraining is gradually being incorporated into it. A multiobjective network can be designed to study small samples supported by semantic information, using semantic descriptions together with global and local image features. Following the generative idea, a network conditioned on the class semantic information creates new sample features, trained with a multimodal cross-reconstruction loss in an encoder-decoder model. To obtain new samples in the latent feature space, a cross-modal, distribution-aligned encoder is proposed in which zero-sample data expansion is performed under small-sample conditions. Although the idea has been around for some time, many instances of this fused setting remain insufficiently investigated; in some specific domains, limited labeled data make such learning difficult.
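The generate-then-classify fusion described above can be sketched as follows. The linear "generator" `G`, the noise level, and the toy semantics are hypothetical stand-ins for a trained conditional generator; the point is only that generated zero-sample features and real small-sample support features are pooled into common class centroids.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def generate_features(semantic_vec, G, n=20, noise=0.05):
    """Stand-in for a trained conditional generator: maps a class
    semantic vector to n synthetic visual features."""
    base = G @ semantic_vec
    return base + noise * rng.standard_normal((n, base.shape[0]))

def build_classifier(support, semantics, G):
    """Fuse few real support samples with generated features into
    per-class centroids."""
    centroids = {}
    for cls, sem in semantics.items():
        feats = list(generate_features(sem, G))
        feats += support.get(cls, [])  # real labeled samples, if any
        centroids[cls] = np.mean(feats, axis=0)
    return centroids

def classify(x, centroids):
    # nearest-centroid decision rule
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

# Toy usage: identity generator, two classes, no real support samples.
sem = {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])}
cents = build_classifier({}, sem, np.eye(2))
print(classify(np.array([0.9, 0.05]), cents))  # -> a
```

When a class has real support samples, they simply join the generated pool, so the same classifier covers the zero-sample, small-sample, and fused cases.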
The architecture has two streams: an image stream, displayed on top, and a text stream, displayed below. The model includes an adaptive fusion mechanism for multimodal information that adjusts the weighting between text and image attributes, so that the different modalities jointly improve small-sample learning.
Datasets and evaluation protocols for the fused setting are still in their early stages. Zero-and-small-sample integration requires a few support examples for the new categories and additional semantic description vectors as side information. In the training phase, the images, categories, and description vectors of the known training classes are used to train the model parameters. In the testing phase, a large number of test samples are classified using a set of auxiliary support samples and the description vectors of their new categories, and the classification accuracy is computed as the final evaluation metric. Currently, the conventional experimental setup partitions the classes into disjoint subsets.
4.1.2. Experimental Validation
Currently, most image classification studies use fixed datasets, with the same training split analyzed even when testing. For example, most zero-sample studies use a fixed 40-class training split and a 10-class test split, giving relatively stable experimental numbers. Beyond such a fixed split, the validity of the model on other data, i.e., its generalization, should also be studied. We can instead borrow traditional protocols for large datasets, such as 5-fold cross-validation, for the integrated zero-sample and small-sample experimental validation, testing the model comprehensively under various data conditions.
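Class-level cross-validation of the kind suggested here can be sketched as follows; with 50 classes and k = 5 it reproduces a rotating 40-class train / 10-class test split (the function name and split scheme are illustrative):

```python
def class_folds(class_names, k=5):
    """Split a list of classes into k disjoint folds; in turn, each
    fold serves as the unseen test classes and the remaining classes
    as training classes (class-level cross-validation)."""
    folds = [class_names[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [c for c in class_names if c not in test]
        splits.append((train, test))
    return splits

splits = class_folds([f"c{i}" for i in range(50)], k=5)
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 5 45... -> 5 40 10
```

Averaging accuracy over the k splits gives a less split-dependent estimate than a single fixed 40/10 partition.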
At the same time, the experimental setup should better match practical applications. Current zero-sample and small-sample studies mostly test only the classification performance on new, previously unseen categories, whereas in practice tests usually involve the new classes together with the old ones. Improving the performance of such broader, generalized classification is also an important area of research.
On the other hand, although the amount of training data required for zero-sample and small-sample learning is very small, the pretraining of the model directly affects its final performance, and how to minimize model complexity also needs investigation. Current research consists mainly of heuristic designs and experimental validation and lacks a sufficient theoretical foundation. Theoretical analysis is needed on how to select, from the training set, the information most transferable to unknown classes; which data and knowledge are most effective; and how to suppress irrelevant information during training to avoid negative transfer. Sound theoretical analysis and sufficient experimental data will help advance the fused zero-sample and small-sample setting.
4.1.3. Formulation
Following the design theory above, the fused method represents each image class by the center of its feature vectors. Let f(x_i) denote the feature vector of sample x_i and N_k the number of samples of class k; the feature center of class k is

c_k = (1/N_k) * Σ_{i=1}^{N_k} f(x_i),

where c_k denotes the feature centroid, N_k the number of samples, and f(x_i) the image features of class k. Features are compared in the shared semantic space through a learned mapping

φ(x) = W f(x),

where f(x) denotes the feature representation vector and φ(x) denotes the mapped feature vector.
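The per-class feature-center computation can be written directly; the features and labels below are toy values:

```python
import numpy as np

def class_centroids(features, labels):
    """Compute c_k = (1/N_k) * sum_i f(x_i) over the N_k samples of
    each class k, i.e., the per-class feature centers described in the
    text."""
    centroids = {}
    for k in set(labels):
        members = [f for f, y in zip(features, labels) if y == k]
        centroids[k] = np.mean(members, axis=0)
    return centroids

feats = [np.array([1.0, 1.0]), np.array([3.0, 3.0]), np.array([0.0, 2.0])]
print(class_centroids(feats, [0, 0, 1]))  # class 0 center: [2. 2.]
```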
4.2. Analysis of Results
In this section, the application of the fused zero-sample and small-sample learning method to image classification is analyzed by comparison. The three modes compared below are zero-sample learning, small-sample learning, and fused zero-small-sample learning.
4.2.1. Comparative Analysis of Image Classification Speed under the Three Modes
When the number of samples is 100, the image classification processing time is 4.3 minutes with zero-sample learning, 2.4 minutes with small-sample learning, and 2 minutes with the fusion method; when the number of samples is 150, it is 4.5 minutes with zero-sample learning, 3.4 minutes with small-sample learning, and 2.1 minutes with fusion; when the number of samples is 200, it is 4.7 minutes with zero-sample learning, 4.4 minutes with small-sample learning, and 3 minutes with fusion; when the number of samples is 250, it is 4.9 minutes with zero-sample learning, 4.4 minutes with small-sample learning, and 3.37 minutes with the fusion method. From these data, the three modes ranked from fastest to slowest are: fused zero-small-sample learning, small-sample learning, and zero-sample learning. Fused zero-small-sample learning therefore has a clear advantage in image classification processing speed, which makes the study of the fused mode highly valuable, as shown in Figure 5.

4.2.2. Comparative Analysis of Image Classification Processing Accuracy under the Three Modes
When the number of samples is 100, the image classification accuracy of zero-sample learning is 79.5%, that of small-sample learning is 86.4%, and that of the fusion method is 93.3%. At 150 samples, the accuracies are 78.25%, 85.35%, and 92.45%, respectively; at 200 samples, 77%, 84.3%, and 91.6%; and at 250 samples, 75.75%, 83.25%, and 90.75%. The data show that, ordered from highest to lowest accuracy, the three modes are fused zero-small-sample learning, small-sample learning, and zero-sample learning. Fused zero-small-sample learning thus achieves the highest classification accuracy and the strongest data processing capability, making it highly valuable for research, as shown in Figure 6.

4.2.3. Comparative Analysis of the Usage Rate of Three Modes of Image Processing
Of the image processing tasks examined, 20.68% use zero-sample learning, 30.98% use small-sample learning, and 48.34% use fused zero-small-sample learning. This shows that fused zero-small-sample learning has the highest usage rate in image classification processing and a strong data processing capability, as shown in Figure 7.

4.2.4. Comparative Analysis of the Recognition of Image Processing in the Three Modes
Assuming a maximum recognition score of 100, zero-sample learning scores 80.25, small-sample learning scores 85.6, and fused zero-small-sample learning scores 90.95. The highest score belongs to fused zero-small-sample learning, indicating that its image processing receives the highest recognition and that the fused method has a clear advantage in image sample processing, as shown in Figure 8.

5. Conclusion
With the rapid development of big data and computing equipment, deep learning has entered a stage of rapid growth. In recent years, deep learning models, backed by the powerful potential of large-scale training data sets, have made great progress in computer vision and speech recognition, especially in image classification, and classification accuracy on large data sets has kept improving. The task of image classification in computer vision is to classify images of interest using machine learning and related methods; such techniques are gradually being introduced into people's lives and work, greatly facilitating daily life and changing lifestyles. Despite this progress, especially in image recognition, deep learning relies heavily on collected labeled data. As the volume of image data and the number of classes grow rapidly, the time and manpower required grow exponentially, severely hindering the development of deep learning's predictive capabilities. In this context, zero-sample learning and small-sample learning have attracted increasing attention from scholars. Because the development of deep learning's prediction capability is constrained by these time and technical drawbacks, the advantages of zero-sample and small-sample learning are gradually emerging, so this paper fuses the two learning methods for image recognition research. This paper has described the current status of research on zero-sample and small-sample learning, summarized and outlined that research, and introduced the meaning of zero-sample and small-sample learning together with a classification and comparison of the main learning methods.
Finally, the integration of zero-sample and small-sample learning methods was presented and analyzed in terms of design, and future research directions are envisioned based on the current open problems as follows:
(1) Consider how to improve image representation through self-supervision, without relying on labeled data. On this basis, the mapping space can be optimized under simpler assumptions, and because the space is easier to learn, fewer samples are required, which facilitates sample propagation.
(2) Given the need for higher reliability and controllability, consider developing more general, efficient, and appropriately scaled data augmentation algorithms.
(3) Consider more flexible and effective use of meta-learning to integrate zero-sample and small-sample learning.
(4) Consider how to design an effective algorithm for integrating zero-sample and small-sample learning.
Data Availability
The dataset is available upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest.