Abstract
Collecting fully labeled data is a challenging problem for learning classifiers. Nowadays, the general tendency in developing models is to make them larger in order to obtain more capacity for effectively predicting unknown instances. However, imbalanced datasets still cannot meet the needs of training a robust classifier. A convincing way to extract invariant features from images is to train on augmented input datasets. However, selecting a proper way to generate synthetic samples from the large number of feasible augmentation methods is still a big challenge. In this paper, we use three types of datasets and investigate the merits and demerits of five image transformation methods: color manipulation methods (color and contrast) and traditional affine transformations (shift, rotation, and flip). We find a common experimental result: plausible color transformation methods perform worse than traditional affine transformations at reducing overfitting and improving classification accuracy.
1. Introduction
Crowdsourcing is the most viable solution for collecting a large number of high-quality labeled samples, but it is certainly a very time-consuming process, and hiring a skilled workforce to annotate millions of data samples is expensive. In some industries, e.g., healthcare, raw medical data are sensitive and carry significant privacy implications; to protect patients against breaches of medical records, these resources are normally veiled by strict secrecy. Similarly, some data samples must be gathered and labeled by people with professional knowledge (e.g., in the aerospace industry); these factors increase the cost of collecting quality labeled instances. The lack of labeled samples is a serious issue in the deep learning research field, where it is common knowledge that the more training data you can access, the better the generalization quality the neural network model can learn.
Another approach is to extract more possible feature values from limited data in order to improve accuracy and resilience. In real-world situations, a small amount of fictitious data can be created by mixing characteristics of existing samples. By introducing geometric [1, 2] or colorimetric distortion [1, 3] into the data space in the computer vision domain, virtual pictures that replicate extrinsic characteristics may be generated. Because of the picture perturbation effect, the learning process must adjust its criteria to account for the noise in the input samples and extract a common object structure to characterize them.
Learning a classifier from synthetic data remains a difficult problem, even with the assistance of data augmentation. When a neural network is trained, the background process may change hundreds of weight parameters at once, and there is no credible theory to determine how individual weight parameters change or how much data is adequate to represent a reliable model. Nonetheless, a good network model should be adaptable to morphological transformations (e.g., flip, shift, rotation) and less susceptible to external disruptions such as ambient light or particular photo equipment. Utilizing synthetic data, [4] proposes a sparse autoencoder algorithm, the Multichannel Autoencoder, to bridge the synthetic gap between generated images and real data, imitating true-labeled samples in terms of object structure and appearance; other analogous algorithms such as GAN [5] and DBSMOTE [6] make synthetic instances closer to real ones and improve precision. By contrast, improper image generation methods may break the density distribution in the data space, scattering samples far from the centroid and weakening the model's ability to find invariant object structures. To address these problems, [7] explores a transformation selection system, transformation pursuit; this algorithm uses a greedy strategy to randomly search a large parameter space of candidate transformations and find the optimized hyper-parameters for enlarging particular data samples. Others, such as [8], do the same thing to select values that maximize the entropy loss between two training epochs. All these methods face the same problem: the algorithms must search enormous random parameter spaces for different transformations, which wastes a lot of time without prior knowledge.
In this article, we first compare five augmentation methods; for ease of comparing their merits and defects, we separate them into two groups (traditional affine transformations and color transformations) according to the human visual effect when those random perturbations are applied to natural images. We then put forward our prior finding about these augmentation methods: affine transformation is much more capable than color augmentation of reducing the overfitting phenomenon and improving network accuracy. Secondly, to support our point and facilitate the study of varied real-world images, we collect a Pests dataset, which has clear backgrounds and easily identifiable object shapes. Along with our dataset and two other standard image processing datasets (VOC, ImageNet), we get the same result when training three different deep learning architectures (AlexNet [9], VGG16 [10], and InceptionV3 [11]): precision significantly improves from 74.5% to 85.2% with an appropriate transformation on the Pests dataset, and overfitting is strongly reduced on the ImageNet and VOC datasets.
2. Related Work
A common difficulty when developing a model is that the neural network architecture performs far better on training datasets than on test datasets. An inappropriate underlying model structure or insufficient training data may be the cause. In the field of deep learning, a simple technique to overcome the overfitting issue is to include a regularization term in the loss function, which penalizes peaky weight parameters and encourages diffuse weight distributions. The sparsity of weights aids the model in determining which input variables are relevant. Typically, at the start of training, the weight values are initialized from a Gaussian distribution, and a particular weight-normalization layer [12] is used to favor diffuse weight vectors. Another widely used technique is dropout [13], which involves randomly "deleting" some connection nodes, so the descriptor trains with a different network architecture in each mini-batch; this reduces training time and improves generalization capacity by loosening the coupling between network nodes. Furthermore, transfer learning supports the idea of applying pre-trained weights and fine-tuning a subset of model parameters to fit particular characteristics. The benefit of transfer learning is that the fine-tuned model can be adapted very accurately with fewer task-specific input samples. Data augmentation is useful for decreasing overfitting during dataset preparation because fresh synthetic examples can be created indefinitely, each from a slightly perturbed sample, and the classifier has to adapt to them in order to acquire additional latent features.
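As a small illustration of the dropout idea described above (a sketch of our own, not the configuration used in this paper), a classifier head can randomly drop activations during each mini-batch; the layer widths and the 0.5 dropout rate below are illustrative assumptions.

```python
# Illustrative sketch only: a small classifier with dropout regularization.
# The layer sizes and the 0.5 dropout rate are assumptions, not the paper's settings.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Flatten(input_shape=(224, 224, 3)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),          # randomly zeroes half of the activations in each mini-batch
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])
```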
Previous research has focused mostly on the mechanism and sequence of picture generation. For example, [14] classified augmentation methods as off-line or on-line deformation, with fresh augmented images sampled and discarded after each iteration until the network converged. Some standard computer vision datasets (e.g., MNIST) have been widely used to judge a model's ability and explore the differences between human recognition and computer vision algorithms. Commonly, plausible affine transformations (such as scaling and rotations) were applied in [2]; these diverse affine transformation parameters were meant to simulate uncontrolled oscillation of the hand muscles. Different classical descriptors and a series of experimental results are evaluated in [15], which compared the strong and weak points of the SMOTE [16], ELASTIC, and DBSMOTE [6] transformation algorithms. On a large visual database, ImageNet, [10, 11] randomly crop rescaled input images and generate sample patches from 256 × 256 down to 224 × 224 to combat overfitting; some approaches apply random color manipulation [3, 9–11, 17] when synthesizing augmented images in order to make the model less sensitive to illumination changes, which are usually caused by real-world light sources, photographic apparatus, or adjustment parameters at shooting time.
Despite this, the majority of practitioners choose random parameters to expand their training samples. When creating an image via a geometric or color-casting transformation, the effective down-sampling approach should not disrupt the details and should have label-preserving properties. The algorithms in [7, 8] are based on greedy search and trust-region optimization, respectively; they automate the search for optimal transformation parameters and maximize the loss value of robust classifiers. However, this processing takes time to find and backtrack appropriate values. Some experts have offered legitimate augmentation strategies based on plausible prior information in order to reduce the search regions. For example, [18] presents a transformation scheme for melanoma specialist analysis that distorts the lesion's main axis size while maintaining the symmetry and pattern of the lesion, and [19] uses vocal tract length normalization to transform spectrograms so as to handle speech datasets with sparse features. Through a multichannel autoencoder, [4] attempts to bridge the synthetic gap between augmented data and actual data.
Recently, adversarial nets [5] have received plenty of attention; they extract features from real images and produce new synthetic imagery comprising certain "selected" features of the model. Unfortunately, the synthetic data were unable to maintain morphological consistency with the human visual sense, although they are similar enough to real samples to trick the discriminator net. In our paper, we examine augmentation methods by applying the assigned transformation directly to the descriptor's input and conceptually discuss the idiosyncrasies of traditional affine transformation versus the color distortion technique.
3. Method
In this section, we first explain the augmentation methods that we use in training the models. We split them into two categories according to the different visual effects on the deformed images.
3.1. Affine Transformation
We assume that the original image matrix is $I$ and the transformation matrix is $T$, so that all affine-transformed images can be generated from $I' = T \cdot I$. To keep the size of the image, we add the term $P$ as padding at the end of the function, which gives $I' = T \cdot I + P$.
Shift operator: panning the image in one or more directions, with points outside the image boundaries filled by the edge pixels or by the constant 0. In mathematics, a shift operator's transformation matrix is a special nilpotent matrix $N$; the nonzero values of $N$ lie only on the superdiagonal or subdiagonal. For example, the 5 × 5 superdiagonal nilpotent matrix is

$$N = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{pmatrix}, \quad (1)$$

and the original image is represented as the matrix $I = (a_{ij})_{5 \times 5}$ in (2); the matrix that moves the image up one pixel is obtained from $N \cdot I$, and in the same way the right-shift-by-one-pixel matrix is $I \cdot N$, as shown in (3).
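As a small numerical check of this formulation (a sketch of ours, not the paper's implementation), the following NumPy snippet builds a 5 × 5 superdiagonal nilpotent matrix and multiplies it with a toy image matrix:

```python
# Sketch: shifting an image with a superdiagonal nilpotent matrix (illustrative only).
import numpy as np

n = 5
N = np.eye(n, k=1)                                  # 5 x 5 nilpotent matrix, 1s on the superdiagonal
I = np.arange(n * n).reshape(n, n).astype(float)    # toy "image" matrix

up_one = N @ I      # each row becomes the row below it: the image moves up one pixel
right_one = I @ N   # each result column takes the column to its left: the image moves right one pixel
```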
Flip: many researchers enhance their training samples with the flip operator without prior knowledge; the reason is that the flip operation simply mirrors the image in the horizontal or vertical direction and the valid pixel information is kept. In computer arithmetic, given an image matrix $I_{m \times n} = (a_{ij})$, where $i = 1, \dots, m$ indexes rows and $j = 1, \dots, n$ indexes columns, reversing the order of the columns, i.e., reading each row from $a_{in}$ back to $a_{i1}$, flips the image horizontally.
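A minimal sketch of this column-reversal view of the horizontal flip (illustrative, written in NumPy rather than any particular library used by the authors):

```python
# Sketch: horizontal flip expressed as a reversal of column order.
import numpy as np

I = np.arange(12).reshape(3, 4)
flipped = I[:, ::-1]                        # read each row from the last column back to the first
assert np.array_equal(flipped, np.fliplr(I))
```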
Rotation: the rotation operation acts like a random flip at different angles. First, the original input image (Euclidean geometry) is reinterpreted through a Cartesian coordinate conversion. Then, we assume the input sample is rotated counterclockwise about the origin by angle $\theta$. A point $(x, y)$ of the image in this coordinate system is mapped to the rotated point $(x', y')$, where $x' = x\cos\theta - y\sin\theta$ and $y' = x\sin\theta + y\cos\theta$. In our experiments, we take the center pixel of the image as the rotation center. The center point $(x_0, y_0)$ is written as $(x_0, y_0, 1)$ in homogeneous coordinates; the object is first moved to the origin, where a point changes as $(x, y, 1) \mapsto (x - x_0, y - y_0, 1)$, with the translation expressed by the matrix in (4). Performing a clockwise rotation by angle $\theta$ generates the rotation matrix (5), and shifting the center from the origin back to $(x_0, y_0)$ gives the translation matrix (6). Hence, combining (4)–(6), we get the updated transformation as follows:

$$T_1 = \begin{pmatrix} 1 & 0 & -x_0 \\ 0 & 1 & -y_0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (4) \qquad
R = \begin{pmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (5) \qquad
T_2 = \begin{pmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{pmatrix}, \quad (6)$$

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = T_2 \, R \, T_1 \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}. \quad (7)$$
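To make the composition in (7) concrete, here is a small NumPy sketch (our reconstruction, not the paper's code) that builds the combined translate–rotate–translate matrix about an image center and applies it to one point; the 30-degree angle and the 224 × 224 center are illustrative assumptions.

```python
# Sketch: rotation about the image center in homogeneous coordinates.
import numpy as np

def center_rotation_matrix(theta, x0, y0):
    """Combined matrix T2 @ R @ T1 for a rotation of `theta` radians about (x0, y0)."""
    T1 = np.array([[1, 0, -x0], [0, 1, -y0], [0, 0, 1]], dtype=float)   # move center to the origin
    R = np.array([[np.cos(theta),  np.sin(theta), 0],
                  [-np.sin(theta), np.cos(theta), 0],
                  [0, 0, 1]], dtype=float)                               # clockwise rotation
    T2 = np.array([[1, 0, x0], [0, 1, y0], [0, 0, 1]], dtype=float)      # move back to the center
    return T2 @ R @ T1

M = center_rotation_matrix(np.deg2rad(30), x0=112, y0=112)   # 30-degree rotation about a 224x224 center
x_new, y_new, _ = M @ np.array([50.0, 80.0, 1.0])            # rotated position of the pixel at (50, 80)
```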
3.2. Colorimetric Transformations
Many people modify their images in everyday life with filters that alter the color or contrast distribution, and shot photos are also easily affected when the ambient lighting changes. Because the younger generation likes to enhance picture structures by applying color-casting filters, the collected dataset may include distorted images, and a robust model should be adjusted to account for the disruption caused by these color manipulations.
Contrast enhancement: in the human visual system, the capacity to distinguish between luminance levels is contrast sensitivity; humans recognize a change more easily when an image is altered in contrast rather than in absolute luminance. There are many existing definitions of contrast; to keep the contrast adjustment close to the human perception system as well as computationally cheap, we take luminance contrast in our experiments. The transformation $I'(x) = F \cdot (I(x) - 128) + 128$ is used to obtain the input training data, where $I$ denotes the original input image, $I'$ is the transformed image, and $F$ is the contrast correction factor, as defined in (8):

$$F = \frac{259\,(C + 255)}{255\,(259 - C)}. \quad (8)$$
In the equation above, the parameter C denotes the desired level of contrast; the correction defined by (8) is applied to each of the R, G, and B channels simultaneously. For convenience, the image contrast was adjusted with PIL (the Python Imaging Library) in our experiments.
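Since the contrast adjustment is performed through PIL, a minimal sketch of the kind of call involved might look as follows; the enhancement factor 0.8 and the file names are illustrative assumptions, not the paper's settings.

```python
# Sketch: contrast perturbation with PIL's ImageEnhance module (illustrative parameters).
from PIL import Image, ImageEnhance

img = Image.open("sample.jpg")                            # hypothetical input file
contrast_img = ImageEnhance.Contrast(img).enhance(0.8)    # a factor of 1.0 keeps the original image
contrast_img.save("sample_contrast.jpg")
```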
Color transformation: people intentionally vary the intensities of colors to show multiple color schemes, generally operating directly on the tricolor (red, green, and blue) channel pixels, which affects the overall mixing of colors in images and requires sophisticated color balance corrections. Normally, images are shot with neutral tones under the right color temperature filters and ambient light. Unnatural color temperatures, on the other hand, can result in undesirable casts. To simulate this form of color casting, random color perturbations of the input pictures were also generated with the PIL library package.
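Similarly, a hedged sketch of random color perturbation with PIL (the perturbation range and file names here are our assumptions, chosen only to illustrate the mechanism):

```python
# Sketch: random color (saturation) perturbation with PIL (illustrative range).
import random
from PIL import Image, ImageEnhance

img = Image.open("sample.jpg")                     # hypothetical input file
factor = random.uniform(0.8, 1.2)                  # small random cast around the original colors
color_img = ImageEnhance.Color(img).enhance(factor)
color_img.save("sample_color.jpg")
```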
4. Experiments
4.1. Datasets
In our experiments, we test three representative image datasets to verify the reliability of the thesis. The number of samples in each category was kept equal to avoid class-imbalance effects; besides, each class is kept relatively small so that we can investigate the model's performance when the number of pictures is limited. 80% of the samples were assigned to training and the other 20% to validation; once the model started training, augmented data were generated continuously until the validation accuracy no longer increased.
ImageNet: we artificially selected a small subset of the ImageNet dataset, consisting of 10 classes, with 500 samples for training and 100 for validation. Before the transformation augmentation was applied, the original image size was preserved in order to evaluate the effectiveness of the augmentation technique and the model's performance when training on highly varied data. An object may lose its original shape and leave only half or part of it visible; for example, some fruit or vegetable (e.g., a pumpkin) may have been cooked into food (pumpkin pie), which looks totally different from the original (Figure 1).

The rest of the data come from the Pascal VOC dataset. This standard picture dataset is often used for object class detection in photographs with complicated background information and multiple objects. It should be noted that VOC often has multiple labels for the same picture; we delete the duplicate-tag images and replace them with fresh samples to ensure that all of the photos belong to a single category. Finally, there are 20 classes in the restructured dataset, each with 10,000 labeled photos.
Last but not least, there is our Pests dataset (Figure 2). Each picture is "pure," meaning there is just one item in each shot, the backdrop is generally a solid color, and the object has a definite border. We designed it based on the human visual system; thus, the relevant item is simple to detect, which ensures that the network converges rapidly. In addition, we normalized the object scale so that objects occupy an "optimistic" average proportion (0.3 to 0.5) [20] of the picture area, to guarantee that object scale would not affect the experimental findings. The pictures are resized to 224 × 224 × 3, with 10 classes and 500 photos allotted to training and the remaining 100 to validation.

4.2. Classification Descriptor
In this paper, we evaluated three deep convolutional classification descriptors. Details of each network architecture are illustrated in Table 1.
AlexNet [9] reached a new developmental milestone for convolutional neural networks; it trains complex deep convolutional models on multiple GPUs and shortens the training time to a reasonable range through highly optimized parallel computing, making it possible to train deep networks on large-scale datasets. The network consists of five convolutional layers, three fully connected layers, and a Softmax layer; with the help of the ReLU nonlinearity and Local Response Normalization (LRN) at the first two convolutional layers, the weight parameters are prevented from saturating and the impact of the vanishing and exploding gradient problems is reduced. AlexNet overcame the issue of overfitting and achieved a top-5 prediction error of 15.3% at the 2012 ImageNet Large-Scale Visual Recognition Challenge.
In the recent literature, the impact of depth has been a hot topic; as a representative deep neural network, VGG [10] reached state-of-the-art performance using only 16 weight layers. The whole network architecture is built from deep stacks of 3 × 3 kernels to reduce the number of weight parameters, which proved effective in terms of the receptive field. The VGG architecture is employed in our study to analyze the impact of depth on our augmentation strategy. Finally, instead of simply increasing the number of network layers, Inception constructs itself from blocks made up of numerous convolutional layers with varying filter sizes. Inception [11], which uses an appropriate factorization to divide a 3 × 3 filter into 3 × 1 and 1 × 3 filters, maintains a better balance of compute and memory while speeding up training and improving the network's nonlinear expressiveness. We evaluate Inception V3 to find out the influence of this special network connection.
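To illustrate the factorization idea mentioned above (our own sketch, not the actual InceptionV3 block definition), a 3 × 3 convolution can be approximated by a 1 × 3 convolution followed by a 3 × 1 convolution; the channel counts and input shape below are assumptions.

```python
# Sketch: replacing a 3x3 convolution with a 1x3 followed by a 3x1 convolution (illustrative).
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(224, 224, 64))
x = layers.Conv2D(64, (1, 3), padding="same", activation="relu")(inputs)
x = layers.Conv2D(64, (3, 1), padding="same", activation="relu")(x)  # same receptive field as 3x3, fewer parameters
model = models.Model(inputs, x)
```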
4.3. Experiment Settings
We use a series of experiments based on the three network architectures and three datasets to support the conclusion that color augmentation is less effective than the usual affine transformation approach. First, the data were separated into training and validation sets at a 4 : 1 ratio and then fed into the various augmentation methods; synthetic images continued to be generated to train the class descriptor until the validation loss no longer decreased, as shown in Figure 3. Table 2 lists the augmentation parameter settings in detail. The decision rule for the color augmentation parameters ensures that the perturbation of an image does not exceed 20% compared with the original. Once an architecture and a dataset are selected, the other hyper-parameters are kept consistent: the training optimizer was set to SGD (stochastic gradient descent) with 0.9 momentum and weight decay; in addition, the initial learning rate was set separately for the Pests and VOC datasets and for ImageNet, which helps avoid huge swings. A dedicated batch size is used when training on the VOC database and a batch size of 8 for the others, to shorten the training time and decrease the gap between training and validation accuracy. Early stopping was employed to monitor the validation loss; when the change in validation loss falls below a minimum threshold, the model trains for five more epochs before being stopped. We report our results as the average of three repeated experiments.
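The following Keras-style sketch shows how such a training configuration could be expressed; the learning rate, min_delta, and epoch count are placeholders of our own (the paper's exact values are not reproduced here), and build_model, x_train, y_train, x_val, and y_val are hypothetical helpers and arrays.

```python
# Sketch of the training setup described above; numeric values are illustrative placeholders,
# and build_model()/x_train/y_train/x_val/y_val are hypothetical, not the paper's code.
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping

model = build_model()                                   # hypothetical: AlexNet, VGG16, or InceptionV3
optimizer = SGD(learning_rate=1e-3, momentum=0.9)       # SGD with 0.9 momentum, placeholder learning rate
model.compile(optimizer=optimizer, loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=5)  # stop 5 epochs after improvement stalls
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=200, batch_size=8, callbacks=[early_stop])
```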

5. Result
We first consider the five augmentations, divided into two groups (color and traditional), and evaluate their average training and validation accuracy in Table 3.
The experiments show that there is a major overfitting issue on the ImageNet and VOC datasets, despite the fact that the model automatically stops training when the validation loss no longer decreases. As a consequence of the small data sizes and varied man-made object structures, the neural networks achieved "optimistic" training accuracy with ease but the worst prediction outcomes. With an ineffective augmentation strategy, the network attempted to incorporate extraneous picture characteristics and established incorrect boundaries between categories, expanding the gap between training and test accuracy. The classic transformation methods, on the other hand, could correctly analyze unknown photographs and shorten this distance, reducing the propensity for overfitting. The ImageNet and VOC datasets provide this evidence particularly clearly: the backgrounds and recognized objects of the input images are complicated and mutable, with fuzzy noise and redundant information; nevertheless, traditional affine augmentation distorts the image while preserving the object features, which improves the ability of the model to learn crucial features and discard noisy background information. On the Pests images, all three network architectures did well when using augmented data, but the challenge of overfitting remained with the color augmentation methods. Another interesting note is that the network with fewer layers and a simpler composition tends to show better results in both training and validation; this may be due to the preset hyper-parameters and the absence of fine-tuning when training on the limited labeled data, although it also highlights the importance of selecting the proper augmentation method.
5.1. The Performance on Data Pests
From the initial phase, it was found that deep neural networks are prone to extracting a large amount of residual variation as the foundation for generalizing models, using additional needless parameters in order to fit new data or accurately forecast real-world objects. As a result, we ignore the phenomenon of overfitting in this section and simply compare validation accuracy to explain the performance of each augmentation approach.
Figure 4 shows the progression of this validation. Most notably, although each augmentation technique performs differently, classic affine augmentation outperforms color augmentation, and the rotation manipulator yields a high prediction accuracy of 85.2% compared with the other approaches. Another important element to note is that, in training, the flip operation performs somewhat better than color. For the Inception model, a large gap was observed between the flip operation and the other traditional methods; we infer that this singular result could be caused by the special network block module in Inception V3, where the receptive field of a spatial convolutional filter is broken into a 3 × 1 convolutional kernel followed by a 1 × 3 convolutional filter. These asymmetric convolutions are liable to fit flipping transformations: for example, an image row that looks like $(a_1, a_2, \dots, a_n)$ becomes $(a_n, \dots, a_2, a_1)$ after a horizontal reverse, and with convolutional filters of sizes 1 × 3 and 3 × 1, the network can learn a weight vector $w$ and its reversed counterpart $w'$ so that the response $w * x$ on the original image equals $w' * x'$ on the flipped one.

5.2. The Performance on ImageNet
The results of the diverse augmentation algorithms on ImageNet were analogous to the previous Pests results: the color perturbation methods fit the training samples better while generating many obscure features, and evaluated worst because they inadequately captured features to represent the model structure. On ImageNet, we further analyze the network training curves for the six augmentation settings (the five methods plus the original data). Figure 5 illustrates how the validation accuracy varies as the number of epochs increases. The first observation is that the input samples obtained from contrast, color, or the original data rise rapidly at the beginning of training and almost stop at the same level, around 0.6, after about 35 iterations, when the validation loss no longer improves, while the other, traditional transformations rise smoothly and take more time to fit the elastically deformed objects. To handle these deformations, the model has to learn more scale-invariant features from the distorted objects; this scale-invariant information also holds for predicting samples with high probability. However, it cannot be ignored that the curves produced by the color-augmented images are similar to those of the real images, and they are more inclined to reach saturation in less training time; that is, the number of characteristic parameters learned from such data is smaller than what the observed models require.

5.3. The Performance on VOC
In the experiment with the VOC dataset, we emphasize creating augmented samples to compare their capability to overcome the overfitting problem. Figure 6 displays the overfitting rate of each transformation method when training the AlexNet, VGG16, and InceptionV3 models. The advantage of the conventional affine transformation methods over the color enhancement methods is large and obvious in these results. In all respects, the rotation operation effectively narrows the gap, and its role is the same as that observed on the Pests data. The combination of shift transformation and center rotation transformation increases the difficulty of model fitting. Color augmentation, on the other hand, proved ineffective in reducing the impact of overfitting; when training with contrast-enhanced pictures, the Inception V3 model had a 0.53 overfitting rate, demonstrating its failure to learn crucial characteristics. Another observation is that as the convolutional network depth grew, the model became more difficult to train, causing it to fail to converge when the number of input pictures was insufficient to meet the demand.

5.4. Experimental Analysis
Why does color augmentation fit the training samples better yet make test images harder to predict? Our intuition is that the color and contrast casting methods break the balance of the entire pixel distribution, distorting the photos across all pixel values rather than in parts. To illustrate this phenomenon, we picked one image and drew its histogram at three contrast augmentation parameters. The chart (Figure 7) displays the pixel distribution of the image, where the parameter 1.0 corresponds to the original. When the image contrast is changed to 0.7, the image loses more "color"; the composite image tends toward pure grey, with the histogram more closely resembling a Gaussian distribution, resulting in the disappearance of many of the white pixels necessary to form the outline of the butterfly's wings. When the contrast value goes up, the number of black pixels increases sharply and the intermediate pixel values spread to other pixel values; the curve of the histogram changes smoothly, which appears as a fuzzier object edge in the figure as well. The obscured image loses many scale-invariant features, so the network fits it well but performs worse on real-world objects.
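A hedged sketch of how such a histogram comparison can be produced (our own illustration; the file name and contrast factors are assumptions, not the paper's exact settings):

```python
# Sketch: pixel histograms of one image under different contrast factors (illustrative).
import matplotlib.pyplot as plt
from PIL import Image, ImageEnhance

img = Image.open("butterfly.jpg").convert("L")      # hypothetical image, converted to grayscale
for factor in (0.7, 1.0, 1.3):                      # 1.0 is the original; the others are example perturbations
    adjusted = ImageEnhance.Contrast(img).enhance(factor)
    plt.hist(list(adjusted.getdata()), bins=256, alpha=0.5, label=f"contrast {factor}")
plt.xlabel("pixel value")
plt.ylabel("count")
plt.legend()
plt.show()
```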

On the other hand, although traditional affine transformations distort the shape of the object in the image, fewer valid pixels are lost than with color transformation; they preserve most of the scale-invariant features while moving the original pixels from their original locations to new positions. This forces the network model to readjust its weight values; when fitting the synthetic images transformed under different augmentation parameters, a virtual layer is effectively built to correct for the affine transformation. This point might explain why the rotation operation was effective in reducing overfitting: to obtain the desired image orientation, each pixel's position is replaced after multiplication by sines and cosines in homogeneous coordinates.
6. Conclusion
This paper demonstrates the performance of five data augmentation methods in improving the accuracy of image classification. A series of experimental results strongly supports our thesis that applying a proper affine transformation to generate synthetic data is more robust than changing the color or contrast of images, boosting training and test accuracy as well as reducing overfitting. We surmise that the poor outcome of color augmentation could be due to a break in the density of the feature space distribution; studying this with a wider range of color manipulation parameters is an interesting direction for future research. It is also necessary to test our conjecture on data from various sources and on diverse network architectures. Yet another research issue is why rotation transformation is better than the other affine transformations at reducing the overfitting phenomenon when a proper rotation angle is selected. Finally, it is also interesting to test other affine transformations such as zoom, shear, and/or combinations of transformations to obtain a general standard for selecting different transformation methods.
Data Availability
The data used to support the findings of this study are included within the article.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the grant from the 2021 Cangzhou Science and Technology Plan Program (Grant No. 213102007) and 2022 Scientific Research Projects of Colleges and Universities in Hebei Province (Grant No. QN2022200).