Abstract
With the rapid development of computer vision and robot technology, smart community robots based on artificial intelligence have been widely used in smart cities. Feature extraction for fruit classification is a complicated process, and manual feature extraction has low reliability and high randomness. Therefore, a fruit classification method that combines a residual filtering network (RFN) with a support vector machine (SVM) is proposed in this paper. Classification proceeds in two stages. In the first stage, the RFN, which consists of a Gabor filter and a residual block, is used to extract features. In the second stage, the SVM classifies the fruit features extracted by the RFN. In addition, the performance of the training process is estimated with the $k$-fold cross-validation method. The method is assessed in terms of accuracy, recall, F1 score, and precision. Its accuracy on the Fruits-360 dataset is 99.955%. The experimental results and comparative analyses with similar methods demonstrate the efficacy of the proposed method over existing fruit classification systems.
1. Introduction
China has achieved rich results in the field of smart cities through years of construction. These achievements have played an important role in alleviating the “big city disease,” improving the quality of urbanization, realizing refined and dynamic management, enhancing the effectiveness of urban management, and improving residents’ quality of life. The community is the “smallest cell” of social management, and enhancing community management capabilities is an important aspect of modernizing the urban management system and its capacity [1, 2]. As communities grow in scale, China now has many large communities, which need large quantities of fruit and other daily supplies every day. Tasks such as fruit sorting require a great deal of labor, yet the labor force dropped sharply during the epidemic. The demand for community robots capable of automatic sorting and store management is therefore particularly urgent. Such robots can not only make up for the labor shortage but also reduce costs and improve efficiency [3].
Fruit sorting is the core function of an intelligent community robot. It has attracted considerable attention from researchers and has produced many research results in recent years. Arivazhagan et al. [4] proposed a fruit classification method that extracted different features of fruits and used BP classification; its reported accuracy was 86%. George [5] proposed a fuzzy logic and $k$-means clustering method for sorting fruits and vegetables, which also achieved an accuracy of 86%. Kuang et al. [6] proposed a fruit detection method based on multiple color channels. The method fused histogram of oriented gradients, local binary pattern (LBP), and Gabor wavelet-based LBP features, and the fused features were used to train an SVM on previously divided image blocks to produce optimal results.
In addition, deep learning models for fruit classification have been proposed by some researchers. Yu et al. [7] proposed a 13-layer convolutional neural network (CNN) for fruit recognition and classification. The method used three types of data augmentation (image rotation, gamma correction, and noise injection), compared maximum pooling with average pooling, and optimized the CNN with the gradient descent algorithm. It achieved an accuracy of 94.94%, which was 5% higher than that of the common CNN recognition method. Momeny et al. [8] proposed an improved CNN to detect the appearance of cherries; after image preprocessing, a combination of maximum pooling and average pooling was used to classify the cherries. Gao et al. [9] proposed a multicategory apple detection method based on the Fast Region-based Convolutional Network (Fast R-CNN). The method achieved average accuracies of 0.909, 0.899, 0.858, and 0.848 under four different shading conditions, respectively.
Traditional fruit classification methods have achieved good results, but many problems remain. Machine learning-based fruit classification is a complex process because the features of the fruit must first be extracted [10]. Deep learning methods, in turn, suffer from deep network layers and slow speed. To address these problems, a fruit classification method based on RFN and SVM is proposed in this paper, in which the RFN extracts features and the SVM performs classification.
The paper is organized as follows. The principle of the proposed method is discussed in Section 2. The experimental process and the analysis of results are presented in Section 3. The paper is concluded in Section 4.
2. Materials and Methods
Traditional fruit classification methods require features to be extracted manually. Therefore, a fruit classification method based on RFN and SVM is proposed in this paper. This section introduces the process and principle of the method.
2.1. Feature Extraction
Feature extraction is an important process in fruit classification. The RFN is proposed in this paper for feature extraction. It is built from the residual filtering block, which is composed of a Gabor filter and a residual block. Within the residual filtering block, the Gabor filter replaces the ordinary convolution kernel, so that features of fruits can be extracted in different directions and at different scales [11]. The residual block mitigates feature loss and vanishing gradients. The structure of the residual filtering block, including its input and output, is shown in Figure 1. The G1 layer is a convolutional layer that uses a Gabor filter. The residual block consists of C1, C2, and C3.

The Gabor filter can be convolved with an image to obtain local information in the spatial domain [12, 13]. The Gabor filter is generated according to Equation (1), where $x$ and $y$ denote the location of a pixel in the spatial domain [14]. The positions $x'$ and $y'$ after pixel rotation are calculated by Equations (2) and (3). The value given by Equation (1) denotes the response of a Gabor filter with standard deviation $\sigma$, aspect ratio $\gamma$, wavelength $\lambda$, and phase offset $\psi$ to an image at point $(x, y)$ on the image plane [15].
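As a rough illustration of how such a filter can be generated, the sketch below builds a small real-valued Gabor kernel in plain Python. It assumes the widely used form in which a Gaussian envelope is multiplied by a cosine carrier and both are evaluated at rotated coordinates. The function name `gabor_kernel`, the orientation parameter `theta`, and the example parameter values are illustrative assumptions, not values taken from this paper.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma, psi):
    """Return a size x size real-valued Gabor kernel as a list of rows.

    Assumed form (not confirmed by the paper):
        exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lambd + psi)
    with x' = x*cos(theta) + y*sin(theta), y' = -x*sin(theta) + y*cos(theta).
    """
    half = size // 2
    kernel = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Rotate the pixel coordinates by the orientation theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope stretched by the aspect ratio gamma.
            envelope = math.exp(
                -(x_rot ** 2 + (gamma * y_rot) ** 2) / (2.0 * sigma ** 2)
            )
            # Cosine carrier with wavelength lambd and phase offset psi.
            carrier = math.cos(2.0 * math.pi * x_rot / lambd + psi)
            line.append(envelope * carrier)
        kernel.append(line)
    return kernel


# Hypothetical parameter values chosen only for demonstration.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0, gamma=0.5, psi=0.0)
```

Varying `theta` and `lambd` over a small grid would give a bank of filters that respond to different directions and scales, which is the property the text relies on.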
The convolution process of the Gabor filter is shown in Figure 2. Like a common convolution kernel, the Gabor filter has parameters that can be adjusted during backpropagation (BP). Unlike a common convolution kernel, it can convolve only a single channel of the image. The features extracted by the Gabor filter are more robust than those extracted by a common convolution kernel.

Feature loss and vanishing gradients occur as signals propagate through the network [16]. These problems are addressed in this paper by adding a residual network. The output of the residual network is calculated by Equations (4) and (5). The formulation $F(x) + x$ can be realized by feedforward neural networks with “shortcut connections.” Here $x$ and $y$ are the input and output of the convolutional layers, respectively, and $W_s$ is used when matching dimensions [17]. When the numbers of input and output features of the residual network differ, the dimension of the input features can be increased or decreased by a convolutional layer. The function $F(x, \{W_i\})$ represents the residual mapping to be learned [18]. $W_i$ and $W_s$ are different square matrices. The feedforward network uses two convolutional layers, and the ReLU activation function is used to increase the nonlinearity of the network.
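The shortcut idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the network used in this paper: small dense matrices stand in for the two convolutional layers, the residual mapping is taken to be two layers with a ReLU between them, and the names `residual_block`, `w1`, `w2`, and `ws` are hypothetical.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def residual_block(x, w1, w2, ws=None):
    """Return F(x) + shortcut(x), with F(x) = w2 * relu(w1 * x).

    The shortcut is the identity unless a projection matrix `ws` is given,
    which is only needed when the input and output dimensions differ.
    """
    residual = matvec(w2, relu(matvec(w1, x)))
    shortcut = x if ws is None else matvec(ws, x)
    return [r + s for r, s in zip(residual, shortcut)]


x = [1.0, -2.0]

# With all-zero weights the residual mapping vanishes and the block
# simply passes its input through the identity shortcut.
zero = [[0.0, 0.0], [0.0, 0.0]]
identity_out = residual_block(x, zero, zero)

# When the output has three features, a projection `ws` matches dimensions.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
ws = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
projected_out = residual_block(x, w1, w2, ws)
```

The first call shows why features are not lost: even when the learned mapping contributes nothing, the input still reaches the output unchanged.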
The gradient of the residual block is obtained with the chain rule and is calculated by Equation (6), where $x_l$ denotes the input of the entire neural network and $x_L$ denotes the output closest to the loss function. The chain rule converts the derivative of the loss with respect to $x_l$ into the product of derivatives in Equation (6). Because the shortcut connection contributes an identity term to this product, the gradient is unlikely to vanish [19].
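For reference, the chain-rule argument can be written out as follows. This is a sketch that assumes identity shortcuts between stacked residual units, with $x_l$ the input of the earliest unit, $x_L$ the output nearest the loss $\mathcal{L}$, and $F(x_i, W_i)$ the residual mapping of unit $i$; the notation is an assumption and may differ from the paper's Equation (6).

```latex
\begin{align}
x_L &= x_l + \sum_{i=l}^{L-1} F(x_i, W_i), \\
\frac{\partial \mathcal{L}}{\partial x_l}
  &= \frac{\partial \mathcal{L}}{\partial x_L}\,
     \frac{\partial x_L}{\partial x_l}
   = \frac{\partial \mathcal{L}}{\partial x_L}
     \left( 1 + \frac{\partial}{\partial x_l}
     \sum_{i=l}^{L-1} F(x_i, W_i) \right).
\end{align}
```

The constant 1 inside the parentheses comes from the shortcut, so part of the gradient reaches $x_l$ directly, regardless of how small the derivatives of the residual mappings are.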
The feature extraction network in this paper is shown in Figure 3. It is built from residual filtering block, pooling, convolutional, fully connected (FC), and softmax layers. The features are flattened before entering the FC layer. Training the RFN involves two processes: forward propagation (FP) and BP. During FP, when the current layer $l$ is a convolutional layer, its output is calculated by Equation (7), where $x^{l-1}$ denotes the input feature vectors, $W^{l}$ denotes the weight parameter of layer $l$, and $b^{l}$ denotes the bias.
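The forward pass of one convolutional layer can be illustrated with a small pure-Python sketch. It assumes a single input channel, a "valid" sliding window without padding or stride, and a ReLU activation applied to the weighted sum plus bias; the function name `conv2d_forward` and the toy values are hypothetical and do not come from this paper.

```python
def conv2d_forward(image, kernel, bias=0.0):
    """Single-channel 'valid' convolution (cross-correlation) followed by ReLU."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Weighted sum of the window under the kernel, plus the bias.
            acc = bias
            for u in range(k_rows):
                for v in range(k_cols):
                    acc += kernel[u][v] * image[i + u][j + v]
            # ReLU activation.
            row.append(max(0.0, acc))
        output.append(row)
    return output


image = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
kernel = [
    [1.0, 0.0],
    [0.0, -1.0],
]

with_bias = conv2d_forward(image, kernel, bias=5.0)  # [[1.0, 1.0], [1.0, 1.0]]
without_bias = conv2d_forward(image, kernel)         # [[0.0, 0.0], [0.0, 0.0]]
```

Without the bias, every window sums to -4 and ReLU maps it to zero, which also illustrates how ReLU can leave many activations inactive.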

Finally, after the RFN is trained, the data are fed into it and feature vectors are output at the FC layer of the network. The feature vectors output at the FC layer are more robust than the flattened output, for two reasons. First, the dimension of the flattened data is too high, which makes computation difficult. Second, many neurons are inactive after flattening because of the ReLU function used in the network, so the resulting data are sparse. Therefore, feature vectors are taken from the FC layer in this method. They are formed by concatenating the per-channel feature vectors, as calculated by Equation (8), in which the $i$th term denotes the feature vectors of the $i$th channel of the layer.
The extracted feature maps are shown in Figure 4. The features are extracted through the Gabor filter and multiple convolutional layers and are then concatenated through the FC layer. The features extracted by the RFN are more scientific and robust than those obtained by manual feature extraction.

2.2. Classification
A common CNN uses softmax for classification. The output of softmax is the predicted probability of each category, and these probabilities sum to one [20]. As a result, the prediction for each sample is affected by all categories simultaneously. In this paper, an SVM replaces the softmax of the CNN for classification.
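The coupling between categories is easy to see in a small softmax sketch. The function below is a standard, numerically stabilized softmax in plain Python; the score values are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    # Subtracting the maximum score avoids overflow in exp().
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three categories.
probs = softmax([2.0, 1.0, 0.1])

# Raising only the last score lowers the probabilities of the other two,
# because all outputs share the same normalizing sum.
shifted = softmax([2.0, 1.0, 3.0])
```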
The decision boundary of an SVM is the maximum-margin hyperplane for the samples. For binary classification, the parameters of the best hyperplane are obtained by Equation (9), where $C$ is the penalty coefficient and $\xi_i$ is the error [21].
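For reference, the standard soft-margin formulation of this optimization problem is sketched below. It is assumed here to correspond to the paper's Equation (9); the symbols $w$ (normal vector of the hyperplane), $b$ (offset), $x_i$ (sample), $y_i \in \{-1, +1\}$ (label), and $n$ (number of samples) are conventional notation rather than notation confirmed by the source, while $C$ is the penalty coefficient and $\xi_i$ the error (slack) of sample $i$.

```latex
\begin{align}
\min_{w,\,b,\,\xi}\quad & \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{align}
```

A larger $C$ penalizes margin violations more heavily, while a smaller $C$ tolerates more errors in exchange for a wider margin.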
In the dual problem of the linear SVM, only the inner product ($\langle x_i, x_j \rangle$) between samples needs to be computed, so there is no need to specify a nonlinear transformation explicitly [22]. In a nonlinear classification problem, the inner product of samples can therefore be replaced by a kernel function. Let $\phi$ be a mapping from the input space to the feature space [23]. The kernel function satisfies Equation (10) for any $x$ and $z$ in the input space, so it is used to replace the inner product of the mapped samples [24]. The SVM classification is obtained by Equation (11), where $n$ is the number of feature vectors, $y_i$ is the label of the $i$th feature vector, and $\alpha_i$ and $b$ are the related parameters. Replacing the mapped sample inner product with the kernel function effectively solves the classification problem for linearly inseparable samples [25].
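The kernelized decision rule can be sketched as follows. This is an illustration of the general technique rather than the trained classifier from this paper: the kernel definitions follow common conventions, the support vectors, labels, multipliers, and bias are hand-picked toy values, and all function names are hypothetical.

```python
import math


def linear_kernel(x, z):
    """Plain inner product."""
    return sum(a * b for a, b in zip(x, z))


def poly_kernel(x, z, degree=3, coef0=1.0):
    """Polynomial kernel (inner product plus offset, raised to a power)."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel based on squared Euclidean distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svm_decision(x, support_vectors, labels, alphas, bias, kernel):
    """Kernel expansion: sum_i alpha_i * y_i * K(x_i, x) + b."""
    total = sum(
        alpha * label * kernel(sv, x)
        for sv, label, alpha in zip(support_vectors, labels, alphas)
    )
    return total + bias


def svm_predict(x, support_vectors, labels, alphas, bias, kernel):
    """Return +1 or -1 according to the sign of the decision value."""
    value = svm_decision(x, support_vectors, labels, alphas, bias, kernel)
    return 1 if value >= 0 else -1


# Toy model with hand-picked parameters (not learned from data).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [0.5, 0.5]
bias = 0.0

value = svm_decision([2.0, 0.5], support_vectors, labels, alphas, bias, linear_kernel)
pred_linear = svm_predict([2.0, 0.5], support_vectors, labels, alphas, bias, linear_kernel)
pred_rbf = svm_predict([-1.0, -3.0], support_vectors, labels, alphas, bias, rbf_kernel)
```

Only the `kernel` argument changes between the linear and nonlinear cases, which is the practical meaning of replacing the inner product with a kernel function.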
The features extracted in the previous section are used to train the SVM, and 10-fold cross-validation is applied during training. This method divides the data into ten equal parts. In each round, one part is selected as the test set and the remaining parts are used as the training set. The operation is repeated ten times, and a different part serves as the test set each time. The $k$-fold cross-validation method is used to reduce the effect of chance and improve the generalization of the model.
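The fold construction described above can be sketched in plain Python. This is a minimal illustration that splits sample indices into consecutive folds without shuffling; the function name `k_fold_indices` and the sample count are hypothetical, and when the data cannot be divided evenly the first folds simply receive one extra sample.

```python
def k_fold_indices(n_samples, k):
    """Return k (train_indices, test_indices) pairs covering all samples."""
    base, extra = divmod(n_samples, k)
    # The first `extra` folds get one additional sample.
    fold_sizes = [base + 1 if i < extra else base for i in range(k)]
    indices = list(range(n_samples))
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        splits.append((train_idx, test_idx))
        start += size
    return splits


# Hypothetical example: 25 samples, 10 folds.
splits = k_fold_indices(25, 10)
```

A model would be trained on each `train_idx` and scored on the matching `test_idx`, and the ten scores would then be averaged.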
3. Experiment and Analysis
The experiments were run on Windows 10 with an NVIDIA GeForce GTX 960M GPU, using the PyTorch 1.6 framework.
3.1. Data Processing
The data in this paper come from the public dataset Fruits-360, which has 22688 pictures in 131 classes. The data are divided into a training set and a testing set in a ratio of 7 : 3, giving 15882 and 6806 pictures, respectively. The images are scaled to a fixed size, and one batch of the data is shown in Figure 5.
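The split sizes can be checked with a short calculation. The sketch below assumes the training share is rounded to the nearest whole picture and the remainder goes to the testing set; the source does not state the rounding rule, and the function name `split_counts` is hypothetical.

```python
def split_counts(total, train_parts=7, test_parts=3):
    """Split `total` items by a train:test ratio using integer arithmetic."""
    parts = train_parts + test_parts
    # Round the training share to the nearest integer (assumed rule).
    train = (total * train_parts + parts // 2) // parts
    return train, total - train


train_count, test_count = split_counts(22688)  # (15882, 6806)
```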

3.2. Feature Extraction
Feature extraction is an important step in the fruit classification process. Features are extracted from the input Fruits-360 images by the RFN. First, the RFN proposed in this paper is trained on the scaled input images, with SGD as the optimizer and a learning rate of 0.001. The training result of the network is shown in Figure 6, where the horizontal and vertical axes are epoch and accuracy, respectively. The accuracy of the RFN is 97%, so the features extracted by the model are reasonably reliable. Then, the data are fed into the RFN, and the feature vectors output at the FC layer are used for SVM classification.

3.3. Results and Discussion
The feature vectors extracted by the RFN are used to train the SVM. The evaluation indicators are accuracy, precision, recall, and F1 score, calculated by Equations (12), (13), (14), and (15), respectively. True positive (TP) denotes a positive sample predicted to be positive by the model. True negative (TN) denotes a negative sample predicted to be negative by the model. False positive (FP) denotes a negative sample predicted to be positive by the model. False negative (FN) denotes a positive sample predicted to be negative by the model.
Accuracy reflects the ability of the model to judge the entire sample set. Precision reflects the proportion of truly positive samples among the samples judged positive by the classifier. Recall reflects the proportion of positive samples correctly judged by the classifier among all positive samples. The F1 score is the harmonic mean of precision and recall.
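These four indicators follow directly from the TP, TN, FP, and FN counts. The sketch below uses the standard binary definitions; how the paper aggregates them over its many fruit classes is not stated in the text, and the counts in the example are made up.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a single positive class.
metrics = classification_metrics(tp=8, tn=9, fp=2, fn=1)
```

With these counts, accuracy is 17/20, precision is 8/10, recall is 8/9, and the F1 score is 16/19.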
The feature vectors extracted by the RFN are used to train four ML models. Figure 7 shows how the scores of the ML models vary with their parameters. For the SVM, three kernel functions (rbf, poly, and linear) are compared, and the SVM with the linear kernel attains the highest score of one. Decision trees (DT) are compared under different maximum depths; a DT stops splitting when its depth reaches the specified maximum depth threshold. The highest DT score is 0.9411. Random forests (RF) are compared under different numbers of DTs, and the highest RF score is 0.9941. KNN is compared under different values of $k$, where $k$ is the number of training samples nearest to the prediction sample that are considered. The highest KNN score is one. It can be concluded that the features extracted by the RFN have good robustness.
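The role of the KNN parameter $k$ can be illustrated with a small pure-Python classifier. This is a minimal sketch that assumes Euclidean distance and majority voting; the two-dimensional feature vectors and the labels "apple" and "pear" are made-up toy data, not features produced by the RFN.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Predict the label of `query` by majority vote among its k nearest samples."""
    scored = []
    for features, label in zip(train_x, train_y):
        # Squared Euclidean distance preserves the nearest-neighbor ordering.
        sq_dist = sum((a - b) ** 2 for a, b in zip(features, query))
        scored.append((sq_dist, label))
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
train_y = ["apple", "apple", "apple", "pear", "pear", "pear"]

near_origin = knn_predict(train_x, train_y, [0.5, 0.5], k=3)
near_far = knn_predict(train_x, train_y, [5.0, 5.5], k=3)
```

A small $k$ makes the prediction sensitive to individual neighbors, while a large $k$ smooths the decision over a wider neighborhood.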

[Figure 7: Scores of the four ML models under different parameters, panels (a) to (d).]
Then, the highest-scoring classifiers of the four models were compared. The results are shown in Table 1. The SVM with the linear kernel performs best: its accuracy is 99.955%, and the other three indicators are 99.958%, 99.962%, and 99.997%.
The $k$-fold cross-validation method is used in training the four ML models. Figure 8 compares the four ML models trained with 10-fold cross-validation and without cross-validation. The horizontal axis denotes the different parameters of the classifier, and the vertical axis denotes the model performance score. The red and black lines correspond to training with and without $k$-fold cross-validation, respectively. It can be concluded that $k$-fold cross-validation reduces the effect of chance on the results.

[Figure 8: Scores of the four ML models with and without 10-fold cross-validation, panels (a) to (d).]
In the experiments, three kinds of features are compared: the features extracted by the RFN, the features extracted by a common CNN, and the RGB features. Each is used to train the four ML models. The results are shown in Tables 2–5.
The features extracted by the RFN yield better classification than the other two kinds of features. The results show that the RFN combined with SVM performs best: the accuracy is 99.955%, and the other three indicators are 99.958%, 99.962%, and 99.967%.
In addition, the proposed method is evaluated on similar classes within the categories Apple, Cherry, Grape, Pear, Potato, and Tomato, as listed in Table 6. The results are shown as confusion matrices in Figure 9. A confusion matrix shows the relationship between the values predicted by the classifier and the true values, so it is used to evaluate the performance of the method. The SVM classifies best when it uses the features extracted by the RFN, with an accuracy of 99.966%. The RFN and SVM method proposed in this paper therefore classifies fruits well.

[Figure 9: Confusion matrices for the similar-class experiment, panels (a) to (c).]
4. Conclusion
A method based on the RFN for smart community robot fruit classification is proposed in this paper. The RFN, which combines the Gabor filter and the residual block, extracts more robust features. Four ML models (SVM, RF, DT, and KNN) are used to test the features extracted by the RFN. The SVM replaces the softmax classifier of the common CNN, which improves the classification performance of the model. The $k$-fold cross-validation method is used in training, which improves the accuracy of the model and reduces the effect of chance. The proposed method is compared with other methods, and the results show that its accuracy on the Fruits-360 dataset is 99.955%, an improvement over the original authors' result. In addition, the accuracy on the 6 categories of fruits that are difficult to distinguish is 99.966%. The proposed method for smart community robot fruit classification achieves good results. It can replace the traditional approach of manual feature extraction for classification and can be further extended to other fields.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This study was supported by the Science and Technology Program of Guangzhou (No. 2019050001), the project of the Shenzhen Science and Technology Innovation Committee (JCYJ20190809145407809), the project of the Shenzhen Institute of Information Technology School-level Innovative Scientific Research Team (TD2020E001), the Program for Guangdong Innovative and Entrepreneurial Teams (No. 2019BT02C241), the Program for Chang Jiang Scholars and Innovative Research Teams in Universities (No. IRT_17R40), the Guangdong Provincial Key Laboratory of Optical Information Materials and Technology (No. 2017B030301007), the Guangzhou Key Laboratory of Electronic Paper Displays Materials and Devices (201705030007), and the 111 Project.