Abstract
As a cutting-edge technology, hyperspectral remote sensing has been widely applied in many fields, including agricultural production, mineral identification, target detection, disaster warning, military reconnaissance, and urban planning. The collected hyperspectral data have high spectral and spatial resolution and are characterized by a large amount of information, high redundancy, and high dimensionality, with strong correlation between adjacent bands. Hyperspectral data therefore provide rich information but also pose great challenges for subsequent processing. Hyperspectral image classification is a central problem in remote sensing information processing. Traditional classification methods for hyperspectral remote sensing images use only the spectral features of the image and ignore the spatial features of each pixel. In this paper, a hyperspectral image classification method is proposed that jointly considers spectral features and texture features. First, six texture features that contribute strongly to the discrimination of each pixel are extracted using a gray-level co-occurrence matrix; these are then combined with the spectral features of each pixel's neighborhood to form joint texture-spectral features. Finally, classification experiments on the Indian Pines and Pavia University scenes are carried out with a support vector machine and an extremely randomized trees algorithm, and the results show that the proposed method achieves higher classification performance than the traditional methods.
1. Introduction
Remote sensing (RS) means remote perception. In a broad sense, RS is a technical means of detecting observation targets and obtaining relevant information from a distance, without contact. In a narrow sense, remote sensing can be understood as a comprehensive technology that observes the earth's surface using sensors carried on ground, airborne, and spaceborne platforms together with electromagnetic wave signals, obtaining relevant information through the interaction between the signals and surface objects; the obtained information is then processed for analysis and application [1]. With the development of technology, remote sensing has progressed through panchromatic photography, color photography, and multispectral remote sensing, followed by the era of hyperspectral remote sensing starting in the 1980s.
Imaging spectrometers can collect information in hundreds of very narrow spectral bands and obtain image data as a continuous spectral cube. In the process of hyperspectral data collection, solar radiation passes through the atmosphere to the ground object, is reflected and radiated by the object's surface, passes back through the atmosphere, and is collected by the sensor. Through radiometric calibration and processing, hyperspectral data of the target of interest can be obtained [2].
Platforms for hyperspectral acquisition can be divided into three categories: ground, airborne, and spaceborne. Ground platforms are mainly used for direct ground acquisition or imaging at heights of no more than 50 m, typically for agricultural or laboratory imaging. The operating distance of airborne platforms is 10 to 100 kilometers above the ground; the imaging spectrometer is carried on a small aircraft or an unmanned aerial vehicle (UAV), which is a commonly used airborne imaging method that places high demands on external hardware conditions and operation during imaging. The operating distance of spaceborne platforms is usually more than 150 kilometers above the ground, with the imaging spectrometer carried by satellite; the working distance for earth imaging is generally tens of thousands of kilometers, and the experimental cost is huge.
Due to the rich information of hyperspectral images, hyperspectral remote sensing technology has been applied to many aspects of life and has a wide range of applications. In military applications, hyperspectral images can meet the requirements of reconnaissance, camouflage and anticamouflage operations, and information acquisition. In interstellar exploration and in ocean and atmosphere detection, collected hyperspectral images make it possible to accurately analyze the structure and composition of objects and provide technical support for subsequent processing. In public security, the technology can be used to detect dangerous situations accurately from hyperspectral data and thus better maintain public safety. In agriculture, it can quickly identify invasive plant species and effectively monitor vegetation growth. In geological surveys, it helps identify and classify rocks and minerals. Hyperspectral remote sensing is not limited to the above fields but has also been widely applied in areas such as food, ecology, communication, tourism, and transportation [3].
The purpose of hyperspectral image classification is to distinguish the multiple target features captured in the collected images; the process is shown in Figure 1. Each pixel in the image is assigned a category label according to certain rules, based on the collected spectral characteristics of each band, the spatial structure of the image, and other data information.
(Figure 1: the hyperspectral image classification process.)
The theoretical basis for the classification of hyperspectral images is as follows: under the same natural conditions (such as illumination, terrain, or vegetation coverage), the spectral information and spatial distribution information of ground objects of the same category in hyperspectral images should be the same or similar, and the data information of different categories should differ significantly. This similarity within classes and difference between classes is embodied in the fact that the information vectors of pixels containing the same kind of ground object will gather in the same region of the information space, while the information vectors of pixels containing different kinds of ground objects will gather in different regions [4]. In actual cases, due to the low spatial resolution of the spectral imager, a pixel may contain not just a single distinct object but two or more objects or features. Such pixels are known as mixed pixels. The existence of mixed pixels increases the difficulty of classification: it introduces differences in the observed data among pixels of the same object type and blurs the differences in data between pixels of different object types. It should be pointed out that, in actual situations such as images containing mixed pixels, the information vectors are not concentrated at a single point in the information space but are distributed, under certain constraints, over a certain region; this leads to some overlap between classes, so that pixels cannot be divided clearly [5].
Computer classification refers to classifying hyperspectral images using the computer as the processing environment and applying pattern recognition and artificial intelligence techniques to the various kinds of information collected in the image. With the continuous development of science and technology, the continuous improvement of theory, and the updating of computer hardware, computer classification offers short classification time, good classification effect, and strong objectivity, and it has become the mainstream classification method.
First, the classification methods based on the hyperspectral spectral curve are introduced. The principle of this kind of classification is relatively simple: the collected spectral curve, which reflects the optical and physical properties of ground objects, is classified with simple matching algorithms. These algorithms include the spectral angle mapper (SAM), spectral information divergence (SID), binary coding (BC), and the spectral absorption index (SAI).
The spectral angle mapper is the most commonly used of these algorithms. It regards the spectrum of each pixel of the collected hyperspectral image as a multidimensional vector and computes the angle between the pixel to be classified and a reference spectrum to obtain the similarity between the two vectors [6, 7]. The reference spectrum can come from the image itself or from an existing spectral library. The smaller the angle, the greater the similarity between the pixel and the reference spectrum; the final classification is determined by a similarity threshold. The characteristics of this method are as follows: classification is determined only by the similarity of spectral shape, and the algorithm is simple, but positive and negative correlations between spectral vectors cannot be distinguished, so different spectra with the same angle are hard to separate, and the overall classification effect is limited. Due to these limitations of SAM, improved algorithms keep appearing. For instance, traditional single-endmember SAM has been extended to multiendmember SAM, which better represents the spectral features of each class [8]. The authors in [9] proposed a SAM based on weighting and a kernel feature space separation transformation, which increases class discrimination. SAM based on multiple training samples has been implemented on Graphics Processing Units (GPUs) to improve classification accuracy and processing speed [10]. In addition, SAM is similar to some extent to the SID algorithm; the principal difference between the two methods is that SID judges the similarity between spectra by calculating the information entropy of the spectral curves [11].
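The angle computation at the heart of SAM can be sketched as follows; the threshold value and library class names are illustrative assumptions, not from the original:

```python
import numpy as np

def spectral_angle(pixel, reference):
    """Angle (radians) between a pixel spectrum and a reference spectrum.

    Smaller angles mean higher similarity. The measure is invariant to
    overall brightness (vector scaling), which is also why SAM cannot
    separate positively and negatively correlated versions of a shape.
    """
    pixel = np.asarray(pixel, dtype=float)
    reference = np.asarray(reference, dtype=float)
    cos_angle = pixel @ reference / (np.linalg.norm(pixel) * np.linalg.norm(reference))
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def sam_classify(pixel, library, threshold=0.1):
    """Assign the library class with the smallest angle, if below threshold."""
    angles = {name: spectral_angle(pixel, ref) for name, ref in library.items()}
    best = min(angles, key=angles.get)
    return best if angles[best] <= threshold else None
```

A scaled copy of a spectrum has angle zero to the original, illustrating the shape-only nature of the measure.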
Spectral binary coding is another classification technique for finding and matching processing targets quickly and effectively in a spectral library [12]. The idea of the method is as follows. First, a threshold value is set, and the value in each band of the pixel to be processed is compared with this threshold: if the value is greater than the threshold, the code is 1; otherwise, it is 0. In this way, each pixel generates a binary coded curve, and the similarity coefficients between these curves and the binary coded vectors in the spectral library are calculated to determine the final ground object category. This processing removes some redundant information from the collected image, and the algorithm is simple and fast. The limitation is that such simple binary coding discards much of the detailed spectral information, which cannot be retained effectively, so finer classification cannot be handled efficiently. In view of these shortcomings, several improvements have been proposed. The authors in [13] extended spectral binary coding to multichannel binary coding. In another work, multiple thresholds are set and the band values are compared with each threshold, yielding a multithreshold coding classification [14]. In [15], the waveform characteristics of the spectrum are combined with traditional binary coding to better describe the spectral curve.
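The encode-and-match steps above can be sketched as follows; using the spectrum's own mean as the default threshold is an illustrative choice, not from the original:

```python
import numpy as np

def binary_encode(spectrum, threshold=None):
    """Encode a spectrum as a 0/1 vector: 1 where the band value exceeds
    the threshold (the spectrum's own mean by default)."""
    spectrum = np.asarray(spectrum, dtype=float)
    if threshold is None:
        threshold = spectrum.mean()
    return (spectrum > threshold).astype(np.uint8)

def code_similarity(code_a, code_b):
    """Fraction of bands on which two binary codes agree (1.0 = identical)."""
    code_a, code_b = np.asarray(code_a), np.asarray(code_b)
    return float(np.mean(code_a == code_b))
```

A pixel would be assigned the library class whose stored code maximizes `code_similarity`.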
In the literature, most hyperspectral image classification work has focused on the spectral features of the image, while texture feature analysis of hyperspectral data has received much less attention. In this paper, our aim is to propose a classification algorithm based on cooperative learning of spectral features and spatial texture features of hyperspectral images using support vector machines (SVMs). SVMs learn efficiently and perform effectively in high-dimensional spaces.
The rest of the paper is structured as follows. Section 2 explains the collaborative learning algorithms and proposed hyperspectral image classification method. Section 3 discusses the simulation results and analysis. Section 4 provides the conclusion of this work.
2. Collaborative Learning Algorithm of Spectral and Texture Features
In this section, we discuss hyperspectral data and the collaborative learning algorithms used to construct strong, distinct features for classification. Then, we present the SVM-based spectral and texture feature cotraining algorithm. Finally, a hyperspectral image classification algorithm based on sparse features and neighborhood homogeneity is discussed.
2.1. Introduction to Collaborative Learning
Collaborative learning (cotraining) was first proposed by A. Blum and T. Mitchell in 1998. It is a semisupervised learning algorithm that makes the following assumptions about the features available for the data: the dataset contains two views, that is, the features can be divided into two groups that provide complementary information about each other. Ideally, these two sets of features should be class-conditionally independent, and a strong classifier can be derived from each set.
Collaborative learning was first applied to the binary classification of web pages [16]. The main flow of the algorithm is as follows. First, a labeled sample set and an unlabeled sample set are constructed, and a portion of the unlabeled samples is selected as a "connected" dataset. The labeled sample set is used to train two classifiers, one on each set of features, based on the naive Bayes method, a probabilistic method built on Bayes's theorem with strong independence assumptions among the features. The two classifiers then classify the samples in the "connected" dataset. Since the output of a naive Bayes classifier is a class probability, the most "reliable" positive and negative examples according to the probability output are selected, these samples with their predicted labels are added to the labeled dataset, and the "connected" dataset is replenished from the unlabeled sample set. This process is repeated for several iterations, and the resulting classifiers are the output of the algorithm.
2.2. Collaborative Learning Algorithm of Spectral Features and Texture Features
Based on the characteristics of the collaborative learning algorithm, hyperspectral image classification, with its particular characteristics and difficulties, is considered under this framework, and the following issues should be focused on:
(1) Two distinct, nearly independent strong feature sets must be constructed. For the classification of hyperspectral data, we hope to construct effective texture features in the spatial plane of the image in addition to the commonly used spectral features.
(2) A classifier suitable for hyperspectral image classification must be selected, one that can effectively handle high-dimensional problems and the influence of complex noise.
(3) How to select "reliable" classification results on unlabeled data to update the existing classifiers and the labeled dataset also needs careful consideration.
For the spectral features used in hyperspectral image classification, we consider using the original data without too much processing except for simple preprocessing such as noise band removal and data normalization. At present, there are also some research works on data dimension reduction, such as band selection or feature extraction, for the large amount of original hyperspectral data. However, recent studies show that, considering only the classification problem, except for the preprocessing method of removing noise bands, data dimension reduction does not significantly improve the classification accuracy of hyperspectral data. Texture feature has been proved to be very important in image analysis. However, texture analysis of multispectral and hyperspectral images is still a very difficult problem. In this paper, for hyperspectral images, we aim to find a two-dimensional plane representation that can best represent the change of image space plane dimension. In this paper, we first use principal component analysis (PCA) for feature transformation of hyperspectral image data and then use Gabor texture features to conduct texture analysis on the gray-scale images composed of the first six principal components of all pixels. PCA is chosen here mainly because these principal components can capture most changes of the spatial plane dimension of the hyperspectral image cube.
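The PCA step described above can be sketched as follows, assuming the hyperspectral cube is stored as an (H, W, B) NumPy array; the function name is illustrative:

```python
import numpy as np

def first_principal_components(cube, n_components=6):
    """Project a hyperspectral cube (H, W, B) onto its first principal
    components, returning n_components gray-scale images of shape (H, W)."""
    h, w, b = cube.shape
    pixels = cube.reshape(-1, b).astype(float)
    pixels -= pixels.mean(axis=0)                 # center each band
    # Eigen-decomposition of the band covariance matrix
    cov = np.cov(pixels, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    projected = pixels @ eigvecs[:, order]        # (H*W, n_components)
    return projected.T.reshape(n_components, h, w)
```

The resulting component images are the gray-scale inputs on which the Gabor texture analysis operates.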
Gabor texture features are commonly used texture descriptors. A two-dimensional Gabor function is a Gaussian kernel modulated by a sinusoid of a given frequency and orientation. The real response of the Gabor filter in the spatial domain can be given by the following equation:

g(x, y) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(\frac{2\pi x'}{\lambda} + \psi\right)   (1)

Here,

x' = x\cos\theta + y\sin\theta, \quad y' = -x\sin\theta + y\cos\theta   (2)

In equations (1) and (2), σ is the standard deviation of the Gaussian envelope, which determines the size of the acceptance domain; θ is the orientation of the Gabor function; λ is the wavelength of the sinusoidal factor; ψ is its phase offset, taking real values in the range (−π, π]; and γ is the spatial aspect ratio that determines the ellipse shape of the Gabor function [17]. High spatial resolution is obtained with small values of σ. Each (λ, θ) pair corresponds to the output image of one particular filter.

For each image output by a filter, the following nonlinear sigmoid function is used to transform the response:

\psi(t) = \tanh(\alpha t) = \frac{1 - e^{-2\alpha t}}{1 + e^{-2\alpha t}}   (3)

Then, the Average Absolute Deviation (AAD) of the transformed response r(x, y) is calculated over the N × N image (N is the width of the image):

\mathrm{AAD} = \frac{1}{N^2} \sum_{x, y} \left| \psi\big(r(x, y)\big) \right|   (4)

Finally, a set of texture feature output maps is obtained.
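The Gabor filtering and AAD computation described above can be sketched as follows; the parameter values (σ, θ, λ, γ, ψ, α, kernel size) are illustrative:

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(sigma=2.0, theta=0.0, lam=5.0, gamma=0.5, psi=0.0, size=11):
    """Real part of a 2-D Gabor filter: a Gaussian envelope modulated
    by a cosine of wavelength lam along orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + (gamma * y_t)**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * x_t / lam + psi)

def aad_feature(image, kernel, alpha=0.25):
    """Filter the image, squash the response with tanh, and return the
    average absolute deviation as a single texture feature value."""
    response = convolve2d(image, kernel, mode="same", boundary="symm")
    return float(np.mean(np.abs(np.tanh(alpha * response))))
```

Running a bank of such filters over each principal-component image and collecting the AAD values yields the texture feature maps used below.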
2.3. Classification and Trusted Marker Prediction
In the original collaborative learning algorithm, the classification task is accomplished by the naive Bayes algorithm. The output of naive Bayes is in the form of probabilities, which is well suited for selecting the most "reliable" (most probable) positive or negative examples to add to the labeled sample set. However, the naive Bayes classifier is less accurate because it treats all features as independent, which is usually untrue in real cases, so it is not effective in more complex classification problems where features depend on each other. Moreover, given the high-dimensional characteristics of hyperspectral data and the noise present in the data sources, the naive Bayes classifier is not suitable here. In our algorithm, we decided to use the probabilistic output form of SVM to solve the probability classification problem. SVMs have been widely used in pattern recognition classification problems and can usually achieve high classification accuracy, and previous studies have proved the effectiveness of SVM in pixel-based hyperspectral data classification [18]. The standard SVM cannot give a probability output for class labels, so it is extended to estimate class probabilities. The technical details are briefly described below.
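In scikit-learn, for example, an SVM with probability outputs can be obtained by enabling Platt scaling; the toy two-class "spectra" below are illustrative stand-ins for real pixel data:

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for labeled pixel spectra: two Gaussian blobs in 5 "bands".
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 5)), rng.normal(1.0, 0.3, (40, 5))])
y = np.array([0] * 40 + [1] * 40)

# probability=True fits Platt scaling on top of the SVM decision values,
# giving the per-class probabilities the cotraining step needs.
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)
proba = clf.predict_proba(X[:5])

# The most "reliable" unlabeled samples are those with the highest
# maximum class probability.
confidence = proba.max(axis=1)
most_reliable = int(np.argmax(confidence))
```

Each row of `proba` sums to 1, so ranking by the row maximum directly implements the "most reliable sample" selection.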
Given k-class data, for a sample x, we aim to estimate the class probabilities

p_i = P(y = i \mid x), \quad i = 1, \ldots, k.

SVM was initially proposed for binary classification. In the multiclass classification problem, if the one-against-one implementation is adopted, the pairwise probabilities between classes are

r_{ij} \approx P(y = i \mid y = i \text{ or } j,\; x).

Therefore, the category probabilities p can only be obtained from all of these r_{ij}, by solving the following optimization problem:

\min_{p} \; \frac{1}{2} \sum_{i=1}^{k} \sum_{j \neq i} \left( r_{ji} p_i - r_{ij} p_j \right)^2 \quad \text{subject to} \quad \sum_{i=1}^{k} p_i = 1, \; p_i \geq 0.

The optimization function can be further expressed as

\min_{p} \; \frac{1}{2} p^{T} Q p,

Here,

Q_{ii} = \sum_{j \neq i} r_{ji}^2, \qquad Q_{ij} = -r_{ji} r_{ij} \; (i \neq j).

This problem is convex, and the detailed calculation process can be referred to [12]. So we use this probabilistic form to classify and to select the most trusted predictions based on the probability output.
2.4. STF-CT Algorithm
The SVM-based spectral and texture feature cotraining (STF-CT) algorithm is as follows.
(Algorithm 1: the STF-CT procedure.)
As described above, the STF-CT algorithm is quite simple. For the hyperspectral image cube, the first six principal component bands of the PCA transformation of the cube, which best represent the spatial change information, are taken. Texture features are extracted from each single-band image with a Gabor filter bank to form an image texture feature cube. The hyperspectral image cube and the texture cube constitute the two independent views used to construct the cooperative learning algorithm. For the ground object classification problem, a spectral feature classifier and a texture feature classifier are trained. In each iteration of the algorithm, the samples considered most "reliable" by the two current independent view classifiers are selected from the "connected" dataset and added to the labeled sample set, and then randomly selected samples from the unlabeled sample set are added to the "connected" dataset. The total number of iterations is determined by a preset parameter.
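The iteration described above can be sketched as follows; the function name, set sizes, and parameter defaults are illustrative assumptions, and arrays `spec_*`/`tex_*` hold the two aligned feature views:

```python
import numpy as np
from sklearn.svm import SVC

def stf_ct(spec_l, tex_l, y_l, spec_u, tex_u,
           n_iter=3, n_add=2, pool_size=10, seed=0):
    """Sketch of the STF-CT loop: two probabilistic SVMs, one per view
    (spectral and texture), take turns labeling the most "reliable"
    samples from a small "connected" pool, which is then refilled from
    the unlabeled set."""
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(spec_u)))
    pool = [remaining.pop() for _ in range(min(pool_size, len(remaining)))]
    spec_l, tex_l, y_l = list(spec_l), list(tex_l), list(y_l)

    for _ in range(n_iter):
        clf_s = SVC(probability=True, random_state=0).fit(spec_l, y_l)
        clf_t = SVC(probability=True, random_state=0).fit(tex_l, y_l)
        for clf, view in ((clf_s, spec_u), (clf_t, tex_u)):
            if not pool:
                break
            proba = clf.predict_proba([view[i] for i in pool])
            # most "reliable" = highest maximum class probability
            best = np.argsort(proba.max(axis=1))[::-1][:n_add]
            for pos in sorted(best.tolist(), reverse=True):
                i = pool.pop(pos)
                spec_l.append(spec_u[i])
                tex_l.append(tex_u[i])
                y_l.append(int(clf.classes_[np.argmax(proba[pos])]))
        while remaining and len(pool) < pool_size:   # refill the pool
            pool.append(remaining.pop())

    # final classifiers trained on the enlarged labeled set
    clf_s = SVC(probability=True, random_state=0).fit(spec_l, y_l)
    clf_t = SVC(probability=True, random_state=0).fit(tex_l, y_l)
    return clf_s, clf_t
```

Each classifier labels pool samples with its own most confident prediction, so confident evidence from one view enlarges the training set of the other.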
2.5. Hyperspectral Image Classification Algorithm Based on Sparse Feature and Neighborhood Homogeneity (SF-NH)
Hyperspectral data involve a large amount of data, substantial redundancy, high dimensionality, and strong correlation between bands, which poses challenges for subsequent processing. More and more attention has been paid to the problem of how to effectively utilize the rich spectral information while ensuring processing accuracy. The classification process plays an important role in hyperspectral image processing; it is a key technology for fully describing the information and achieving a better classification effect.
Sparse representation is a powerful technique for pixel-wise classification. It has shown great potential in the field of data description and has brought great benefits to image processing both theoretically and practically. When data are projected onto the dictionary elements to form a feature subspace, only a small part of the projected data is in the active state, that is, has nonzero values, while most projected values are 0, so a sparse expression of the data can be realized.
Sparse representation has been successfully introduced into the analysis and application of hyperspectral images in recent years. Iordache's team took the lead in introducing the sparse model into hyperspectral unmixing and proposed a series of representative algorithms that achieved good results. Sparse models are easily interpretable by a human: most of the information is represented by linear combinations of small sets of elementary samples. In this context, hyperspectral pixels in a high-dimensional space are represented by a low-dimensional subspace spanned by elementary samples of the same class. Wang et al. introduced the sparse representation of images into visualization, and sparse representation is widely used in target detection and image restoration. Quite good results have also been achieved in classification. Haq et al. used sparse expression to solve the hyperspectral classification problem with only a few labels; through experiments, they found that this method could improve classification accuracy while maintaining relatively efficient operation. Song et al. proposed a classifier based on sparse coding, which showed the advantages of sparse coding in feature expression. At the same time, many studies have focused on optimizing dictionaries for hyperspectral image classification. For example, Yang et al. constructed neural networks for dictionary learning; the dictionary dimension was reduced through a compressed sampling step, hyperspectral vectors could be sparsely expressed, and accurate classification results were obtained. However, the common shortcoming of the above classifiers is that they are per-pixel classifiers in nature, which use only spectral information and ignore the neighborhood information in the image.
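The residual-based classification idea behind these methods can be sketched as follows. Note this is a simplification: true sparse-representation classifiers use a sparsity-constrained solver (e.g., orthogonal matching pursuit) over the concatenated dictionary, whereas here each class sub-dictionary is fit by plain least squares:

```python
import numpy as np

def src_classify(pixel, dictionaries):
    """Sparse-representation-style classification sketch: represent the
    pixel on each class's sub-dictionary by least squares and pick the
    class with the smallest reconstruction residual."""
    residuals = {}
    for label, D in dictionaries.items():   # D: (bands, atoms) per class
        coeffs, *_ = np.linalg.lstsq(D, pixel, rcond=None)
        residuals[label] = float(np.linalg.norm(pixel - D @ coeffs))
    return min(residuals, key=residuals.get)
```

A pixel lying close to the subspace spanned by one class's atoms reconstructs with a small residual for that class and is labeled accordingly.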
A general problem of per-pixel classifiers is therefore a salt-and-pepper noise pattern: there are often scattered misclassified pixels in the classification result map that visually resemble salt-and-pepper noise.
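A simple way to suppress such isolated misclassifications, in the spirit of the neighborhood homogeneity step used later, is a majority (mode) filter over the label map; this sketch assumes non-negative integer labels:

```python
import numpy as np
from scipy import ndimage

def majority_filter(label_map, size=3):
    """Replace each pixel's label with the most frequent label in its
    size x size neighborhood, suppressing isolated "salt-and-pepper"
    misclassifications while leaving large homogeneous regions intact."""
    def local_mode(values):
        # values is the flattened neighborhood; labels must be ints >= 0
        return np.bincount(values.astype(int)).argmax()
    return ndimage.generic_filter(label_map.astype(int), local_mode, size=size)
```

As the discussion of the results later notes, this kind of smoothing helps in large homogeneous regions but can also erase correct labels at fine object edges.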
3. Experimental Simulation Results and Analysis
In this section, we study the influence of parameters and compare different classification methods in classification experiments on the Indian Pines, Salinas, and Pavia University datasets. We then analyze the advantages of the proposed method over the other methods.
3.1. Analysis of the Influence of Parameters
In this part, the influence of the parameters of the SF-NH method on classification performance is studied. We only discuss the influence of a specific parameter on the proposed algorithm with the other parameters in their optimal state. The indexes overall accuracy (OA), average accuracy (AA), and the Kappa coefficient are used to measure the influence of the different parameters on the method presented in this paper. It should be explained that two different settings are used when applying the neighborhood homogeneity determination method: (1) single scale, where scale 2 or scale 3 is used alone; (2) multiscale, where the scales are combined, that is, processing with scale 3 continues on the result of processing with scale 2.
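The three indexes can all be computed from the confusion matrix; a minimal sketch, assuming labels are integers 0..n_classes-1 and every class appears in the reference data:

```python
import numpy as np

def classification_scores(y_true, y_pred, n_classes):
    """Overall accuracy, average (per-class) accuracy, and Cohen's kappa
    computed from predicted and reference labels via the confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    total = cm.sum()
    oa = np.trace(cm) / total                      # overall accuracy
    aa = float(np.mean(np.diag(cm) / cm.sum(axis=1)))  # average accuracy
    # chance agreement from the row/column marginals
    expected = float(cm.sum(axis=1) @ cm.sum(axis=0)) / total**2
    kappa = (oa - expected) / (1 - expected)
    return float(oa), aa, float(kappa)
```

OA weights every pixel equally, AA weights every class equally, and kappa discounts the agreement expected by chance, which is why all three are reported.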
3.2. Comparison of Different Classification Methods
In this part, the samples mentioned above are used as experimental data for classification simulation. The SF-NH method is compared with classification methods based on the original spectral features (OF) [158] and principal component analysis (PCA) features [8] (using the first three principal components for analysis); these are representative and widely used feature classification methods. In all the classification methods, both single-scale and multiscale neighborhood homogeneity determination are used. The control parameters of the comparison methods are tuned to their optimum through cross-validation. Three commonly used datasets (Indian Pines, Salinas, and Pavia University) are considered for the experiments to evaluate our classification method. The Indian Pines dataset was acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over the Indian Pines test site and has 224 spectral bands in the range of 0.4-2.5 μm. The Pavia University dataset was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) and has 103 spectral bands in the range of 0.43-0.86 μm. Salinas was acquired by AVIRIS over the vegetation and soil areas of the Salinas Valley and has 224 spectral bands. Classification results for the Indian Pines dataset are shown in Figure 2.
(Figure 2: classification result maps for the Indian Pines dataset, panels (a)-(l).)
(Figure 3: classification result maps, panels (a)-(l).)
For the Indian Pines and Salinas data, the classification accuracy of each classifier was averaged over 30 runs in order to reduce bias due to random sampling. For the Pavia University data, we adopted a fixed set of training samples. The classification result maps corresponding to one run are shown in Figures 2-4.
(Figure 4: classification result maps, panels (a)-(l).)
From the first column in Figures 2-4, it can be seen intuitively that, compared with the original spectral features and the features extracted using PCA, the classification results obtained with sparse features as classification features are better. This is due to two reasons: on the one hand, sparse features and their corresponding overcomplete dictionary can fully express the hyperspectral data; on the other hand, the nonzero elements in sparse features carry very important information, and since sparse features contain only a small number of nonzero elements, they are very helpful for classification. Comparing the rows and columns in the figures shows that the neighborhood homogeneity determination method can eliminate the "salt-and-pepper noise pixels" generated in the classification process and optimize the classification results obtained from the SVM. At the same time, comparing subgraphs (j), (k), and (l) in each figure shows that the classification effect of subgraph (l) is the best; that is, the classification result obtained using multiscale neighborhood homogeneity determination in SF-NH is better than that of any single scale. A quantitative assessment of Salinas is shown in Table 1.
Tables 1 and 2 compare the classification performance of the various algorithms more intuitively, and useful information can be obtained from them. The trends reflected in the tables are basically the same, so Table 1 is used for a detailed discussion. First, by comparing OF, PCA, and SF, it can be seen that the OA of the algorithm using sparse features as classification features reaches 84.5%, which is better overall than that of the original spectral features and the PCA features. At the same time, the comparison shows that the neighborhood homogeneity determination method significantly improves classification accuracy, and this holds for the OF, SF, and PCA classification features alike. Among the compared methods, SF-NH with sparse features and multiscale homogeneity determination performs best, with a classification accuracy of 91.61%.
Here, we note that the three datasets were collected by two different types of imaging spectrometer, and the classification effect is improved to varying degrees by the algorithm, indicating that the algorithm has a certain universality. The Indian Pines data acquired by the AVIRIS sensor have lower spatial resolution than the Salinas data, and the neighborhood homogeneity algorithm improved the lower-resolution Indian Pines data significantly: for Indian Pines, OA increased by about 7 percentage points, from 84.51% to 91.61%, while for Salinas, OA increased by about 1.1 percentage points, from 90.45% to 91.57%. This is because hyperspectral images with low spatial resolution contain less detailed spatial information, so introducing spatial neighborhood information can significantly improve the classification effect. At the same time, although the overall classification accuracy for Pavia University was improved by introducing spatial information, the classification accuracy of categories 7 and 8 decreased to varying degrees. A similar phenomenon occurred for category 14 of Indian Pines: CA14 was reduced from 98.34% to 94.54% by introducing the neighborhood homogeneity determination method. This is also suggested by the observation that most of the misclassified pixels lie at detailed edges: the introduced neighborhood homogeneity determination method is well suited to processing large homogeneous regions but not to pixels in scattered areas or at detailed edges.
4. Conclusion
With the continuous improvement of spectral and spatial resolution, hyperspectral remote sensing technology can provide an increasingly detailed description of the observed object. Hyperspectral data contain abundant information, which has led to the application of hyperspectral remote sensing in military and national defense fields and in many aspects of civil life around the world, showing great application value. In this paper, we proposed a classification algorithm based on cooperative learning of spectral features and spatial texture features of hyperspectral images with low spatial resolution for ground object classification. The proposed algorithm provides good classification accuracy for small training sample sets and offers new insight into pixel-based hyperspectral object classification. In future work, we aim to further explore hyperspectral images with high spatial resolution, focusing on more effective image texture features suitable for hyperspectral cooperative learning classification.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflict of interest.
Acknowledgments
This work was supported by the Changde 2020 Science and Technology Innovation Development Project 2020[12]203, "Research and Practice of an Integrated Diagnosis System for the Health Status of Operating Transformers."