Abstract

Hyperspectral unmixing is a powerful technique in remote sensing image mining that identifies the constituent materials in a scene and estimates their corresponding fractions in the mixture. We consider the application of nonnegative matrix factorization (NMF) to the mining and analysis of spectral data. In this paper, we develop two effective active set type NMF algorithms for hyperspectral unmixing. Because the factor matrices used in unmixing have sparse features, the active set strategy helps reduce the computational cost. These active set type algorithms for NMF are based on alternating nonnegative constrained least squares (ANLS) and achieve a quadratic convergence rate under reasonable assumptions. Finally, numerical tests demonstrate that these algorithms work well and that the function values decrease faster than those obtained with other algorithms.

1. Introduction

Hyperspectral data are acquired by high-resolution imaging sensors that record hundreds of contiguous narrow spectral band images. The goals of hyperspectral imaging are to obtain the spectrum of each pixel in the image of a scene and to identify materials or process objects. Because the spatial pixel size obtained by hyperspectral sensors is often large, more than one material can contribute to the spectrum measured at a single pixel, leading to the existence of mixed pixels. To improve the accuracy of material classification and recognition, hyperspectral unmixing (HU) is necessary. Certain objects leave unique fingerprints in the electromagnetic spectrum, and these spectral signatures enable the identification of the materials that compose a scanned pixel. HU aims at identifying the materials (endmembers) present in a captured image and their corresponding fractional abundances using the high spectral resolution of hyperspectral images.

In the past few decades, researchers have proposed many algorithms for HU based on the linear mixing model. When the endmember matrix is known a priori, one can effectively recover the fractional abundances. Recently, nonnegative matrix factorization (NMF), which is consistent with the linear mixing model, has been proposed as a powerful tool for identifying the components of a dataset. In fact, the nonnegative constraints of NMF support the physical meaning of HU. From a mathematical perspective, NMF factorizes the original matrix into two matrices: the first contains the basis vectors, and each column of the original matrix is approximated by a combination of these basis vectors weighted by the coefficients in the corresponding column of the second matrix. Based on this description, NMF provides a new approach to HU [1].

In recent years, many methods have been proposed to solve the NMF problem. Berry et al. [2] divided NMF algorithms into three general classes: multiplicative updating algorithms, gradient descent algorithms, and alternating least squares algorithms. The prototypical multiplicative updating (MU) rules proposed by Lee and Seung [3] have been widely used as the baseline in relevant studies. Another widely used method for solving NMF problems is the projected gradient method [4]. With additional projections, the nonnegativity constraint is satisfied. Most projected gradient methods provide strong optimization properties [5–8]. Third, the alternating least squares (ALS) algorithm [2, 9], which alternately fixes one matrix and improves the other with a simple least squares computation, is commonly applied. All the negative elements are set to 0 to meet the nonnegativity constraint.

To produce useful decompositions, some constraints on the scales of U or V are needed. Hoyer proposed nonnegative sparse coding in [10, 11], and NMF with sparsity constraints has been applied to unmix hyperspectral data [12–16]. Cai et al. presented graph-regularized nonnegative matrix factorization (GNMF) [17, 18]. GNMF combined with a sparseness constraint can preserve the geometrical structure of the data and represent it in a low dimensional space when applied to HU [19]. Orthogonal matrix factorization [20] and minimum volume-constrained NMF [21] have also been put forward to enhance the decomposition results. Recently, multilayer NMF [22, 23] and deep NMF [24] complete the matrix decomposition in different layers and efficiently reduce the reconstruction error. In [25], an algorithm based on a clustered multitask network was proposed, which considered sparsity, clustering, and neighborhood information. The extension of different regularized NMF models has received significant attention in data mining [23, 26–34]. Most of these regularized NMF models are solved by the MU method because it is easy to implement and often yields good results. However, the MU method converges slowly due to its first-order convergence rate. To accelerate convergence, some (quasi-)Newton type methods have been applied to solve NMF [35–37].

In this paper, we propose an active set type Newton method to solve nonnegative least squares problems. Because the combination coefficients in the abundance matrix are generally sparse, active set type algorithms are a good choice for solving NMF problems [37–40]. In [41–43], active set identification functions are employed to predict the constraints that are active at the solution. After the estimated active set is determined, the variables in the active set become active variables, and the remaining variables are inactive variables. Finally, the search direction in the subspace spanned by the inactive variables is determined. These active set type algorithms are efficient in solving large-scale optimization problems.

This paper is organized as follows. In Section 2, we provide the basic definitions of important terms and the two active set identification techniques. In Section 3, we present the proposed active set type Newton method, which is based on the alternative nonnegative least squares approach. The convergence properties are also discussed. The proposed algorithms are compared to other commonly used NMF algorithms in Section 4.

We use the following notation in this paper. A superscript k is used to indicate the iteration number. If H is a matrix with elements h_{ij}, the double subscript indicates the element in the i-th row and j-th column of H. Let I be an index set, I ⊆ {1, …, n}. We denote H_I as the submatrix of H formed by the columns h_i with i ∈ I. If v is an n-element vector, we denote v_I as the subvector with components v_i, where i ∈ I. Some basic definitions and assumptions are stated in the next section.

2. Problem Formulation and Active Set Strategy

2.1. Nonnegative Matrix Factorization Problem Formulation

In general, NMF can be mathematically formulated as follows. Given an input matrix W ∈ R^{m×n}, in which each element of W is nonnegative, and an integer r < min(m, n), NMF aims to find the reduced rank nonnegative matrices U ∈ R^{m×r} and V ∈ R^{r×n}, such that

W ≈ UV. (1)

We note that the spectral signature of each pixel of the hyperspectral image can be obtained as a combination of the nonnegative spectral signatures of the constitutive endmembers. In such a case, U is often considered the endmember matrix and V the fractional abundance matrix [44]. These results imply that endmember extraction and abundance map creation can be simultaneously accomplished through NMF.

There are many ways to measure the accuracy of the approximation given by (1). In this paper, we focus on the most common choice, the Frobenius norm. We solve the following minimization problem to find U and V:

min_{U, V} f(U, V) = (1/2) ||W − UV||_F^2, s.t. U ≥ 0, V ≥ 0, (2)

where the inequalities U ≥ 0 and V ≥ 0 indicate that all the elements of U and V are nonnegative.
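As a concrete illustration, the objective in (2) can be evaluated directly. The following NumPy sketch is ours, not a reference implementation; the function name and the 1/2 scaling follow the formulation above:

```python
import numpy as np

def nmf_objective(W, U, V):
    """Objective of problem (2): 0.5 * ||W - U V||_F^2."""
    R = W - U @ V          # residual matrix
    return 0.5 * np.sum(R * R)

# Tiny example: an exact factorization gives a zero objective.
U = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 0.0]])
W = U @ V
```

An exact factorization W = UV drives the objective to zero, which corresponds to the ideal noise-free case.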

By alternately fixing one matrix and improving the other, the two-variable minimization problem (2) can be converted into the following alternating nonnegative least squares (ANLS) subproblems:

(ANLS1) min_{V ≥ 0} (1/2) ||W − UV||_F^2 and (ANLS2) min_{U ≥ 0} (1/2) ||W − UV||_F^2.

Because ANLS2 is easily transformed into ANLS1 by transposing the matrices W, U, and V, we focus on solving ANLS1. In what follows, we use ANLS to refer to ANLS1.

In HU, U represents the endmember matrix, the columns of which contain the spectra of the endmembers. Once U is given, each row of the solution V provides the distribution of the corresponding endmember. An efficient solution of the ANLS is the focus of the present work. Methods that solve the ANLS rigorously can add to the computational burden; alternatively, some methods neglect the nonnegative constraints, solve the corresponding unconstrained least squares problems, and then simply set any negative values to zero, which may yield poor estimates that are not true least squares solutions [45, 46].

2.2. Basic Definition and Assumption

In fact, the ANLS can be treated as a large-scale bound-constrained optimization problem, and we present two active set type optimization algorithms for solving the ANLS.

The gradient of the objective function of the ANLS [3] with respect to V is

∇f(V) = U^T (UV − W).

From the KKT optimality conditions in [3], (U, V) is a stationary point of problem (2) if the multipliers Λ_U ≥ 0 and Λ_V ≥ 0 exist such that

∇_U f(U, V) = Λ_U, ∇_V f(U, V) = Λ_V, Λ_U ∘ U = 0, Λ_V ∘ V = 0,

where ∘ denotes the Hadamard product, which takes two matrices of the same dimensions and produces another matrix in which each element is the product of the corresponding elements of the original two matrices, i.e., (A ∘ B)_{ij} = a_{ij} b_{ij}.
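The stationarity conditions for the V-subproblem can be checked numerically. The sketch below (a simplified check of ours; the tolerance handling is illustrative) verifies primal feasibility, nonnegativity of the gradient, and the Hadamard-product complementarity condition V ∘ ∇f(V) = 0:

```python
import numpy as np

def kkt_stationary_V(W, U, V, tol=1e-8):
    """Check the KKT conditions for the V-subproblem: V >= 0,
    grad = U^T (U V - W) >= 0, and V o grad = 0 (Hadamard product)."""
    G = U.T @ (U @ V - W)                 # gradient with respect to V
    primal = np.all(V >= -tol)            # feasibility
    dual = np.all(G >= -tol)              # multiplier nonnegativity
    comp = np.max(np.abs(V * G)) <= tol   # elementwise (Hadamard) product
    return bool(primal and dual and comp)

U = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
V = np.array([[1.0, 2.0], [3.0, 4.0]])
W = U @ V   # the gradient vanishes at this V, so V is stationary
```

At the exact factorization the gradient is zero, so all three conditions hold; perturbing V makes the complementarity product nonzero.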

2.3. Two Active Set Identification Techniques

A constraint of the form v_i ≥ 0 is said to be active at the optimal solution v* if strict equality holds at v*, i.e., v*_i = 0, and inactive at v* if strict inequality holds, i.e., v*_i > 0. For any feasible solution of the ANLS, the variables of the current iterate V are classified into two parts: active variables and inactive variables.

To facilitate the discussion, we concentrate on one column of the matrix and use the vector v instead of V. Let A(v) be the index set that contains the estimated indices of the active variables; A(v) is called the (estimated) active set, and the set of remaining indices is denoted by I(v).

We present two active set identification techniques that help estimate the active set of the optimal solution of the ANLS problem. The active set strategy improves the accuracy of the ANLS solution and reduces the computational cost.
(i) Identify the active set with a constant. With a small positive constant ζ, we obtain an estimate of the active set:
A1(v) = {i : 0 ≤ v_i ≤ ζ}. (AS1)
(ii) Identify the active set with the multiplier. From the first equality of the KKT condition, we obtain the approximate multiplier λ(v) = ∇f(v).

Then, we have the second active set identification function, that is,
A2(v) = {i : 0 ≤ v_i ≤ ε, λ_i(v) > 0}, (AS2)
where ε is a small positive constant.
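Both identification techniques reduce to simple elementwise tests. The following sketch implements the two estimates in the standard forms assumed above; the thresholds zeta and eps are illustrative, and the approximate multiplier is taken to be the gradient:

```python
import numpy as np

def active_set_constant(v, zeta=1e-6):
    """(AS1)-style estimate: indices with 0 <= v_i <= zeta."""
    return np.flatnonzero(v <= zeta)

def active_set_multiplier(v, grad, eps=1e-6):
    """(AS2)-style estimate: small variables whose approximate
    multiplier (here, the gradient component) is positive."""
    return np.flatnonzero((v <= eps) & (grad > 0))

v = np.array([0.0, 0.5, 1e-9, 2.0])
grad = np.array([1.0, -0.3, -0.2, 0.1])
```

On this example, (AS1) flags indices 0 and 2, while (AS2) keeps only index 0, since the multiplier at index 2 is nonpositive.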

We use (AS1) and (AS2) in the following. At iteration k, the variables of v^k with indices in the estimated active set are called active variables, and the remaining variables are called inactive variables.

A nonnegative matrix V* is said to be a stationary point of the ANLS if, for every element v*_{ij},

∇f(V*)_{ij} ≥ 0 if v*_{ij} = 0, and ∇f(V*)_{ij} = 0 if v*_{ij} > 0. (9)

The strict complementarity condition holds at V* if the strict inequalities hold in the first constraint of (9), that is, ∇f(V*)_{ij} > 0 whenever v*_{ij} = 0.

3. Active Set Type Newton Algorithm for ANLS

The columns of V are stacked to generate a vector v, and the ANLS can be written as follows:

min_{v ≥ 0} f(v) = (1/2) tr((W − UV)^T (W − UV)), (10)

where tr(·) denotes the trace of a matrix [3, 4].

We assume that the following standard assumption holds.

Assumption. U is a full-column rank matrix.
This assumption can be easily satisfied because r is a small number. Then, the Hessian matrix of the ANLS objective is block diagonal, with each diagonal block equal to U^T U. Under the assumption, U^T U is a positive definite matrix. It follows that the ANLS is a strictly convex optimization problem and, after the Newton method is applied, a second-order local convergence rate is obtained. An empirical test of the full-column rank assumption indicated that this assumption is satisfied in the random case (see more details in [37]).
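The positive definiteness of U^T U under the full-column rank assumption can be verified empirically, for instance by attempting a Cholesky factorization; a random tall matrix has full column rank almost surely (dimensions here mirror the experiments, but the matrix is a random stand-in):

```python
import numpy as np

# If U has full column rank, the Hessian block U^T U is symmetric
# positive definite, so a Cholesky factorization must succeed.
rng = np.random.default_rng(0)
U = rng.random((224, 12))      # tall random matrix: full column rank a.s.
H = U.T @ U
L = np.linalg.cholesky(H)      # raises LinAlgError if H is not PD
assert np.all(np.linalg.eigvalsh(H) > 0)
```

Since r is small (here 12), the factorization involves only an r-by-r matrix and is inexpensive.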

3.1. Active Set Type Algorithm for Solving the ANLS Problem

In this section, we introduce an active set type algorithm to solve the ANLS problem. The Newton method is employed to update the inactive variables, and the active variables are updated in a simple manner, that is, by directly proceeding to the boundary or inside the feasible region.

3.1.1. Search Direction of the Inactive Variables

By extracting the rows and columns related to the inactive variables from the Hessian, we obtain the reduced Newton system

(Z^T (U^T U) Z) d_{I^k} = −Z^T ∇f(v^k), (14)

where Z is the selection matrix whose number of columns equals the number of elements in I^k, and each column of Z is a unit vector e_i with i ∈ I^k. In this case, e_i is the i-th column of the identity matrix.

Lemma 1. Under the above assumption, Z^T (U^T U) Z is a positive definite matrix.

Proof. Because U is a full-column rank matrix, U^T U is a positive definite matrix. The columns of Z are distinct unit vectors, so Z has full column rank and Zy ≠ 0 for any nonzero vector y.
Then, it follows that y^T (Z^T (U^T U) Z) y = (Zy)^T (U^T U) (Zy) > 0. Consequently, Z^T (U^T U) Z is a positive definite matrix.
Because Z^T (U^T U) Z is a positive definite matrix, its eigenvalues satisfy 0 < λ_min ≤ λ_max, where λ_min and λ_max are the smallest and largest eigenvalues of Z^T (U^T U) Z, respectively.
The search direction of the inactive variables is finally defined in (14). The hyperspectral data are sparse, which indicates that there may be many estimated active variables. As a result, the computational cost is significantly reduced.
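A minimal sketch of the reduced Newton step for one column is given below; the explicit selection matrix Z is replaced by NumPy index selection, and the helper name is ours, not the paper's:

```python
import numpy as np

def inactive_newton_direction(U, w, v, inactive):
    """Newton direction restricted to the inactive variables of one
    column v: solve (Z^T (U^T U) Z) d_I = -(grad)_I, with Z replaced
    by plain index selection."""
    grad = U.T @ (U @ v - w)
    H_red = (U.T @ U)[np.ix_(inactive, inactive)]   # reduced Hessian
    d = np.zeros_like(v, dtype=float)
    d[inactive] = np.linalg.solve(H_red, -grad[inactive])
    return d

# With every variable inactive, one Newton step solves the LS problem.
U = np.array([[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]])
v_true = np.array([1.0, 2.0])
w = U @ v_true
d = inactive_newton_direction(U, w, np.zeros(2), np.array([0, 1]))
```

When many variables are estimated active, the linear system shrinks accordingly, which is the source of the claimed cost reduction on sparse data.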

3.1.2. Search Direction of the Active Variables

The active variables are updated in a simple way:

The above definition of the search direction will help satisfy the descent condition (for further details, see [4143]), and with , the next iteration is guaranteed to be a feasible point. After the search direction is computed, a step size is chosen with the Armijo rule. At the k-th iteration, we propose Algorithm 1 to solve the ANLS problem.

Input: , r and .
Output: such that .
Step 1 (computation of the search direction)
  Compute the estimated active set with one of the definitions (AS1) or (AS2).
  Compute the search direction of the active and inactive variables with (15) and (16).
Step 2 (update the factor matrix)
  Let v^{k+1} = v^k + α_k d^k, where the step size is α_k = β^{t_k} and t_k is the first nonnegative integer that satisfies the following inequality: f(v^{k+1}) ≤ f(v^k) + σ ∇f(v^k)^T (v^{k+1} − v^k).
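Algorithm 1 can be sketched as follows. This is an illustrative NumPy implementation under our assumptions (an (AS1)-style active set estimate, a rough stationarity test, and Armijo backtracking with projection); the parameter values are not taken from the paper:

```python
import numpy as np

def anls_active_set(U, W, V0, zeta=1e-10, sigma=1e-4, beta=0.5,
                    max_iter=200, tol=1e-8):
    """Sketch of Algorithm 1: solve min_{V >= 0} 0.5*||W - U V||_F^2
    column by column with an active set Newton step and Armijo rule."""
    UtU, UtW = U.T @ U, U.T @ W
    V = V0.copy().astype(float)
    for j in range(W.shape[1]):
        v = V[:, j]
        for _ in range(max_iter):
            grad = UtU @ v - UtW[:, j]
            if np.linalg.norm(np.minimum(v, grad)) < tol:
                break                          # rough stationarity test
            active = (v <= zeta) & (grad > 0)  # (AS1)-style estimate
            idx = np.flatnonzero(~active)      # inactive coordinates
            d = np.where(active, -v, 0.0)      # active vars move to boundary
            if idx.size:
                d[idx] = np.linalg.solve(UtU[np.ix_(idx, idx)], -grad[idx])
            # Armijo backtracking; projection keeps the iterate feasible
            f = 0.5 * v @ (UtU @ v) - UtW[:, j] @ v
            t, v_new = 1.0, v
            while t > 1e-12:
                v_new = np.maximum(v + t * d, 0.0)
                f_new = 0.5 * v_new @ (UtU @ v_new) - UtW[:, j] @ v_new
                if f_new <= f + sigma * (grad @ (v_new - v)):
                    break
                t *= beta
            v = v_new
        V[:, j] = v
    return V
```

On noise-free data with an interior solution, a single full Newton step per column recovers the exact factor, and the stationarity test stops the inner loop immediately afterwards.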

Theorem 1. Suppose that the assumption and the strict complementary slackness condition are satisfied. Then the sequence generated by Algorithm 1 converges to the optimal solution v*, and the convergence rate is at least quadratic.

Proof. Because the Hessian matrix in (10) is positive definite, the objective function is strictly convex and the optimization problem in (10) has a unique solution v*.

With (14) and the strict complementary slackness assumption, we obtain from Proposition 4 in [47] that Algorithm 1 converges to the optimal solution and that the convergence rate is at least quadratic.

Now, we can present the new algorithm for NMF (Algorithm 2).

Input: , r, .
Output: The approximation of the two factor matrices U and V.
Step 1 (update the factor matrices alternatively)
  If the convergence criterion is satisfied, then stop; else,
  update with Algorithm 1 and,
  update with Algorithm 1.
Step 2
  Set k := k + 1 and go to Step 1.
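The outer alternation of Algorithm 2 can be sketched as follows. For self-containment, the inner ANLS solver is replaced here by a few projected gradient steps (a stand-in for Algorithm 1, with an illustrative iteration budget instead of the paper's stopping criterion); the U-update reuses the same solver through transposition, as discussed in Section 2:

```python
import numpy as np

def nnls_pg(A, B, X0, inner=50):
    """Inner solver stand-in for Algorithm 1: projected gradient steps
    on min_{X >= 0} 0.5*||B - A X||_F^2 with step 1/L, L = ||A^T A||_2."""
    AtA, AtB = A.T @ A, A.T @ B
    L = np.linalg.norm(AtA, 2)
    X = X0.copy()
    for _ in range(inner):
        X = np.maximum(X - (AtA @ X - AtB) / L, 0.0)
    return X

def nmf_anls(W, r, iters=100, seed=0):
    """Sketch of Algorithm 2: alternately update V and U until the
    iteration budget is spent; U is updated via transposition."""
    rng = np.random.default_rng(seed)
    m, n = W.shape
    U, V = rng.random((m, r)), rng.random((r, n))
    for _ in range(iters):
        V = nnls_pg(U, W, V)            # ANLS1: V-update with U fixed
        U = nnls_pg(V.T, W.T, U.T).T    # ANLS2 -> ANLS1 by transposition
    return U, V
```

The transposed call solves ANLS2 in the ANLS1 form W^T ≈ V^T U^T, so a single subproblem solver serves both updates.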

Theorem 2. Any limit point of the sequence generated by the AS algorithms is a stationary point of (2).

The above convergence result follows directly from the corollary in [48].

4. Numerical Tests

In this section, the proposed algorithm with active set identification technique (AS1) or (AS2) is denoted AS1 or AS2, respectively. We compare AS1 and AS2 with the mainstream approaches for solving (2), that is, the multiplicative updating (MU) method and the projected gradient (PG) method. We also add the experimental results of the projected Newton (PN) method. The PN method uses the projected Newton direction to update the factor matrices in the ANLS, while AS1 and AS2 use the projected Newton direction to update only the inactive variables, with the active variables updated by a simple rule based on gradient information. PG, PN, AS1, and AS2 use similar rules for the stopping criterion and the step size update. Because algorithms are sensitive to the initial point when solving NMF problems, all five algorithms start from the same randomly generated initial matrices. All five algorithms also use the same parameters related to the stopping criterion, that is,
(i) tol: tolerance for a relative stopping criterion, tol = 
(ii) timelimit: limit on the running time, timelimit = 5
(iii) maxiter: limit on the number of iterations, maxiter = 

All algorithms are implemented in Matlab; experimental results on synthetic and real-world data are, respectively, presented in Sections 4.1 and 4.2.

The stopping criterion of the proposed algorithms is similar to that adopted by Lin [4]:

||∇^P f(U^k, V^k)||_F ≤ tol ||∇f(U^1, V^1)||_F,

where the projected gradient ∇^P f is defined as follows:

[∇^P f(V)]_{ij} = ∇f(V)_{ij} if v_{ij} > 0, and min(0, ∇f(V)_{ij}) if v_{ij} = 0.
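The projected gradient in this criterion is straightforward to compute elementwise; the following sketch evaluates it for a factor matrix V and gradient G (a stationary point yields a zero projected gradient):

```python
import numpy as np

def projected_gradient(V, G):
    """Lin's projected gradient: equals G where v_ij > 0 and
    min(0, G_ij) where v_ij = 0."""
    P = G.copy()
    mask = V <= 0
    P[mask] = np.minimum(P[mask], 0.0)   # bound blocks ascent directions
    return P

V = np.array([[0.0, 1.0], [2.0, 0.0]])
G = np.array([[0.5, -0.2], [0.3, -0.4]])
P = projected_gradient(V, G)   # zero where the bound blocks descent
```

Here the positive gradient at the zero entry (0, 0) is zeroed out, because moving against it would leave the feasible region; the negative gradient at (1, 1) is kept.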

The stopping tolerance is updated in a similar way as shown in Reference [4].

4.1. Simulation Test 1

We start by generating the original matrix W. The spectral signatures are chosen from the United States Geological Survey (USGS) digital spectral library. Figure 1 shows twelve of these endmember signatures.

The spectral signatures contain 224 spectral reflectance bands in the wavelength range of . The endmember matrix U has twelve columns, representing the twelve spectral signatures, and the abundance matrix V is obtained from a Dirichlet distribution of parameter , which satisfies the nonnegative constraints; the sum of the coordinates is 1. Let the original matrix . Figure 2 displays the square root of the final objective function value as a function of time. We set the number of columns of W to , and for each W, we plot the objective value vs. time curve.
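The abundance generation step can be sketched as follows; the endmember signatures here are random stand-ins for the USGS spectra, and the pixel count is illustrative (the band and endmember counts mirror the ones above):

```python
import numpy as np

rng = np.random.default_rng(42)
bands, r, pixels = 224, 12, 1000          # pixel count is illustrative

U = rng.random((bands, r))                # stand-in endmember signatures
V = rng.dirichlet(np.ones(r), size=pixels).T  # abundances, one column per pixel
W = U @ V                                 # noise-free mixed-pixel matrix

# Dirichlet draws are nonnegative and each column sums to one.
assert np.all(V >= 0) and np.allclose(V.sum(axis=0), 1.0)
```

Sampling the abundances from a Dirichlet distribution enforces both the nonnegativity and the sum-to-one constraints of the linear mixing model by construction.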

We perform a quantitative analysis with added noise, where, in Matlab notation, the noise follows a Gaussian distribution with mean zero and standard deviation ξ. Figure 3 shows the square root of the final objective value vs. the standard deviation ξ.

Figures 2 and 3 show that the active set type methods perform well. Specifically, the function value curves of the other methods decrease slowly, while those of the AS methods decrease rapidly, which indicates that if the dataset is well structured, AS should be able to find the solution much more easily than the other methods. Generally speaking, PN should perform better than PG because second-order information is employed in the Newton step. In our view, there are two reasons for the failure of PN: the computation of the inverse matrix is time consuming, and the sparsity causes singularity of the Hessian matrix, especially near the end of the iterations. After the active set strategies are applied to the projected Newton method, the algorithms become more stable and time saving. Because the step size search is a time-consuming operation, the curves of PG, PN, AS1, and AS2 are polygonal lines.

4.2. Tests of Hyperspectral Unmixing

Finally, we evaluate the effectiveness of the active set type methods on real-world data. Urban is one of the most widely used hyperspectral datasets in the hyperspectral unmixing literature. There are pixels, each of which corresponds to a area. In this image, there are 210 wavelengths ranging from . Due to dense water vapor and atmospheric effects, we remove several channels and keep 162 channels [49–51]. There are four endmembers in these data: asphalt road, grass, tree, and roof. Setting the number of endmembers , we factorize the original matrix with MU, PG, PN, AS1, and AS2. In this numerical test, we set timelimit = 15. All five algorithms successfully separate the endmember matrix and the abundance matrix. We compare the five methods in terms of the spectral angle distance (SAD) and the abundance maps of the four endmembers. The SAD is used to compare the similarity of the endmembers. Let a and b be the spectral signatures of endmembers a and b. Then,

SAD(a, b) = arccos( (a^T b) / (||a|| ||b||) ).

The smaller the SAD, the more similar endmembers a and b are. We compute the SAD between our unmixed endmember signatures and the ground truth of these four endmembers, whose spectral signatures are extracted from the USGS library.
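The SAD computation is a single arc-cosine of a normalized inner product; a minimal sketch follows (the clipping guards against floating point overshoot outside [−1, 1]):

```python
import numpy as np

def sad(a, b):
    """Spectral angle distance between two signatures, in radians."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))
```

Identical signatures give an angle of zero, and orthogonal signatures give π/2, so smaller values indicate more similar endmembers.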

Table 1 shows that our proposed active set type algorithms are consistently better than, or comparable to, all the other methods. Figure 4 shows the ground truth for the abundance fractions of these endmembers, that is, roof, asphalt, grass, and tree, in order from left to right.

In order to give an intuitive comparison of the HU results, we illustrate the abundance maps on Urban data in Figure 5, which is shown in gray scale.

Because the second-order information of the objective function is used in the proposed active set type Newton algorithms, the run time is reduced. With the help of the active set strategy, AS1 and AS2 are more stable than PG and PN. When dealing with hyperspectral data, we found that PG and PN are sensitive to the initial matrices and to the values of the parameters in the algorithm. Additionally, PG might stop because the stopping condition of the ANLS subproblem is satisfied, which indicates that the descent condition is not met within the maximum number of inner iterations of the inner loop.

4.3. Summary and Discussion

We present two active set type Newton algorithms for solving NMF problems based on an alternating nonnegative constrained least squares approach. Thanks to the active set strategy, the value of the objective function decreases rapidly at the beginning of the iterations, which makes the algorithms more robust in various applications. Finally, we make some remarks about future work. First, a good initialization matrix can improve the speed and accuracy of the solutions, as it produces faster convergence to improved local minima. A good clustering method that helps estimate the endmember signatures, or that provides the endmember matrix based on prior ground truth data, is worth studying. Second, efficient constraints should be added to the NMF model; these help improve the accuracy of HU. The endmember matrix and the abundance matrix both have their own characteristics, such as the sparsity of V and the linear independence of the columns of U. Third, an optimal solution of the ANLS subproblem corresponding to one factor is not required before the update of the other factor is performed. Generally speaking, algebra-based algorithms, such as MU and ALS, are simple to execute, whereas optimization-based algorithms incur a higher computational cost to find an accurate solution of the ANLS subproblem. This suggests that an inexact Newton or limited-memory quasi-Newton method combined with an active set strategy might work well.

Data Availability

The data used to support this study are available from http://lesun.weebly.com/hyperspectral-data-set.html.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This study was supported by the National Science Foundation of China (11701337, 41271235, and 11301307), Shandong Provincial (China) Natural Science Foundation (ZR2016DM03), and the Project of Shandong Province Higher Educational Science and Technology Program (J16LI16).