Abstract

In this paper, a second-order finite-difference method is proposed for finding second-order stationary points of derivative-free nonconvex unconstrained optimization problems. A forward-difference or central-difference technique is used to approximate the gradient and the Hessian matrix of the objective function. The traditional trust-region framework is used, and we minimize an approximate trust-region subproblem to obtain the search direction. The global convergence of the algorithm is established without the fully quadratic assumption. Numerical results show the effectiveness of the algorithm with both the forward-difference and the central-difference approximations.

1. Introduction

1.1. Problem Description and Motivation

In this paper, we propose a trust-region method for solving derivative-free nonconvex optimization problems of the following form:

min f(x), x in R^n,   (1)

where f is a general nonlinear nonconvex function that is twice Lipschitz continuously differentiable, but none of its first-order or second-order derivatives are explicitly available. Such problems arise frequently in computational science, engineering, and industry.

Derivative-free problems arise widely in machine learning and in computational science and engineering, where first- and second-order derivatives are often unavailable. Nesterov and Polyak [1] introduced the cubic regularization algorithm, whose iterates converge to a second-order stationary point of smooth optimization problems. The adaptive regularization cubic method was introduced by Cartis, Gould, and Toint [2, 3]: at each iteration, a local cubic model is built and solved approximately, and worst-case complexity rates are also established. Furthermore, Curtis, Robinson, and Samadi [4] proved worst-case complexity bounds for an adaptive trust-region algorithm to reach approximate first-order and second-order stationary points, respectively, and introduced a new rule for updating the trust-region radius. Curtis and Robinson [5] proposed a new algorithm that obtains the search direction and the step size by a dynamic choice technique; the algorithm selects the better of a first-order and a second-order descent direction so that the objective function decreases more significantly. Lee, Simchowitz, Jordan, and Recht [6] showed that vanilla gradient descent avoids strict saddle points with probability one, while Du, Jin, Lee, Jordan, Singh, and Poczos [7] proved that this method may need an exponential number of steps to reach a second-order stationary point.

In recent years, several authors have studied finite-difference and related methods for nonsmooth optimization, smooth stochastic convex optimization, and noisy derivative-free optimization. Auslender and Teboulle [8] identified a simple line search mechanism for interior gradient and proximal methods that yields global convergence, and they also established the convergence rate of the algorithm. Tseng and Yun [9] introduced a block coordinate gradient descent method for minimizing the sum of a smooth function and a separable convex function; the method generates a search direction by block coordinate descent, ensures sufficient descent, and iterates with an inexact line search, and a local Lipschitzian error bound assumption is used to prove global and linear convergence. Auslender and Teboulle [10] solved nondifferentiable convex optimization problems by a subgradient projection method, proposing non-Euclidean projection-like maps that produce interior trajectories while relying on a single projection. Villa, Salzo, Baldassarre, and Verri [11] proposed an accelerated forward-backward splitting method for composite optimization problems; under a sufficiently fast decay condition, they proved an accelerated convergence rate, accounted for the cost of reaching an approximate optimal solution, and gave a global complexity analysis of the method. Keskar and Wachter [12] proposed a limited-memory quasi-Newton algorithm combining the L-BFGS quasi-Newton approximation with a weak Wolfe line search; to avoid the shortsightedness caused by the unavailability of gradients for nonsmooth functions, a minimum-norm subgradient is used to obtain search directions in an iterative corrective loop. Lewis and Overton [13] introduced a quasi-Newton algorithm for nonsmooth optimization problems. Their analysis for the Euclidean norm case uses an inexact line search in one variable and assumes an exact line search in two variables; the BFGS method with an inexact line search finds a Clarke stationary point of the problem, and they also showed that the algorithm is R-linearly convergent.

Berahas et al. [14] minimized noisy functions with a finite-difference quasi-Newton method. Noise estimation techniques were given by Moré and Wild [15], who presented the ECNoise algorithm for computing the noise level of a function. The convergence analysis in [14] allows stochastic noise without assuming a specific noise distribution; the finite-difference parameter is also updated along the search direction when the line search fails to produce an acceptable point, so that the method converges to the optimal solution faster. Hamming [16] studied the choice of differencing intervals. Brekelmans et al. [17] analyzed different gradient estimation methods for noisy functions: by decomposing the error criterion, they split the total error into the sum of a deterministic error and a stochastic error, and derived the step sizes that minimize each, so that the total error is minimized. Berahas et al. [18] analyzed finite differences, linear interpolation, Gaussian smoothing, and smoothing on a unit sphere, all of which use only function values to approximate the gradient of the objective function.

Nesterov and Spokoiny [19] proposed random gradient-free methods in which the search directions are normally distributed random Gaussian vectors. They proved new complexity bounds based only on function-value computations, showing that the methods need at most O(n) times more iterations than their gradient-based counterparts, where n is the dimension of the problem; they also gave an accelerated scheme and a zeroth-order scheme for stochastic optimization with the corresponding expected rates. Gorbunov et al. [20] proposed an accelerated derivative-free algorithm for unconstrained smooth convex functions with noisy function values, where the noise combines a stochastic part and an unknown deterministic part with bounded absolute value. They also proposed a nonaccelerated derivative-free method similar to stochastic-gradient-based methods and proved that a non-Euclidean norm proximal setup has a better complexity bound than the Euclidean proximal setup. Bellavia et al. [21] presented evaluation complexity bounds in the framework of a deterministic trust-region method; they showed that intrinsic noise may dominate the bound, provided estimates of the optimality level achievable should noise cause early termination, and shed some light on the impact of inexact computer arithmetic on evaluation complexity.

1.2. Contribution of This Paper

In order to solve derivative-free nonconvex unconstrained optimization problems in which the first-order and second-order derivatives are not explicitly available, a new trust-region method is introduced in this paper. The main contributions of this paper are as follows:

(i) A finite-difference trust-region method is proposed for finding second-order stationary points of derivative-free nonconvex optimization problems. We prove that the method globally converges to a second-order stationary point of the derivative-free nonconvex optimization problem.

(ii) The proposed algorithm uses forward-difference or central-difference techniques to approximate the gradient and the Hessian matrix of the objective function when building the trust-region subproblem. Hence, the trust-region subproblem is more accurate than one built from a BFGS update, and the algorithm can obtain better search directions.

(iii) A new update technique for the finite-difference parameter ensures that the algorithm generates a subsequence converging to a second-order stationary point of the original nonconvex optimization problem without the fully quadratic approximation.

This paper is organized into six sections. The finite-difference trust-region subproblem and the definition of second-order stationary points are introduced in Section 2. The finite-difference trust-region algorithm is introduced in Section 3. In Section 4, we prove that, under boundedness of the level set, the algorithm generates a sequence converging to a second-order stationary point of the problem. Numerical experiments illustrating the practical performance of the algorithm are reported in Section 5. Concluding remarks are given in Section 6.

1.3. Notation

In this paper, the derivative-free nonconvex unconstrained optimization problem is solved by a trust-region algorithm with forward-difference or central-difference techniques. x denotes the variable of the problem, and x* denotes an optimal solution of the objective function f. ||.|| denotes the Euclidean norm. ∇f(x_k) and ∇²f(x_k) denote the gradient and the Hessian matrix of a scalar function f at the iteration point x_k, respectively.

2. The Trust-Region Subproblem and the Definition of Second-Order Stationary Point

In this paper, we propose a finite-difference trust-region method for solving the derivative-free nonconvex optimization problem. The trust-region subproblem needs the gradient and the Hessian matrix of the objective function f, but this information is not available. Hence, we use the finite-difference method to approximate it. The finite-difference method relies on the computation of the finite-difference parameter h.

We adopt the definition of the finite-difference method given in [15]. The ith component of the forward-difference approximation g_k to the gradient at x_k is defined as follows:

(g_k)_i = (f(x_k + h e_i) - f(x_k)) / h,   (2)

where e_i is the n-dimensional unit vector whose ith element is 1.

The central-difference approximation is given as

(g_k)_i = (f(x_k + h e_i) - f(x_k - h e_i)) / (2h).   (3)
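For concreteness, the two gradient approximations (2) and (3) can be sketched in a few lines of Python (a minimal illustration; the helper name fd_gradient and the scheme switch are ours, not the paper's):

```python
import numpy as np

def fd_gradient(f, x, h, scheme="forward"):
    """Finite-difference gradient of f at x with difference parameter h.

    scheme="forward": (g)_i = (f(x + h e_i) - f(x)) / h            -- eq. (2)
    scheme="central": (g)_i = (f(x + h e_i) - f(x - h e_i)) / (2h) -- eq. (3)
    """
    n = x.size
    g = np.zeros(n)
    fx = f(x)  # reused by every forward-difference component
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        if scheme == "forward":
            g[i] = (f(x + h * e) - fx) / h
        else:
            g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)
    return g
```

Note the usual trade-off: the forward scheme costs n + 1 function evaluations per gradient, while the more accurate central scheme costs 2n.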

In a general finite-difference algorithm, the approximation of the Hessian matrix does not use the information of the original objective function, so the calculation error is large. In order to reduce the error between the symmetric matrix of the trust-region subproblem and the Hessian matrix of the original problem, similarly to the gradient estimate, the second-order forward-difference approximation H_k built from the objective function is expressed as follows:

(H_k)_{ij} = (f(x_k + h e_i + h e_j) - f(x_k + h e_i) - f(x_k + h e_j) + f(x_k)) / h²,   (4)

and the second-order central-difference approximation is given by

(H_k)_{ij} = (f(x_k + h e_i + h e_j) - f(x_k + h e_i - h e_j) - f(x_k - h e_i + h e_j) + f(x_k - h e_i - h e_j)) / (4h²),   (5)

where (H_k)_{ij} is the element in row i and column j of the matrix H_k, and e_i and e_j are n-dimensional unit vectors whose ith and jth elements are 1, respectively.
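A direct transcription of the entrywise Hessian formulas (4) and (5) follows (again a hypothetical sketch, not the authors' implementation; the final averaging step symmetrizes away rounding asymmetry):

```python
import numpy as np

def fd_hessian(f, x, h, scheme="forward"):
    """Finite-difference Hessian of f at x, entry (i, j) from eq. (4) or (5)."""
    n = x.size
    I = np.eye(n)
    H = np.zeros((n, n))
    fx = f(x)
    for i in range(n):
        for j in range(n):
            ei, ej = h * I[i], h * I[j]
            if scheme == "forward":   # eq. (4)
                H[i, j] = (f(x + ei + ej) - f(x + ei) - f(x + ej) + fx) / h**2
            else:                     # eq. (5)
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4.0 * h**2)
    return 0.5 * (H + H.T)  # enforce exact symmetry
```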

For obtaining a second-order stationary point, we pose the following approximate trust-region subproblem:

min_s m_k(s) = f(x_k) + g_k^T s + (1/2) s^T H_k s,  subject to ||s|| ≤ Δ_k,

where g_k is generated by (2) or (3), H_k is generated by (4) or (5), and Δ_k > 0 is the trust-region radius. λ_k denotes the minimum eigenvalue of the symmetric matrix H_k.
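The subproblem does not have to be solved exactly: the convergence analysis below only needs steps achieving fractions of the Cauchy and eigenstep decreases. A minimal sketch of the two candidate steps over the Euclidean ball ||s|| ≤ delta (the helper names are ours, not the paper's):

```python
import numpy as np

def cauchy_step(g, H, delta):
    """Cauchy point: minimize the model along -g inside the ball ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    if gnorm == 0.0:
        return np.zeros_like(g)
    gHg = g @ H @ g
    tau = 1.0 if gHg <= 0 else min(1.0, gnorm**3 / (delta * gHg))
    return -tau * (delta / gnorm) * g

def eigen_step(g, H, delta):
    """Eigenstep: move along the eigenvector of the most negative eigenvalue."""
    lam, V = np.linalg.eigh(H)   # eigenvalues in ascending order
    if lam[0] >= 0:
        return None              # no negative curvature to exploit
    v = V[:, 0]
    if g @ v > 0:                # orient so the step is non-ascent
        v = -v
    return delta * v

def model_decrease(g, H, s):
    return -(g @ s + 0.5 * s @ H @ s)

def best_step(g, H, delta):
    """Pick whichever candidate step decreases the quadratic model more."""
    sc = cauchy_step(g, H, delta)
    se = eigen_step(g, H, delta)
    if se is not None and model_decrease(g, H, se) > model_decrease(g, H, sc):
        return se
    return sc
```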

Before introducing the finite-difference trust-region algorithm, we give the definition of a second-order stationary point.

Definition 1. For the derivative-free nonconvex optimization problem, if a point x* satisfies

∇f(x*) = 0 and d^T ∇²f(x*) d ≥ 0 for all vectors d in R^n,

then x* is said to be a second-order stationary point.

According to Definition 1, if ∇f(x*) = 0 and the minimum eigenvalue of ∇²f(x*) is nonnegative, then x* is obviously a second-order stationary point of the problem.

3. Algorithm

In this section, a trust-region method is introduced for finding an approximate second-order stationary point of the problem.

4. Convergence Analysis

In this section, we provide the global convergence analysis of Algorithm 1; to this end, we require the model to satisfy the following assumption.

Algorithm 1 (the finite-difference trust-region algorithm)
Input: Given an initial iterate , the constants , , , and . Set .
Main Step:
(1) Choose , and calculate the approximate gradient using (2) or (3). Choose , and calculate the approximate Hessian matrix using (4) or (5); then calculate , where is the minimum eigenvalue of the symmetric matrix .
(2) If , then let . If , then stop: is a second-order stationary point; else go to step 1.
(3)If , then calculate the following subproblem,
    
where and obtain the search direction .
(4)Calculate
    
(5)If then , else let .
(6)Update the trust-region radius as follows:
    
Let , and go to step 1.
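Putting the pieces together, the main loop of Algorithm 1 can be sketched as follows. This is a simplified illustration under assumed values: the constants eta, gamma1, gamma2, the cap 1e-4 on the difference parameter, and tying h to the radius are our choices, not the paper's; the steps use central differences and the Cauchy/eigenstep comparison:

```python
import numpy as np

def fd_grad(f, x, h):
    """Central-difference gradient, eq. (3)."""
    n = x.size
    I = np.eye(n)
    return np.array([(f(x + h * I[i]) - f(x - h * I[i])) / (2.0 * h)
                     for i in range(n)])

def fd_hess(f, x, h):
    """Central-difference Hessian, eq. (5), symmetrized."""
    n = x.size
    I = np.eye(n)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            H[i, j] = (f(x + h * I[i] + h * I[j]) - f(x + h * I[i] - h * I[j])
                       - f(x - h * I[i] + h * I[j])
                       + f(x - h * I[i] - h * I[j])) / (4.0 * h * h)
    return 0.5 * (H + H.T)

def fd_trust_region(f, x0, delta0=1.0, tol=1e-6, max_iter=500,
                    eta=0.1, gamma1=0.5, gamma2=2.0):
    """Finite-difference trust-region loop (illustrative constants)."""
    x = np.asarray(x0, dtype=float)
    delta = delta0
    for _ in range(max_iter):
        h = min(1e-4, delta)        # shrink the difference parameter with the radius
        g = fd_grad(f, x, h)
        H = fd_hess(f, x, h)
        lam, V = np.linalg.eigh(H)  # eigenvalues in ascending order
        if np.linalg.norm(g) <= tol and lam[0] >= -tol:
            break                   # approximate second-order stationarity
        gn = np.linalg.norm(g)
        if gn > 0:
            gHg = g @ H @ g         # curvature along steepest descent
            tau = 1.0 if gHg <= 0 else min(1.0, gn**3 / (delta * gHg))
            s = -tau * (delta / gn) * g   # Cauchy step
        else:
            s = np.zeros_like(x)
        if lam[0] < 0:              # negative curvature: try the eigenstep
            v = V[:, 0] if g @ V[:, 0] <= 0 else -V[:, 0]
            se = delta * v
            if -(g @ se + 0.5 * se @ H @ se) > -(g @ s + 0.5 * s @ H @ s):
                s = se
        pred = -(g @ s + 0.5 * s @ H @ s)  # predicted model decrease
        if pred <= 0:
            delta *= gamma1
            continue
        rho = (f(x) - f(x + s)) / pred     # actual vs. predicted reduction
        if rho >= eta:                     # successful iteration
            x = x + s
            delta *= gamma2
        else:                              # unsuccessful: shrink the radius
            delta *= gamma1
    return x
```

On a simple nonconvex test such as f(x) = (x_1² - 1)² + x_2², the loop uses the eigenstep in regions of negative curvature and converges to one of the minimizers (±1, 0).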

Assumption 2. We define the level set as follows:

L(x_0) = {x in R^n : f(x) ≤ f(x_0)}.

Suppose that the level set L(x_0) is bounded.

As an important result, the following lemma shows the boundedness of the objective function itself, its gradient, and its Hessian matrix.

Lemma 3. Under Assumption 2, for , there exist positive constants and such that

Proof. A similar proof can be found in Lemma 3.2 of [22].
The boundedness of g_k and H_k is important for the convergence of Algorithm 1; hence, in the following lemma we prove the boundedness of the approximate gradient and the approximate Hessian matrix of the objective function at the iteration point.

Lemma 4. Under Assumption 2, there exist and such that

Proof. A similar proof can be found in [23].
The following lemma bounds the error between the true gradient and the finite-difference gradient, and between the true Hessian matrix and the finite-difference Hessian matrix, at the iteration points generated by Algorithm 1.

Lemma 5. Under Assumption 2, for any iteration point generated by Algorithm 1, we have that

where n denotes the dimension of the problem and is the Lipschitz constant of .

Proof. A similar proof can be found in [23].
According to Lemma 5 and the way is selected in step 1 of Algorithm 1, we obtain the corresponding bounds. In order to avoid the first-order and second-order descent assumptions of [24], we show that the search direction generated by Algorithm 1 itself satisfies first-order and second-order descent conditions.

Lemma 6. Under Assumption 2, there exists constant such thatwhere is the minimum eigenvalue of matrix .

Proof. If is obtained by (4), then

The proof of (15) can be found in [25]. Because , by (16), we have that

If is obtained by (5), then there exists such that

hence, we can obtain that

Combining (16) and (18), we have that

In order to establish the global convergence of Algorithm 1, we first introduce the following notation.

The following lemma shows that the error between the smallest eigenvalue of the Hessian matrix of the objective function and that of the corresponding finite-difference model is bounded.

Lemma 7. Under Assumption 2, if is generated by (4) or (5), then

Proof. Let be a normalized eigenvector corresponding to the smallest eigenvalue of the matrix ; then

and

then the result follows.
According to step 1 of Algorithm 1, we have that

After the error between and is bounded, the error bound between and is given in the following lemma.

Lemma 8. Under Assumption 2, there exists such that

Proof. A similar proof can be found in Lemma 7.2 of [24].
We now show that an iteration must be successful when the trust-region radius is sufficiently small with respect to .

Lemma 9. If is generated by (2) or (3), is generated by (4) or (5), and

where , then iteration is successful.

Proof. First, according to the fractions of Cauchy and eigenstep decrease,

There are two cases for : either or .
If , then

Hence, from (11), (12), and (28), we have that

If , then

Similar to the proof of (29), we deduce from (11), (12), and (30) that

Combining (29) and (31), we conclude that iteration is successful.

Lemma 10. Under Assumption 2, if for constants we have for all , then

for all , where is a constant.

Proof. By Lemma 9 and for all , if satisfies the condition

then the th iteration must be successful, and from step 6 of Algorithm 1 we have ; the conclusion holds.
The following lemma shows that Algorithm 1 reaches a second-order stationary point if there are only finitely many successful iterations.

Lemma 11. If the number of successful iterations is finite, then

Proof. Because the number of successful iterations is finite, there are infinitely many unsuccessful iterations, on each of which the trust-region radius is reduced. Hence, decreases and converges to zero.
Let us consider that

If for a subsequence, then for sufficiently small the iteration is successful, which yields a contradiction. Hence, we have

Lemma 12. Under Assumption 2, if the sequence is generated by Algorithm 1, then

Proof. A similar proof can be found in Lemma 7.7 of [24].
Next, we give an important conclusion.

Lemma 13. Under Assumption 2, we have that

Proof. Assume, to the contrary, that there exists such that

for all . By Lemma 10, we have that for all . This contradicts Lemma 11.
We now prove that Algorithm 1 generates a subsequence convergent to a second-order stationary point of the problem.

Lemma 14. Under Assumption 2, if

holds for some subsequence, then

Proof. By (40), we have that for large enough ; then, by Lemma 12, we have as . By (19), we have that

Combining (40) and (42), we conclude that (41) holds.
According to Lemmas 13 and 14, we can obtain the following global convergence of Algorithm 1.

Theorem 15. Under Assumption 2, we obtain that

Theorem 15 shows that Algorithm 1 generates a subsequence converging to a second-order stationary point of the problem. Next, we prove that the whole sequence generated by Algorithm 1 converges to a second-order stationary point of the problem.

Theorem 16. Under Assumption 2, we can obtain that

Proof. A similar proof can be found in Theorem 7.11 of [24].

5. Numerical Results

In this section, in order to test the efficiency of our algorithm, we choose 49 unconstrained optimization test problems from [22]. The dimensions and names of the test problems are reported in Table 1. Before solving these problems, we introduce the parameter values used in the actual computations:

To solve the test problems in Table 1, we implemented the algorithm in MATLAB (2014a) on an HP computer (CPU i7-8700, 3.2 GHz, 16 GB of memory). The termination accuracy of the algorithm is .

In order to draw a comparison diagram of the results, we use the performance profiles of Dolan and Moré [27] to compare the computational efficiency of the different algorithms. The specific formulas are as follows: for each problem p and solver s, the performance ratio is

r_{p,s} = t_{p,s} / min{t_{p,s'} : s' in S},   (46)

where S denotes the set of algorithms, P denotes the set of problems, and t_{p,s} is the computing cost for solver s on problem p. For τ ≥ 1, let

ρ_s(τ) = |{p in P : r_{p,s} ≤ τ}| / |P|,   (47)

where ρ_s(τ) represents the efficiency of each solver.
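Formulas (46) and (47) translate directly into code; a small sketch follows (the array layout and the name performance_profile are ours):

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-More performance profiles from a cost table.

    T[p][s] is the cost (iterations or CPU time) of solver s on problem p;
    use np.inf to mark a failure. Returns rho with rho[t][s] = fraction of
    problems solver s solves within a factor taus[t] of the best solver.
    """
    T = np.asarray(T, dtype=float)
    best = T.min(axis=1, keepdims=True)   # best cost per problem
    R = T / best                          # performance ratios r_{p,s}, eq. (46)
    return np.array([[np.mean(R[:, s] <= tau) for s in range(T.shape[1])]
                     for tau in taus])    # rho_s(tau), eq. (47)
```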

To further test the effectiveness of our algorithm, we select some derivative-free optimization problems and solve them with existing derivative-free algorithms; the results are recorded in Figure 1. We first introduce the test problems. Table 1 lists all the test problems, giving the name, the dimension, and the source of each problem. As Table 1 shows, all test problems are nonsmooth, and their derivatives cannot be obtained directly. Hence, it is appropriate to use our algorithm to solve the problems in Table 1.

In Table 1, many objective functions are given in max form; in the actual computation, we equivalently rewrite the objective function as a single function. If the objective function is f(x) = max{f_1(x), f_2(x)}, then we obtain the equivalent single objective function

f(x) = (f_1(x) + f_2(x) + |f_1(x) - f_2(x)|) / 2.   (48)

If more than two functions are compared in the objective of the original problem, we apply (48) repeatedly to obtain a single objective function. For example, if the objective function is f(x) = max{f_1(x), f_2(x), f_3(x)} = max{max{f_1(x), f_2(x)}, f_3(x)}, we obtain the single objective function

f(x) = (g(x) + f_3(x) + |g(x) - f_3(x)|) / 2, where g(x) = (f_1(x) + f_2(x) + |f_1(x) - f_2(x)|) / 2.   (49)

Cases with more functions are handled similarly to (49), yielding a single objective function. It is obvious that (48) and (49) are still nonsmooth, and their derivatives cannot be obtained directly.
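The identities behind (48) and (49) are easy to check numerically; a quick sketch (function names are ours):

```python
def max2(a, b):
    """Eq. (48): max{a, b} rewritten as a single expression."""
    return 0.5 * (a + b + abs(a - b))

def max3(a, b, c):
    """Eq. (49): apply (48) twice, since max{a, b, c} = max{max{a, b}, c}."""
    return max2(max2(a, b), c)
```

The absolute value keeps the rewritten objective nonsmooth, so the resulting single-objective problems remain derivative-free test cases.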

Our algorithm mainly provides a dynamic method for adjusting the difference parameter: in the actual computation, instead of the traditional fixed difference parameter, our algorithm adjusts the parameter dynamically based on the actual descent of the objective function. This avoids the slow descent caused by a fixed parameter and, in particular, overcomes the poor robustness that fixed difference parameters cause in the algorithm.

DEO-TRNS and NOMAD are both well-known derivative-free algorithms for solving nonsmooth black-box optimization problems. They obtain the search direction by solving an approximate local subproblem and update the trust-region radius (or mesh size) according to the actual descent of the objective function to obtain the optimal solution of the problem. Using our algorithm, DEO-TRNS, and NOMAD to solve the problems in Table 1, we obtained the corresponding results, recorded in Figure 1.

According to (46) and (47), the subgraphs in the upper left and upper right corners of Figure 1 show that our algorithm can effectively solve the problems in Table 1 using either the central-difference or the forward-difference approximation. In particular, at the leftmost point of the profiles, the red curve representing our algorithm with forward differences reaches 0.9, while the blue curve representing DEO-TRNS and the green curve representing NOMAD reach only 0.18 and 0.17, respectively. With the central-difference technique, the red curve reaches 0.95, while the blue and green curves reach only 0.22 and 0.19. The red curve lies well above the blue and green curves, and it also approaches 1 faster. This means that our algorithm solves most problems with minimal iteration cost. The subgraphs in the lower left and lower right corners of Figure 1 show that both red curves reach 0.95 at the leftmost point, while the corresponding values of the blue and green curves are much lower; again, the red curve approaches 1 faster. The abscissa values show that our algorithm spends far less CPU time on the problems in Table 1 than DEO-TRNS and NOMAD. This indicates that our algorithm is very effective in solving the problems in Table 1.

6. Concluding Remarks

This paper proposes a trust-region method with forward-difference or central-difference approximation techniques for finding second-order stationary points of derivative-free nonconvex optimization problems. The finite-difference technique is used to approximate the gradient and the Hessian matrix of the objective function, and the search direction is obtained by solving the trust-region subproblem. We prove global convergence of the proposed algorithm without the fully quadratic approximation; i.e., the algorithm generates a sequence converging to a second-order stationary point of the problem. In the numerical section, we first solve 49 test problems with our algorithm, NOMAD, and DEO-TRNS; the results show that our algorithm requires fewer iterations and less CPU time than the other algorithms. We also solve 46 test problems with the forward-difference and central-difference techniques; comparing iterations and CPU time shows that the central-difference approximation requires fewer iterations than the forward-difference approximation on most problems.

Data Availability

The data supporting this study are from previously reported studies and datasets, which have been cited.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The author thanks the support of the National Natural Science Foundation (11371253), the Hainan Natural Science Foundation (120MS029), and the Hainan Provincial Natural Science Foundation (120MS028).