Abstract

In this paper, a derivative-free affine scaling linear programming algorithm based on probabilistic models is considered for solving optimization problems with linear inequality constraints. The proposed algorithm builds probabilistic linear polynomial interpolation models of the objective function using only n + 1 interpolation points, which reduces the cost of building the interpolation function. We combine affine scaling linear programming methods that use probabilistic or random models with an affine scaling matrix within a classical linear programming framework. A backtracking line search technique guarantees monotone descent of the objective function and ensures that the new iterates remain within the feasible region. Under reasonable conditions, the global and fast local convergence of the algorithm is established, and the results of numerical experiments are reported to show the effectiveness of the proposed algorithm.

1. Introduction

1.1. Problem Description and Motivation

In this paper, we consider the following nonlinear optimization problem with linear inequality constraints:

(P)  min f(x)  subject to  Ax ≤ b,

where f: R^n → R is a general nonlinear twice continuously differentiable function, but none of its first-order or second-order derivatives is explicitly available, A ∈ R^{m×n}, and b ∈ R^m. The feasible region is Ω = {x ∈ R^n : Ax ≤ b}. Moreover, it is assumed that the interior of the feasible region, int(Ω) = {x ∈ R^n : Ax < b}, is not empty.
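For concreteness in the illustrative sketches that follow, a minimal Python instance of problem (P) can be set up as below (the authors' implementation is in MATLAB; the objective f, the matrix A, and the vector b here are hypothetical toy data):

```python
import numpy as np

# Hypothetical instance of problem (P): minimize a black-box f subject to Ax <= b.
def f(x):
    # derivatives of f are assumed unavailable to the algorithm
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

A = np.array([[1.0, 1.0],
              [-1.0, 0.0],
              [0.0, -1.0]])
b = np.array([4.0, 0.0, 0.0])

def is_strictly_feasible(A, b, x):
    """Check x in int(Omega) = {x : Ax < b}."""
    return bool(np.all(A @ x < b))

print(is_strictly_feasible(A, b, np.array([1.0, 1.0])))  # True: the interior is nonempty
```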

The focus of this paper is the analysis of a numerical scheme that utilizes affine scaling randomized models to minimize nonlinear functions with linear inequality constraints. The motivation comes from algorithms for the minimization of so-called black-box functions, for which only function values can be computed. For such problems, function evaluations are costly, and derivatives are typically unavailable and cannot be approximated. Hence, derivative-free optimization has been developed. Moreover, the list of applications, including molecular geometry optimization, circuit design, groundwater community problems, medical image registration, dynamic pricing, and aircraft design [1], is diverse and growing. In recent years, many researchers have developed derivative-free algorithms for solving unconstrained and constrained minimization problems. Kolda et al. [2] presented a new generating set search approach for minimizing functions subject to linear constraints; one of their main contributions is a new condition defining the set of conforming search directions that admits several computational advantages. Zhang and Conn [3] developed a framework for a class of derivative-free trust region algorithms for least-squares minimization; these algorithms are designed to take advantage of the problem structure by building polynomial interpolation models for each function in the least-squares problem. Gratton et al. [4] considered an implementation of a recursive model-based active-set trust-region method for solving bound-constrained nonlinear nonconvex optimization problems without derivatives, using the technique of self-correcting geometry proposed in [5]. Kolda et al. [2] also analyzed the oracle complexity of first-order and derivative-free algorithms for smooth nonconvex minimization. Billups et al. [6] proposed a derivative-free algorithm for optimizing computationally expensive functions with computational error; the algorithm uses a trust region method with weighted regression to obtain more accurate model functions at each trust region iteration. Bueno et al. [7] introduced a new method for solving constrained optimization problems in which the derivatives of the constraints are available but the derivatives of the objective function are not. The method is based on the inexact restoration framework, and each iteration is divided into two phases: the first phase improves the feasibility of the constraints, and the second phase minimizes a suitable objective function subject to a linear approximation of the constraints, which must be solved using derivative-free methods. Wild and Shoemaker [8] established the global convergence of radial basis function trust region algorithms for derivative-free optimization; their results extend the work of Conn et al. [9] to fully linear models that have a nonlinear term. Xue and Sun [10] proposed a derivative-free trust region algorithm for constrained minimization problems with separable structure, where derivatives of the objective function are not available and cannot be directly approximated. They constructed a quadratic interpolation model of the objective function around the current iterate and used the filter technique to ensure the feasibility and optimality of the iterative sequence.

There is a variety of evidence supporting the claim that randomized models can yield both practical and theoretical benefits for deterministic optimization. As an example, the randomized coordinate descent method for large-scale convex deterministic optimization proposed in [11] yields better complexity results than cyclic coordinate descent. Most contemporary randomized methods generate random directions along which all that may be required is some minor level of descent in the objective f. Within direct search, the use of random positive spanning sets has also been investigated recently [12, 13], with gains in performance and in convergence theory for nonsmooth problems. Bandeira et al. [14] considered the use of probabilistic or random models within a classical trust region framework for the optimization of deterministic smooth general nonlinear functions.

1.2. Contribution and Structure of the Paper

In order to solve the linear inequality constrained optimization problem, a derivative-free affine scaling LP algorithm based on probabilistic models is introduced in this paper. Compared with traditional derivative-free algorithms, the new algorithm proposed in this paper makes the following contributions:
(1) The randomized numerical algorithm designed in this paper provides sufficiently accurate approximations such that, in each iteration, a sufficiently improving step is produced with higher probability than in traditional random schemes.
(2) The polynomial interpolation models are built using only n + 1 interpolation points, far fewer than the (n + 1) (n + 2)/2 interpolation points needed to build a quadratic interpolation function; the computational cost of building the interpolation function from n + 1 points is correspondingly lower.
(3) The backtracking line search technique is used so that the linear programming subproblem is solved only once to obtain the search direction, and the total computational effort of one iteration is less than that of traditional derivative-free algorithms. Meanwhile, this technique also ensures that each iterate is a feasible interior point.

This paper is organized as follows. In Section 2, we introduce the first-order affine scaling linear programming method based on probabilistic models. The algorithm for solving linear inequality constrained optimization problems is given in Section 3. In Section 4, we give the global convergence of the algorithm. The fast local convergence of the algorithm is given in Section 5. Some numerical results are given in Section 6. Finally, we give some conclusions in Section 7.

2. First-Order Affine Scaling LP Method Based on Probabilistic Models

In this section, we first introduce the construction of a stochastic fully linear model and then give the affine scaling linear programming method based on probabilistic models. The classical trust-region method needs (n + 1) (n + 2)/2 interpolation points to build a quadratic interpolation model of the function f in the ball B(x_k, δ_k), where x_k is the center point and δ_k is the radius of the ball. Evidently, the number of interpolation points grows much faster than the dimension of problem (P) as the dimension increases, which causes great difficulty in storage and computation. In this section, we introduce an affine scaling linear programming model based on probabilistic models which needs only n + 1 interpolation points. This means that we build polynomial interpolation models in a random fashion. First, we give the models and some assumptions on them.

2.1. Probabilistically Fully Linear Models

We adopt the polynomial model proposed in [15], which is fully linear, and we consider linear models written in the form
m_k(x) = f(x_k) + g_k^T (x − x_k),
where g_k ∈ R^n is determined by the interpolation conditions. The following definition characterizes the accuracy of the model m_k.

Definition 1 (see [14]). We say that a function m_k is a (κ_ef, κ_eg)-fully linear model of f on B(x_k, δ_k) if, for every y ∈ B(x_k, δ_k),
‖∇f(y) − ∇m_k(y)‖ ≤ κ_eg δ_k  and  |f(y) − m_k(y)| ≤ κ_ef δ_k^2.

Definition 2 (see [14]). We say that a sequence of random models {M_k} is probabilistically (κ_ef, κ_eg)-fully linear for a corresponding sequence {B(X_k, Δ_k)} if the events
S_k = {M_k is a (κ_ef, κ_eg)-fully linear model of f on B(X_k, Δ_k)}
satisfy the following submartingale-like condition:
P(S_k | F^M_{k−1}) ≥ α,
where F^M_{k−1} is the σ-algebra generated by M_0, M_1, …, M_{k−1}. Furthermore, if α > 1/2, then we say that the random models are probabilistically fully linear.
In Definition 2, the iterates X_k and the interpolation radii Δ_k are random variables defined over the σ-algebra generated by M_0, M_1, …, M_{k−1}. M_k depends on X_k and Δ_k, and hence M_k depends on M_0, M_1, …, M_{k−1}.
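To make the construction of Section 2.1 concrete, the following sketch (illustrative Python, not the authors' MATLAB code; the random sampling mirrors the scheme described in Section 6, and the function name is hypothetical) builds a linear model from n + 1 interpolation points by solving an n × n system for the model gradient g_k:

```python
import numpy as np

def build_linear_model(f, x_k, delta_k):
    """Sketch: build m_k(x) = f(x_k) + g_k^T (x - x_k) from n + 1 points.

    Besides x_k itself, n random points are drawn in the ball B(x_k, delta_k);
    g_k solves the interpolation system (y_i - x_k)^T g = f(y_i) - f(x_k).
    """
    n = x_k.size
    Y = np.random.randn(n, n)                      # n random directions (rows)
    Y *= delta_k / np.linalg.norm(Y, axis=1, keepdims=True)
    rhs = np.array([f(x_k + y) for y in Y]) - f(x_k)
    g_k = np.linalg.solve(Y, rhs)                  # almost surely nonsingular
    m_k = lambda x: f(x_k) + g_k @ (x - x_k)
    return m_k, g_k
```

Each model therefore costs n + 1 black-box evaluations of f plus a single n × n linear solve, instead of the (n + 1) (n + 2)/2 evaluations and much larger solve needed for a quadratic model.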

2.2. The Affine Scaling Linear Programming Method Based on Probabilistic Models

Using the scaling matrix in [15], the form is as follows:

According to the same convergence theory as in [15], the augmented affine scaling linear programming subproblem (L_k) at the k-th iteration is defined as follows:

As in reference [15], we have that a “good” decrease of the quadratic objective function along the gradient can lead to dual feasibility:
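The exact form of (L_k) depends on the scaling matrix of [15] and is not reproduced above. Purely as an illustration of how such a subproblem can be solved once per iteration, the sketch below assumes a hypothetical scaled linear objective over a box of radius δ_k and uses scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def solve_lp_subproblem(g_k, D_k, delta_k):
    """Illustrative LP subproblem solve (assumed form, not the paper's (L_k)):

        min_d (D_k g_k)^T d   subject to   ||d||_inf <= delta_k.
    """
    c = D_k @ g_k                                   # scaled model gradient
    n = c.size
    res = linprog(c, bounds=[(-delta_k, delta_k)] * n, method="highs")
    return res.x                                    # search direction d
```

Since the subproblem is a linear program, computing the search direction is cheap relative to the black-box evaluations of f.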

3. The Algorithm

In Section 2, we introduced the affine scaling LP method based on probabilistic models. In this section, we define the specific steps of the derivative-free affine scaling linear programming algorithm based on probabilistic models for solving problem (P).

3.1. Algorithm
3.1.1. Initialization Step

Fix the positive parameters with and . Select the initial point , and .

3.1.2. Main Step
(1) Approximate the function f in B(x_k, δ_k) by the linear function m_k.
(2) Solve the affine scaling linear programming subproblem (L_k) and obtain the step d.
(3) Choose the step size by backtracking until the following inequality is satisfied:
(4) Let
where , for some  and .
(5) If , then set  and
(6) If , then set  and . Increase k by one and go to step (1).
A minimal sketch of the backtracking step (3) is given after this list.
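The acceptance inequality in step (3) is not reproduced above; as a stand-in, the sketch below uses a standard Armijo-type sufficient decrease test against the reduction predicted by the linear model (illustrative Python; the parameter names beta and sigma are assumptions):

```python
def backtracking_step(f, m_k, x_k, d, beta=0.5, sigma=1e-4, max_iter=50):
    """Sketch of step (3): backtrack until sufficient decrease holds.

    Stand-in test: f(x_k) - f(x_k + a*d) >= sigma * a * pred, where
    pred = m_k(x_k) - m_k(x_k + d) is the model's predicted decrease.
    """
    pred = m_k(x_k) - m_k(x_k + d)
    alpha = 1.0
    for _ in range(max_iter):
        if f(x_k) - f(x_k + alpha * d) >= sigma * alpha * pred:
            return alpha
        alpha *= beta                               # shrink the step
    return alpha
```

Note that the test uses only function values of f, consistent with the derivative-free setting.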

Remark 1. Each realization of the algorithm defines a sequence of realizations for the corresponding random variables, in particular, m_k, x_k, and δ_k.

Remark 2. Similarly to the definition of the step scalar in [15], for the search direction d, we have
with the scalar equal to 1 if  for all i. A key property of this scalar is that an arbitrary step to the resulting point does not violate any linear inequality constraint. To see this, we first observe that if  for some i, then  implies . Therefore, for all i,
which means that the i-th linear strict inequality constraint holds.
If  for some i, then  implies , and from  we have that
Hence, (12) and (13) mean that, in all cases, the inequality holds for all i.
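The step scalar of Remark 2 acts as a fraction-to-the-boundary bound. Under the assumption that it is computed by the standard ratio test for the constraints Ax ≤ b (the function and parameter names below are hypothetical), a sketch is:

```python
import numpy as np

def max_feasible_step(A, b, x, d, theta=0.995):
    """Largest step along d that keeps Ax < b strictly satisfied.

    Assumes x is strictly feasible; theta < 1 keeps the new point interior.
    """
    Ad = A @ d
    slack = b - A @ x                 # positive, since x is strictly interior
    mask = Ad > 0                     # only these rows can be violated
    if not np.any(mask):
        return np.inf                 # d never reaches the boundary
    return theta * np.min(slack[mask] / Ad[mask])
```

Any step of length at most this value keeps the new iterate strictly inside the feasible region, which is exactly the property the remark establishes.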

4. Global Convergence

In this section, we will establish the global convergence of our algorithm, i.e., that the algorithm proposed in this paper converges to first-order critical points. For the purpose of proving global convergence, we assume that the initial iterate is given. Then, all the subsequent iterates belong to the level set

However, the failed iterates may lie outside this set. In the setting considered in this paper, all potential iterates are restricted to the region
where  is the upper bound on the size of the interpolation radius imposed by the algorithm, and conv(·) denotes the convex hull.

Next, we also give the following assumptions for the global convergence of the algorithm.

Assumption 1. Suppose  and  are given. We assume that f is continuously differentiable in an open set containing the region defined above and that ∇f is Lipschitz continuous on its convex hull with constant , i.e., for , we have that
We also assume that f is bounded from below on this region.

Assumption 2. Suppose the level set is bounded.
Based on solving the augmented linear programming subproblem (L_k), the following lemma shows that  holds.

Lemma 1. For every realization of the algorithm, let d be the solution of subproblem (L_k); then,

Proof. Let
Since
Because , we have that
Hence, we have that . Because , we have that  is a feasible solution of subproblem (L_k).
Because  is the solution of subproblem (L_k), we have that
The value of  is
According to the definition of , we have that
where .
By (21)–(23), we have that
The following lemma shows that every iterate of the algorithm is a feasible solution of problem (P), i.e., .

Lemma 2. Let {x_k} be the sequence generated by the algorithm; then, there exists  such that .

Proof. According to the definition of  and , we have that
Because  and , we have that
If , then  holds, i.e., . Otherwise, if , then from (26), for , we have that
By (11) and (27), there exists  such that  holds at the k-th iteration, i.e., there exists  such that .
The following lemma states that the interpolation radius converges to zero regardless of the realization of the model sequence made by the algorithm.

Lemma 3. Suppose Assumptions 1 and 2 hold. Then, for every realization of the algorithm, we have that

Proof. First, if  holds as , then (28) holds according to the definition of  and Step 6 of the algorithm. If  and  hold as , then (28) holds according to the definition of  and Step 5 of the algorithm. Otherwise, according to Step 3 of the algorithm, we have that
By (17), we have that
Assumption 2 shows that f is bounded on the level set; hence,  as . By (30), we know that  as . If , then there exists a positive index  such that (28) holds for all . Otherwise, if  for some , the conclusion holds. If instead  for all , then (30) means that  holds; according to the update of  in Step 5,  decreases and tends to zero, which is a contradiction. Hence, (28) holds.
The following lemma shows that in the presence of sufficient model accuracy, a successful step will be achieved, provided that the interpolation radius is sufficiently small relative to the size of the model gradient.

Lemma 4. Suppose Assumptions 1 and 2 hold. If m_k is fully linear on B(x_k, δ_k) and
where  is a positive constant such that , then, at the k-th iteration,  holds.

Proof. Since  has full row rank and the compact set  is bounded, there exists a constant  such that . By the definition of  and Lemma 2, we have that
Let ; then .
By the acceptance rule for the step , we have that
Using the mean value theorem, we have the following equality:
with . Since ∇f is Lipschitz continuous with constant , we have that
Because  is a fully linear model of f on  and ,  is a fully linear model of  on . According to Definition 1, we have that
By (33), (34), (35), and (36), we have that
Dividing both sides of (37) by  and noting that , we have that
By (33), we have that .
Next, we assume that the model used in the algorithm is probabilistically fully linear, and we will state an auxiliary result from the martingale literature that will be useful in our analysis.

Theorem 1 (see [14]). Let {G_k} be a submartingale, i.e., a sequence of random variables which, for every k, are integrable and satisfy E[G_k | F_{k−1}] ≥ G_{k−1}, where F_{k−1} is the σ-algebra generated by G_0, …, G_{k−1} and E[G_k | F_{k−1}] denotes the conditional expectation of G_k given the past history of events F_{k−1}. We assume further that |G_k − G_{k−1}| ≤ M < ∞ for every k. We consider the random events C = {lim_{k→∞} G_k exists and is finite} and D = {lim sup_{k→∞} G_k = ∞}. Then, P(C ∪ D) = 1.

Proof. The theorem is a simple extension of Theorem 5.3.1 in [16].
Now, we will give the global convergence of the algorithm.

Theorem 2. Suppose that the model sequence {M_k} is probabilistically fully linear for some positive constants κ_ef and κ_eg. Let {X_k} be the sequence of random iterates generated by the algorithm. Then, almost surely,

Proof. According to the definition of  in Definition 2, we use the following random walk from [14]:
where  if  occurs and  otherwise. It is easily seen that this defines a submartingale; the proof can be found in Theorem 2 of [14]. By Theorem 1, we can also see that the event  holds almost surely. Suppose there exist a positive constant ε and an index  such that
for all .
Let  and  be any realizations of  and , respectively. By Lemma 3, there exists  such that, for all ,
From the definition of fully linear models, we have that
According to Assumption 2, we have that  is an invertible matrix, and by  and , we have that
where  is an upper bound of .
From (43) and (44), we know that . Hence, we have that
Since  has full row rank and the compact set  is bounded, there exists a constant  > 0 such that . Similarly to the proof of (32), there exists a positive constant  such that . Combining (28), (43), (44), and (45), we obtain that
as . By (41) and (46), we obtain that
Hence, we have that
Using Lemma 4 and (28), we obtain that  and
By the construction of the algorithm and the fact that , we have that . Similar to the proof of Theorem 2 in [14], we have that
almost surely.
Next, we show that
almost surely. Before stating and proving the main theorem, we give the following lemmas.

Lemma 5 (see [14]). Let be a sequence of nonnegative uniformly bounded random variables and be a sequence of Bernoulli random variables such that

Let P be the set of natural numbers k for which  and let N = ℕ∖P, where P and N are random sequences. Then,

Proof. The proof is similar to Lemma 1 in [14].

Lemma 6 (see [14]). Let {X_k} and {Δ_k} be the sequences of random iterates and random interpolation radii generated by the algorithm. We fix  and define the sequence  consisting of the natural numbers k for which , where  is a sequence of random variables. Then,
almost surely.

Proof. The proof is similar to Lemma 2 in [14].
Next, we will prove that holds.

Theorem 3. Suppose Assumptions 1 and 2 hold, and suppose that the model sequence {M_k} is probabilistically fully linear for some positive constants κ_ef and κ_eg. Let {X_k} be the sequence of random iterates generated by the algorithm; then,

Proof. Suppose that the conclusion does not hold almost surely. Then, there exists  such that  holds for infinitely many . Let  be a subsequence of the iterations for which . According to Theorem 2, there exists a pair of integers  such that  and , and ; hence, for any , we have . Now, consider the sequence of these intervals , for which  for , where  and  are the realizations of  and , respectively. Then, for any , we have that
Since ∇f is Lipschitz continuous, we have that
According to the definition of , we have that
Combining (56), (57), and (58), we have that
where .
From (28), for any  large enough, . Similarly to the proof of (38), we have that  for all indices ; hence, , which gives , contradicting Lemma 6. Hence, the conclusion is true.

5. Local Convergence

In this section, we show the local convergence of the algorithm, i.e., we prove that our algorithm converges superlinearly to a first-order stationary point of problem (P). Let {X_k} be the sequence of random iterates generated by the algorithm and let {x_k} be its realizations. For the local convergence of the algorithm, we will give the following assumption.

Next, we discuss the convergence rate of the proposed algorithm. For this purpose, we first show that, for large enough k, the full step is accepted, i.e., the step size equals 1 for large enough k.

Lemma 7. Suppose Assumptions 1 and 2 hold and that the nondegeneracy property of problem (P) holds at every limit point of {x_k}. Moreover, if  is a fully linear model of f, then, for sufficiently large k, the step  is accepted.

Proof. Suppose the conclusion is not true, i.e., for every k, the following inequality holds:
Because  and , we have that  as . Hence, dividing both sides of the inequality in (60) by , we have that
Taking limits in (61), we have that
which contradicts  in Lemma 1. Hence, the assumption is false, and for sufficiently large k, the step  is accepted.
The following theorem shows that the algorithm converges locally superlinearly to the optimal solution .

Theorem 4. Suppose Assumptions 1 and 2 hold, and let  be the limit point of the sequence  generated by the algorithm. If  and  with , then
i.e., the algorithm converges superlinearly.

Proof. According to the definition of  and Step 3 of the algorithm, we have that  is a strict interior point of problem (P). Hence, there exists a positive constant  such that
By (64), we have that
By (65), we obtain that
By (66), and since  as , we have that
i.e., the algorithm converges superlinearly.

6. Numerical Examples

In this section, we present numerical experiments with our algorithm. In order to show its effectiveness, we choose the same test problems and compare the results with the algorithms COBYLA [17] and SDPEN [18]. First, the parameters used in the algorithm are as follows:

We test the algorithm on the “HS” problems from [19]; the test problems are reported in Table 1. The program code is written in MATLAB and run in the MATLAB 7.0 environment. In Table 1, n denotes the dimension of the problem, and Ineq. denotes the number of inequality constraints. We choose 46 test problems and compare the number of function evaluations used by our method with COBYLA [17] and SDPEN [18]. We report our results using performance profiles. Such profiles compare the number of function evaluations needed by each solver to achieve the desired accuracy in the objective function value. We use three different levels of accuracy: 2, 3, and 5 significant figures in the objective function value.

The interpolation sample set is important for the algorithm. In this paper, we generate the sample set by randomly selecting the sample points, i.e., given the current iterate x_k, we choose n random points in a ball of radius δ_k around x_k. Traditional derivative-free trust region methods construct a quadratic interpolation function using (n + 1) (n + 2)/2 sample points, and, according to Section 2.1, the dimension of the corresponding interpolation basis matrix is (n + 1) (n + 2)/2. To obtain the quadratic interpolation function, we need to compute the inverse of this matrix; hence, the computational cost is very high, especially when the dimension of the problem is large. With a linear interpolation function, the dimension of the interpolation basis matrix is only (n + 1), and the computational cost is far less than that of constructing the quadratic interpolation function. The comparison of CPU times is shown in Figure 1; from Figure 1, we can see that the CPU time for building the quadratic interpolation function is far greater than that for the linear interpolation function.
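The size gap between the two interpolation systems can be reproduced with a short, purely illustrative Python timing on random data (a toy experiment, not the authors' MATLAB test):

```python
import time
import numpy as np

def solve_time(M, F):
    t0 = time.perf_counter()
    np.linalg.solve(M, F)
    return time.perf_counter() - t0

def linear_basis(n):
    """Linear model: basis matrix of dimension n + 1."""
    Y = np.random.randn(n + 1, n)
    return np.hstack([np.ones((n + 1, 1)), Y])

def quadratic_basis(n):
    """Full quadratic model: basis matrix of dimension q = (n+1)(n+2)/2."""
    q = (n + 1) * (n + 2) // 2
    Y = np.random.randn(q, n)
    cols = [np.ones(q)] + [Y[:, i] for i in range(n)]
    cols += [Y[:, i] * Y[:, j] for i in range(n) for j in range(i, n)]
    return np.column_stack(cols)

for n in (10, 50, 100):
    M1 = linear_basis(n)
    M2 = quadratic_basis(n)
    print(n, solve_time(M1, np.random.randn(M1.shape[0])),
             solve_time(M2, np.random.randn(M2.shape[0])))
```

Since a dense solve of a q × q system costs O(q^3), moving from q = n + 1 to q = (n + 1) (n + 2)/2 increases the linear algebra cost dramatically as n grows.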

Using the performance profile of Dolan and Moré [20], we can demonstrate the overall behavior of the present algorithms and get more insight into their performance. Let S be the set of all solvers and P be the set of test problems, with n_s = |S| and n_p = |P|. For each problem p and solver s, let t_{p,s} denote the quantity we want to compare for problem p and solver s. The performance ratio is defined as
r_{p,s} = t_{p,s} / min{ t_{p,s'} : s' ∈ S },

which compares the performance of solver s on problem p with the best performance by any solver on that problem. For any factor τ ≥ 1, the overall performance of solver s is given by
ρ_s(τ) = (1 / n_p) |{ p ∈ P : r_{p,s} ≤ τ }|.
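These two formulas translate directly into a short routine; the sketch below (with made-up data) computes ρ_s(τ) for each solver:

```python
import numpy as np

def performance_profile(T, taus):
    """Dolan-Moré performance profile.

    T is an (n_p, n_s) array: T[p, s] is, e.g., the number of function
    evaluations solver s needs on problem p (np.inf on failure). Returns
    rho[i, s] = fraction of problems with ratio r_{p,s} <= taus[i].
    """
    best = T.min(axis=1, keepdims=True)       # best result per problem
    R = T / best                              # performance ratios r_{p,s}
    n_p = T.shape[0]
    return np.array([(R <= tau).sum(axis=0) / n_p for tau in taus])

# hypothetical data: 3 solvers on 5 problems
T = np.array([[10., 12., 30.],
              [20., 18., 25.],
              [15., 40., 22.],
              [8.,  9.,  10.],
              [50., 45., np.inf]])
print(performance_profile(T, taus=[1, 2, 4]))
```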

The values on the left side of the figures (small τ) represent the efficiency of each solver, and the values on the right side (large τ) give information about the robustness of the solvers. This means that the best solver has the highest curve in the figures.

Figure 2 shows the comparison results of using our algorithm, COBYLA, and SDPEN to solve the test problems in Table 1 under termination accuracy level 2 (). It is obvious from the figure that the red curve (representing our algorithm) tends to 1 faster than the other two curves, and when , its value is clearly higher than that of the other two curves. This shows that, at termination level 2, the number of function evaluations spent by our algorithm on the test problems in Table 1 is less than that of the other two algorithms, and our algorithm has high computational efficiency at the low termination level.

Figure 3 records the number of function evaluations spent by the three algorithms in solving the test problems in Table 1 when the termination level is 3 (). It can be clearly seen from Figure 3 that when the termination level is increased to 3, the performance ratios of the three algorithms are lower than at accuracy level 2. However, when , the value of the red curve is still significantly higher than that of the other two curves, and the red curve tends to 1 significantly faster than the other two curves. This shows that, after the accuracy level is increased, the number of function evaluations spent by our algorithm in solving the test problems is still less than that of the other two algorithms, and its computational efficiency remains high.

Figure 4 records the results of the three algorithms when the termination level is 5 (). It can be seen from Figure 4 that, although the computational efficiency of all three algorithms decreases significantly at this high accuracy level, the red curve still remains at a high level. Whether measured by its value at  or by how quickly it tends to 1, the red curve behaves well; that is, even at the high accuracy level, the number of function evaluations spent by our algorithm on the test problems in Table 1 is still less than that of the other two algorithms.

Figure 1 reports the CPU time of constructing the interpolation functions over 100 iterations using (n + 1) and (n + 1) (n + 2)/2 interpolation points, respectively. From the results, we can see that the interpolation function is established faster by our algorithm. Lemma 1 shows that the linear interpolation model achieves the same decrease as the quadratic interpolation trust region model. Hence, our algorithm is more efficient than algorithms using quadratic interpolation functions.

In order to test the effectiveness of our algorithm on problems with larger dimensions, we selected the 6 problems in Table 2, solved them with our algorithm, COBYLA, and SDPEN for different dimensions, and recorded the results at accuracy level 5.

When the accuracy level is 5 (the termination accuracy is ), we solve the test problems in Table 2 with our algorithm as well as with COBYLA and SDPEN, compare the number of function evaluations spent by the three algorithms, and record the comparison results in Figure 5.

From Figure 5, we can see that the red curve representing our algorithm tends to 1 the fastest among the three curves. Moreover, when , the efficiency of our algorithm reaches 0.38, while COBYLA reaches only 0.25 and SDPEN only 0.11. This shows that our algorithm not only spends fewer function evaluations than COBYLA and SDPEN on low-dimensional problems but also remains more efficient, with fewer function evaluations, on larger problems. In short, our algorithm has high computational efficiency on the practical test problems.

7. Conclusion

This paper has described a derivative-free linear programming method based on probabilistic models for linear inequality constrained optimization. The algorithm is built on linear polynomial interpolation models, which use only n + 1 interpolation points, far fewer than the (n + 1) (n + 2)/2 points of quadratic polynomial interpolation, and thus reduce the cost of building the interpolation function. Affine scaling linear programming methods which use probabilistic or random models are employed to obtain the descent direction. We use the backtracking line search technique to ensure that the step along the descent direction stays in the feasible region and that the objective function decreases. Under reasonable conditions, the global and fast local convergence of the algorithm is shown, and the results of numerical experiments are reported to show the effectiveness of the proposed algorithm [21–26].

Data Availability

The test problem data used to support the findings of this study have been deposited in the (test examples for nonlinear programming codes) repository (DOI: 10.1007/BF00934594).

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The authors gratefully acknowledge the partial support of the National Science Foundation of China (Grant 11371253) and the Natural Science Foundation of Hainan Province (Grant 120MS029).