Abstract
The aggregate constraint homotopy method uses a single smoothing constraint instead of the m original constraints to reduce the dimension of its homotopy map, and hence it is expected to be more efficient than the combined homotopy interior point method when the number of constraints is very large. However, the gradient and Hessian of the aggregate constraint function are complicated combinations of the gradients and Hessians of all constraint functions, and hence they are expensive to calculate when the number of constraint functions is very large. In order to improve the performance of the aggregate constraint homotopy method for solving nonlinear programming problems with few variables and many nonlinear constraints, a flattened aggregate constraint homotopy method is presented, which can save much computation of gradients and Hessians of constraint functions. Under conditions similar to those for other homotopy methods, the existence and convergence of a smooth homotopy path are proven. A numerical procedure is given to implement the proposed homotopy method; preliminary computational results show its performance and that it is competitive with the state-of-the-art solver KNITRO for solving large-scale nonlinear optimization problems.
1. Introduction
In this paper, we consider the following nonlinear programming problem:

min f(x)  s.t.  g_i(x) ≤ 0,  i = 1, …, m,     (1)

where x ∈ R^n is the variable, f : R^n → R and g_i : R^n → R, i = 1, …, m, are three times continuously differentiable, and m is very large, but n is moderate. Such problems have wide applications; a typical instance is the discretized semi-infinite programming problem.
From the mid-1980s, much attention has been paid to interior point methods for mathematical programming, and many results on theory, algorithms, and applications in linear programming, convex programming, complementarity problems, semidefinite programming, and linear cone programming were obtained (see the monographs [1–6] and references therein). For nonlinear programming, the typical algorithms were Newton-type methods applied to the perturbed first-order necessary conditions, combined with line search or trust region methods on a proper merit function (e.g., [7–9]). The general conditions for global convergence of these methods require that the feasible set be bounded and that the Jacobian matrix be uniformly nonsingular. Another typical class of globally convergent methods for nonlinear programming is probability-one homotopy methods (e.g., [10–12]), whose global convergence can be established under weaker conditions than those for Newton-type methods. Their notable feature is that, unlike line search or trust region methods, they do not depend on the descent of a merit function and so are insensitive to local minima of the merit function, at which no search direction is a descent direction.
In [10, 11], Feng et al. proposed a homotopy method for nonlinear programming (1), which was called the combined homotopy interior point (abbreviated CHIP) method; its global convergence was proven under the normal cone condition (see below for its definition) on the feasible set as well as some common conditions. On the basis of the CHIP method, some modified CHIP methods were presented in [13, 14]; their global convergence was established under the quasinormal cone condition and the pseudocone condition on the feasible set, respectively. In [12], Watson described some probability-one homotopy methods for unconstrained and inequality constrained optimization, whose global convergence was established under some weaker assumptions. Recently, Yu and Shang proposed a constraint shifting combined homotopy method in [15, 16], in which not only the objective function but also the constraint functions are regularly deformed. Its global convergence was proven under the condition that the initial feasible set, which approaches the feasible set of (1) as the homotopy parameter changes from 1 to 0 and is not necessarily the feasible set of (1), satisfies the normal cone condition.
Let g(x) = max_{1≤i≤m} g_i(x); then (1) is equivalent to

min f(x)  s.t.  g(x) ≤ 0,

which has only one, but nonsmooth, constraint. In [17], the following aggregate function, a smooth approximation of g(x) with a smoothing parameter t > 0 induced from the max-entropy theory, was introduced:

g(x, t) = t ln Σ_{i=1}^m exp(g_i(x)/t).

It is also known as the exponential penalty function (see [18]). By using it for all constraint functions of problem (1), an aggregate constraint homotopy (abbreviated ACH) method was presented by Yu et al. in [19], whose global convergence was obtained under the condition that the feasible set satisfies the weak normal cone condition. Although the ACH method has only one smoothing constraint, the gradient and Hessian of the aggregate constraint function are complicated combinations of the gradients and Hessians of all m constraint functions and, hence, are expensive to calculate when m is very large.
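The aggregate (log-sum-exp) function above admits a short, numerically stable implementation. The following pure-Python sketch (function and variable names are ours) uses the standard max-shift to avoid overflow of the exponentials:

```python
import math

def aggregate(g_vals, t):
    """Smooth aggregate of max(g_vals) with smoothing parameter t > 0:
        t * ln(sum_i exp(g_i / t)),
    evaluated in the numerically stable shifted form
        m* + t * ln(sum_i exp((g_i - m*) / t)),  where m* = max_i g_i."""
    m_star = max(g_vals)
    s = sum(math.exp((g - m_star) / t) for g in g_vals)
    return m_star + t * math.log(s)
```

The value always satisfies max g_i ≤ aggregate(g, t) ≤ max g_i + t ln m, so the approximation tightens as t decreases to 0.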
Throughout this paper, we assume that the nonlinear programming problem (1) possesses a very large number of nonlinear constraints but a small number of variables and that the objective and constraint functions are not sparse. For such a problem, the number of constraint functions can be so large that computing the gradients and Hessians of all constraint functions is very expensive and their storage exceeds memory; hence, general numerical methods for nonlinear programming are not efficient. Although active set methods need to calculate the gradients and Hessians of only a subset of the constraint functions, require less storage, and solve faster numerically, the working set is difficult to estimate without knowing the internal structure of the problem.
In this paper, we present a new homotopy method called the flattened aggregate constraint homotopy (abbreviated by FACH) method for nonlinear programming (1) by using a new smoothing technique, in which only a part of constraint functions is aggregated. Under the normal cone condition for the feasible set and some other general assumptions, we prove that the FACH method can determine a smooth homotopy path from a given interior point of the feasible set to a KKT point of (1), and preliminary numerical results demonstrate its efficiency.
The rest of this paper is organized as follows. We conclude this section with some notation, definitions, and a lemma. The flattened aggregate constraint function and some of its properties are given in Section 2. The homotopy map and the existence and convergence of a smooth homotopy path, with proofs, are given in Section 3. A numerical procedure for tracking the smooth homotopy path and numerical test results with some remarks are given in Section 4. Finally, we conclude the paper with some remarks in Section 5.
When discussing scalars and scalar-valued functions, subscripts refer to the iteration step so that superscripts can be used for exponentiation. In contrast, for vectors and vector-valued functions, subscripts indicate components, whereas superscripts indicate the iteration step. The identity matrix is represented by I.
Unless otherwise specified, ‖·‖ denotes the Euclidean norm. Ω = {x ∈ R^n : g_i(x) ≤ 0, i = 1, …, m} is the feasible set of (1), whereas Ω⁰ is the interior of Ω and ∂Ω is the boundary of Ω. The symbols R₊^m and R₊₊^m denote the nonnegative and positive quadrants of R^m, respectively. The active index set at x is denoted by I(x) = {i : g_i(x) = 0}. For a differentiable map g : R^n → R^m, Dg(x) is the matrix whose (i, j)th element is ∂g_i(x)/∂x_j; for a map H and a set S, H^{-1}(S) = {w : H(w) ∈ S} is the set-valued inverse of the set S.
Definition 1 (see [10]). The set Ω satisfies the normal cone condition if and only if the normal cone of Ω at any x ∈ ∂Ω does not meet Ω except at x itself; that is, {x + Σ_{i∈I(x)} y_i ∇g_i(x) : y_i ≥ 0, i ∈ I(x)} ∩ Ω = {x}.
Definition 2. If and only if there exists y* ∈ R₊^m such that (x*, y*) satisfies

∇f(x*) + Σ_{i=1}^m y_i* ∇g_i(x*) = 0,  y_i* g_i(x*) = 0,  y_i* ≥ 0,  g_i(x*) ≤ 0,  i = 1, …, m,

then x* is called a KKT point of (1) and y* is the corresponding Lagrangian multiplier.
Definition 3. Let U ⊂ R^n be an open set and let F : U → R^p be a differentiable map. We say that 0 ∈ R^p is a regular value of F if and only if the Jacobian matrix DF(x) has full row rank for every x ∈ F^{-1}(0).
Lemma 4 (parameterized Sard theorem [20]). Let U ⊂ R^n and V ⊂ R^q be two open sets and let Φ : U × V → R^p be a C^r differentiable map with r > max{0, n − p}. If 0 ∈ R^p is a regular value of Φ, then, for almost all a ∈ V, 0 is a regular value of Φ_a = Φ(·, a).
2. The Flattened Aggregate Constraint Function
In this section, we suppose that the following assumptions hold.
Assumption 5. Ω⁰ is nonempty and Ω is bounded.
Assumption 6. For any x ∈ ∂Ω, the gradients {∇g_i(x) : i ∈ I(x)} are positive independent; that is, Σ_{i∈I(x)} y_i ∇g_i(x) = 0 with y_i ≥ 0 for all i ∈ I(x) implies y_i = 0 for all i ∈ I(x).
Assumption 7. Ω satisfies the normal cone condition (see Definition 1).
The first assumption comprises the Slater condition and the boundedness of the feasible set, which are two basic conditions. The second assumption provides the regularity of the constraints and is weaker than the linear independence constraint qualification. The last assumption, the normal cone condition on the feasible set, is a generalization of convexity of the feasible set. Indeed, if Ω is a convex set, then it satisfies the normal cone condition. Some simple nonconvex sets satisfying the normal cone condition are shown in [10].
In this paper, we construct a flattened aggregate constraint function as follows: where , , are some adjusting parameters and is a bivariate function satisfying for with , and for . For simplicity of discussion, we write as . By its definition, can be expressed as where, for any , satisfies . There exist many kinds of suitable functions , such as polynomial functions and spline functions. Since polynomial functions have simple mathematical expressions and many excellent properties, throughout this paper, is chosen as a polynomial.
Let ; by the definition of , the function in (10) can be rewritten as The gradient of with respect to is where Thus the gradient and Hessian of relate only to a part of the constraint functions; that is, .
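The flattening idea can be sketched in a few lines. The following is an illustrative analogue only: the paper's exact flattening polynomial and parameter values are not reproduced here, so we substitute a hypothetical C² quintic cutoff `smoothstep` and illustrative band parameters `c1`, `c2`. The key effect is visible: constraints lying more than c2 below the current maximum receive weight zero, so their gradients and Hessians never need to be evaluated.

```python
import math

def smoothstep(s):
    """C^2 quintic cutoff: 0 for s <= 0, 1 for s >= 1 (a hypothetical
    stand-in for the paper's polynomial flattening function)."""
    if s <= 0.0:
        return 0.0
    if s >= 1.0:
        return 1.0
    return s * s * s * (10.0 + s * (-15.0 + 6.0 * s))

def flattened_aggregate(g_vals, t, c1=0.5, c2=1.0):
    """Flattened aggregate sketch: constraint values more than c2 below
    max(g_vals) receive zero weight and are skipped entirely; values
    within c1 of the maximum receive full weight.  Returns the smoothed
    value and the number of constraints actually used."""
    m_star = max(g_vals)
    total, active = 0.0, 0
    for g in g_vals:
        w = smoothstep((g - (m_star - c2)) / (c2 - c1))
        if w > 0.0:
            total += w * math.exp((g - m_star) / t)
            active += 1
    return m_star + t * math.log(total), active
```

Only the `active` near-maximum constraints would contribute gradient and Hessian terms, which is the source of the computational savings when the number of constraints is very large.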
Proposition 8. For any , , with , , : (1) for ; (2) ; (3) .
Proof. (1) Because , there exists such that . By the continuity of and , we know that and for ; hence
Hence, item 1 is satisfied for , , and .
(2) By , we have
Hence, . Together with item 1, we have for , , and .
(3) If , then by its definition, and hence
else
where . Then, we have for , , and .
Proposition 9. For any given : (1) ; (2) when ; (3) when with , where |·| denotes the cardinality of a set.
Proof. (1) By , we have
Then the left inequality can be obtained. By , we have
Then the right inequality can be obtained by Proposition 2.3 in [19].
(2) For any with , let ; then by its definition, and hence
then the left inequality is true. The right inequality can be obtained by item 1.
(3) It is trivial by the definition of .
Proposition 10. Let and ; one has: (1) for ; (2) for any bounded and closed set , there exists a such that ; (3) as .
Proof. (1) If , which is equivalent to , by Proposition 9(2),
which means ; then we have .
(2) By the continuity of and the fact that is a bounded closed set, there exists a point such that reaches its maximum in , and . For any satisfying , by Proposition 9(1), for any ,
which means .
(3) For any , that is, , by the right inequality of Proposition 9(1),
where the second inequality comes from for , which means . Then, together with item 1,
Proposition 11. There exists a such that, for any , ; then .
Proof. If not, for any , there exist corresponding and such that , which means that there must exist three sequences , , and with , such that , , , and as . Then, by Proposition 8, we know that , as , and where the first equality comes from the fact that for . Therefore, we have by taking limits on (30); this is a contradiction to Assumption 6.
Proposition 12. For any bounded closed set , there exists a such that for any .
Proof. If not, there exist a bounded closed set and four sequences , with , , and such that , , , , as , and By Proposition 8, we have where the first equality comes from the fact that for . Then, we have By Assumption 6 and using (34), Hence is a bounded sequence, and there must exist a subsequence converging to ; then by (33) we have which contradicts Assumption 7.
3. The Flattened Aggregate Constraint Homotopy Method
In [19], the following aggregate constraint homotopy was introduced: where with , is the Lagrangian multiplier of the aggregate constraint function , and is the starting point in . Under the weak normal cone condition, which is similar to the normal cone condition, it was proved that the ACH homotopy determines a smooth interior path from a given interior point to a KKT point. A predictor-corrector procedure can then be applied to trace the homotopy path as the homotopy parameter goes from 1 to 0, yielding a KKT point of (1).
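The path-following principle behind these homotopy methods can be illustrated on a toy scalar equation. The sketch below is not the paper's ACH or FACH map; it traces the generic convex homotopy H(x, t) = (1 − t)F(x) + t(x − x0) from the trivial solution at t = 1 down to a zero of F at t = 0, correcting with Newton's method at each t-level:

```python
def solve_by_homotopy(F, dF, x0, steps=50, tol=1e-10):
    """Trace H(x, t) = (1 - t) * F(x) + t * (x - x0) from t = 1,
    where x = x0 solves H = 0 trivially, down to t = 0, where
    H(x, 0) = F(x).  At each t-level the previous point seeds a
    Newton corrector, so the iterates follow the homotopy path."""
    x = x0
    for k in range(steps, -1, -1):
        t = k / steps
        for _ in range(20):  # Newton corrector at fixed t
            h = (1.0 - t) * F(x) + t * (x - x0)
            dh = (1.0 - t) * dF(x) + t
            step = h / dh
            x -= step
            if abs(step) < tol:
                break
    return x
```

For example, `solve_by_homotopy(lambda x: x**3 - 8, lambda x: 3*x*x, 1.0)` follows a smooth path from the starting point 1 to the root x = 2; no merit-function descent is involved, which mirrors the insensitivity to merit-function local minima noted in Section 1.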
3.1. The Flattened Aggregate Constraint Homotopy
Using the flattened aggregate constraint function in (10), we construct the following flattened aggregate constraint homotopy: where is the Lagrangian multiplier of the flattened aggregate constraint and is the starting point and can be randomly chosen from .
We give the main theorem on the existence and convergence of a smooth path from to , in which is a solution of the KKT system of (1), and hence the global convergence of our proposal, namely, the FACH method, can be proven.
Theorem 13. Suppose that Assumptions 5–7 hold and satisfies Propositions 10–12 for the bounded closed set ; then for almost all starting points , the zero-point set defines a smooth path , which starts at and approaches the hyperplane . Furthermore, let be any limit point of , and ( is a limit of as ); then is a KKT point of (1) and is the corresponding Lagrangian multiplier.
Proof. Consider as a map of the variable , for any , which means ; hence , because
is nonsingular and the Jacobian matrix of is of full row rank. Using the parameterized Sard theorem, Lemma 4, for almost all , 0 is a regular value of . From
we know that is the unique and simple solution of . Then, by the implicit function theorem, there exists a smooth curve starting from and being transversal to the hyperplane .
Since can be extended in until it converges to the boundary of , there must exist an extreme point . Let be any extreme point of other than ; then only the following five cases are possible:(1), ;(2), ;(3), ;(4);(5).
Case (2) means that has another solution except , or is a double solution of , which contradicts the fact that is the unique and simple solution of . Case (3) means , , and ; hence
which contradicts the last equation of (39). Case (4) means , , and ; hence
which contradicts the last equation of (39). Then, Cases (2), (3), and (4) are impossible.
If Case (5) holds, there must exist a sequence on such that , , and . By the last equation of (39), we have
as , which means . The following two cases may be possible.(1)If , let be any accumulation point of , which is known to exist from (39) and ; then we have
If , then contradicts ; else contradicts Proposition 12.(2)If , by (39), we have
the right-hand side is finite; however, by , we know that the left-hand side is infinite; this is a contradiction.
As a conclusion, Case (1) is the only possible case. This implies that must approach the hyperplane . Because (defined in (17)), , and are bounded sequences, we know that has at least one accumulation point as . Let be any accumulation point; . By (39) and the fact that as , we have
By the fact that , we have . If , we know that
where the first inequality comes from Proposition 9(1), the second equality comes from the continuity, and the third equality comes from the definition of in (11); hence by the last equation of (39); else , by Proposition 9(2) and for by Proposition 8. Thus we have
Summing up, is a solution of the KKT system of (1), which means that is a KKT point of (1) and is the corresponding Lagrangian multiplier.
3.2. The Modified Flattened Aggregate Constraint Homotopy
From Proposition 8(3), we know that as for any and, hence, we can use the following modified flattened aggregate constraint homotopy (MFACH) instead of the FACH: where
Remarks. (i) Since is dropped in (51), the expressions of the homotopy map (50) and its Jacobian, and hence the corresponding code, become simpler. Moreover, the computation of in (50) and its Jacobian matrix is a little cheaper than that for the FACH method.
(ii) To guarantee that in (39) be a map, must be ; hence, if is chosen as a polynomial, the total degree should be seven; in contrast, for the homotopy map in (50), need only be . Then should satisfy and can be chosen as
(iii) The existence and convergence of the homotopy path defined by the MFACH can be proven in a way similar to that for the FACH.
4. The FACH-S-N Procedure and Numerical Results
4.1. The FACH-S-N Procedure
In this section, we give a numerical procedure, the FACH-S-N procedure, to trace the flattened aggregate constraint homotopy path by secant predictor and Newton corrector steps. It consists of three main parts: the predictor step, the corrector step, and the end game.
The predictor step is an approximate step along the homotopy path: it uses a predictor direction and a steplength to obtain a predictor point. The first predictor direction is the tangent direction; subsequent ones are secant directions. The steplength is governed by several parameters. It is capped by a prescribed bound, which ensures that the predictor point is close to the homotopy path and hence stays in the convergence domain of Newton's method in the corrector step. If the angle between the predictor direction and the previous one exceeds a threshold, the steplength is decreased to avoid stepping in a direction opposite to the path. If the corrector criteria are satisfied within no more than four Newton iterations three times in succession, the steplength is increased or kept unchanged; otherwise, it is decreased.
Once a predictor point is calculated, one or more Newton iterations are used in the corrector step to bring the predictor point back to the homotopy path. The corrector points , , are calculated by , , where the step is the solution of an augmented system defined by the homotopy equation and the hyperplane perpendicular to the predictor direction. The corrector step terminates when and satisfy the tolerance criteria.
At each predictor step and corrector step, the feasibility of the predictor point and the corrector point needs to be checked. If in the predictor step or in the corrector step, a damping step is used to get a new point . Then, if is feasible, the end game, a strategy more efficient than predictor-corrector steps when the homotopy parameter is close to 0, is invoked. Starting with , Newton's method is used to solve where is a small positive constant. In other situations, the steplength is decreased and new predictor-corrector steps are made (see Algorithm 1).
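The predictor-corrector loop described above can be sketched for a scalar homotopy H(x, t) = 0. This is a simplified illustration, not the FACH-S-N procedure itself: the steplength is fixed, and the angle test, steplength adaptation, and feasibility checks are omitted; all names are ours.

```python
def trace_path(H, J, x0, h=0.05, tol=1e-10, max_steps=1000):
    """Secant-predictor / Newton-corrector tracer for H(x, t) = 0 with
    Jacobian J(x, t) = (dH/dx, dH/dt).  Starts at (x0, 1.0), follows
    the path until t is (numerically) 0, then runs the end game:
    Newton's method in x alone at t = 0."""
    w, prev = [x0, 1.0], None
    for _ in range(max_steps):
        if w[1] <= 1e-3:                     # end game: fix t = 0
            x = w[0]
            for _ in range(50):
                step = H(x, 0.0) / J(x, 0.0)[0]
                x -= step
                if abs(step) < tol:
                    break
            return x
        if prev is None:                     # first direction: tangent
            hx, ht = J(w[0], w[1])
            d = [ht, -hx]                    # null vector of [hx, ht]
        else:                                # later directions: secant
            d = [w[0] - prev[0], w[1] - prev[1]]
        n = (d[0] * d[0] + d[1] * d[1]) ** 0.5
        d = [d[0] / n, d[1] / n]
        if d[1] > 0.0:                       # orient so t decreases
            d = [-d[0], -d[1]]
        prev = w[:]
        pp = [w[0] + h * d[0], w[1] + h * d[1]]  # predictor point
        p = pp[:]
        for _ in range(30):   # corrector: Newton on H(w) = 0 together
            hx, ht = J(p[0], p[1])           # with the hyperplane
            r1 = H(p[0], p[1])               # d . (w - pp) = 0
            r2 = d[0] * (p[0] - pp[0]) + d[1] * (p[1] - pp[1])
            det = hx * d[1] - ht * d[0]
            dx = (-r1 * d[1] + r2 * ht) / det
            dt = (r1 * d[0] - r2 * hx) / det
            p[0] += dx
            p[1] += dt
            if abs(dx) + abs(dt) < tol:
                break
        w = p
    return w[0]
```

The corrector solves the 2-by-2 augmented system by Cramer's rule: one equation keeps the iterate on the homotopy path, the other keeps it on the hyperplane through the predictor point perpendicular to the predictor direction, matching the description above.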
4.2. The Numerical Experiment
Although many collections of test problems exist, such as the CUTEr test set [21], we could not find a large collection of test problems with moderately many variables and very many complicated nonlinear constraints. In this paper, six test problems are chosen to test the algorithm. Problem 4.1 is chosen from the CUTEr test set and is used to illustrate a special situation; the others are derived from discretized semi-infinite programming problems. We also give two artificial test problems, 4.2 and 4.3, and use three problems, 4.4–4.6, from [22]. The number of variables in problem 4.2 and the number of constraints in problems 4.2–4.6 can be arbitrary. For each test problem, the gradients and Hessians of the objective and constraint functions are evaluated analytically.
The FACH-S-N procedure was implemented in MATLAB. To illustrate its efficiency, we also implemented the ACH method in [19] using a similar procedure. In addition, we downloaded the state-of-the-art solver KNITRO, which provides three algorithms for solving large-scale nonlinear programming problems, and we used the interior-point direct method with default parameters to compare with the FACH and MFACH methods. Any iterate point with was treated as a feasible point. The test results were obtained by running MATLAB R2008a on a desktop with the Windows XP Professional operating system, an Intel(R) Core(TM) i5-750 2.66 GHz processor, and 8 GB of memory. The default parameters were chosen as follows.
(i) Parameters for the flattened aggregate constraint function: , , , and .
(ii) Parameters in the end game section: and .
(iii) Step size parameters: , , and .
(iv) Tracking tolerances: and .
(v) Initial Lagrangian multipliers: for the ACH, FACH, and MFACH methods.
For each problem with different parameters, we list the value of the objective function and the maximum of the constraint functions at the final point, the number of iterations , the number of evaluations of gradients of individual constraint functions in the FACH and MFACH methods (in contrast, this number is in the ACH method), and the running time in seconds . For problems that were not solved by the conservative setting, we also give the reason for failure. The notation "fail1" indicates that the steplength in the predictor step became smaller than ; this is generally due to a poorly conditioned Jacobian matrix. The notation "fail2" means out of memory. The notation "fail3" means no result within 5000 iterations or 3600 seconds.
Example 14 (see [21]). Consider
Remark 15. In this example, the starting point happens to be an unconstrained minimizer of the objective function, and we found that the -components of the iterates generated by the FACH and MFACH methods (but not by other homotopy methods) remain invariant. This is not a coincidence. In fact, if is a solution of the KKT system, that is, is a stationary point of the objective function , and the parameters , , and in the FACH and MFACH methods satisfy , then the -components of points on the homotopy path remain invariant, which can be derived from the fact that is the solution of (39) and (50) for . Moreover, the -component of is ; by a similar discussion, the -component of is , and hence . Continuing in this way, we find that for any and in algorithm FACH-S-N.
Example 16. Consider
Example 17. Consider
Example 18 (see [22]). Consider
Example 19 (see [22]). Consider
Example 20 (see [22]). Consider
To explain the numerical efficiency, we make the following remarks based on the numerical results in Tables 1, 2, 3, 4, 5, and 6.
(i) If for any , the FACH and MFACH methods do not need to calculate the gradient and Hessian of any constraint function.
(ii) For problems whose constraint gradients and Hessians are expensive to evaluate, the performance of the FACH and MFACH methods is much better than that of the ACH method based on similar numerical tracing procedures.
(iii) Compared to the interior-point direct algorithm of the state-of-the-art solver KNITRO, the test results show that the FACH and MFACH methods perform worse when the number of constraints is small but much better when it is large for most problems. In addition, their time cost increases more slowly than KNITRO's as the number of constraints increases.
(iv) The function is important for the flattened aggregate constraint function. Theoretically, the parameters , , and in the FACH and MFACH methods can be chosen freely. However, they do matter for the practical efficiency of the FACH and MFACH methods and should be chosen suitably. If these parameters are too large, then too many gradients and Hessians of individual constraint functions need to be evaluated, causing low efficiency. On the other hand, if they are too small, the Hessian of the flattened aggregate constraint function (10) may become ill-conditioned. In our numerical tests, we fixed , , and . In addition, the function can be defined in many ways, and preliminary numerical experiments show that the algorithms with different functions have similar efficiency.
(v) In algorithm FACH-S-N, we gave only a simple implementation of the ACH, FACH, and MFACH methods. To improve the implementation of the FACH and MFACH methods, much work remains to be done on all parts of numerical path tracing, such as the predictor and corrector schemes, steplength updating, linear system solving, and the end game.
Other practical strategies in the literature for large-scale nonlinear programming problems (e.g., [9, 23ā26]) are also very important for improving the efficiency.
5. Conclusions
By introducing a flattened aggregate constraint function, a flattened aggregate constraint homotopy method is proposed for nonlinear programming problems with few variables and many nonlinear constraints, and its global convergence is proven. By greatly reducing the computation of gradients and Hessians of constraint functions in each iteration, the proposed method is very competitive for nonlinear programming problems with a large number of complicated constraint functions.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgment
The research was supported by the National Natural Science Foundation of China (11171051, 91230103).