Abstract
We present a hybridization of the accelerated gradient method with two vector directions. This hybridization is based on a chosen three-term hybrid model. The derived hybrid accelerated double direction model keeps the preferable properties of both included methods. Convergence analysis demonstrates at least linear convergence of the proposed iterative scheme on the sets of uniformly convex functions and strictly convex quadratic functions. The results of numerical experiments confirm a better performance profile in favor of the derived hybrid accelerated double direction model when compared with its forerunners.
1. Introduction
The main goal herein is to derive an efficient optimization method for the minimization of an objective function $f: \mathbb{R}^n \to \mathbb{R}$. Therewith, we assume that the function $f$ is uniformly convex and twice continuously differentiable. Furthermore, for the gradient and the Hessian of the function $f$ at the $k$-th iterative point $x_k$ we use the following notation:
$$ g_k = \nabla f(x_k), \qquad G_k = \nabla^2 f(x_k). \qquad (1) $$
The general form of the iterations for finding the extreme values of the objective function is given by the expression
$$ x_{k+1} = x_k + t_k d_k, \qquad (2) $$
where $x_k$ is the current and $x_{k+1}$ is the next iterative point, $t_k$ is the iterative step length value, and $d_k$ is an iterative vector direction which leads us toward the solution of the problem. Certainly, $t_k$ and $d_k$ are the most important elements of the optimization model (2), and they determine the efficiency of the relevant method. For that reason, the way of defining these two crucial elements is of great importance for each minimization scheme.
In one of the first algorithms for solving unconstrained optimization problems, the steepest descent gradient method introduced by Cauchy, the iteration is defined as
$$ x_{k+1} = x_k - t_k g_k, \qquad (3) $$
and here the descent direction is simply the negative gradient vector, while the iterative step size value is calculated by the exact line search rule
$$ t_k = \arg\min_{t > 0} f(x_k - t g_k). \qquad (4) $$
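As an illustration of (3) and (4), the following minimal C sketch runs the steepest descent iteration with an exact line search on a small strictly convex quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, for which the exact step along $-g_k$ is $t_k = g_k^T g_k / g_k^T A g_k$. The matrix $A$, the vector $b$, the starting point, and the tolerance below are arbitrary illustrative choices, not data taken from the paper.

/* Steepest descent with exact line search on a 2-D strictly convex
 * quadratic f(x) = 0.5*x'Ax - b'x (illustrative values only). */
#include <math.h>
#include <stdio.h>

int main(void) {
    const double A[2][2] = {{3.0, 1.0}, {1.0, 2.0}};   /* symmetric positive definite */
    const double b[2] = {1.0, 1.0};
    double x[2] = {5.0, -3.0};                          /* starting point */

    for (int k = 0; k < 1000; ++k) {
        double g[2] = {A[0][0]*x[0] + A[0][1]*x[1] - b[0],
                       A[1][0]*x[0] + A[1][1]*x[1] - b[1]};   /* g_k = A x_k - b */
        double gg = g[0]*g[0] + g[1]*g[1];
        if (sqrt(gg) < 1e-8) break;                     /* stop on a small gradient */
        double Ag[2] = {A[0][0]*g[0] + A[0][1]*g[1],
                        A[1][0]*g[0] + A[1][1]*g[1]};
        double t = gg / (g[0]*Ag[0] + g[1]*Ag[1]);      /* exact minimizer along -g_k */
        x[0] -= t * g[0];                               /* x_{k+1} = x_k - t_k g_k */
        x[1] -= t * g[1];
    }
    printf("approximate minimizer: (%f, %f)\n", x[0], x[1]);
    return 0;
}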
Furthermore, in the general Newton method
$$ x_{k+1} = x_k - G_k^{-1} g_k, \qquad (5) $$
the vector direction is calculated as the product of the inverse Hessian and the gradient of the objective function. Defining the vector direction in this way guarantees fast convergence properties, but the practical computation of the Hessian of the function and of its inverse can be difficult. Therefore, many modified Newton, Newton-conjugate, and quasi-Newton schemes have been developed in which the calculation of the Hessian and its inverse is, in some way, avoided.
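For comparison, a single Newton step (5) applied to a quadratic objective reaches the minimizer immediately, since the Hessian of a quadratic is the constant matrix $A$. The sketch below solves the resulting 2x2 Newton system by Cramer's rule; all numerical values are again illustrative.

/* One Newton step x_1 = x_0 - G^{-1} g_0 on a 2-D quadratic; since
 * G = A is constant, the step lands exactly at the minimizer A^{-1} b. */
#include <stdio.h>

int main(void) {
    const double A[2][2] = {{3.0, 1.0}, {1.0, 2.0}};
    const double b[2] = {1.0, 1.0};
    double x[2] = {5.0, -3.0};

    double g[2] = {A[0][0]*x[0] + A[0][1]*x[1] - b[0],
                   A[1][0]*x[0] + A[1][1]*x[1] - b[1]};   /* gradient at x_0 */
    /* Solve A*p = g by Cramer's rule and take x <- x - p. */
    double det = A[0][0]*A[1][1] - A[0][1]*A[1][0];
    double p[2] = {(A[1][1]*g[0] - A[0][1]*g[1]) / det,
                   (A[0][0]*g[1] - A[1][0]*g[0]) / det};
    x[0] -= p[0];
    x[1] -= p[1];
    printf("minimizer: (%f, %f)\n", x[0], x[1]);
    return 0;
}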
In the quasi-Newton methods, the Hessian of the goal function, or its inverse, is approximated by an adequately defined matrix. Using this type of method, we generally reduce the computation time, since we avoid the complicated calculations needed to derive the Hessian of the objective function. Nevertheless, methods of quasi-Newton type preserve the good properties of the Newton method. For these reasons, in this paper we propose a method of quasi-Newton type in which the value of the iterative step size parameter is obtained by the inexact Backtracking line search procedure.
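A minimal sketch of an inexact Backtracking (Armijo) line search is given next, assuming the usual rule of starting from a trial step $t = 1$ and shrinking it by a factor $\beta$ until the sufficient-decrease condition $f(x + t d) \le f(x) + \sigma t\, g^T d$ holds. The test function, the direction, and the parameter values $\sigma$ and $\beta$ are illustrative choices, not the ones used in the paper's experiments.

/* Backtracking (Armijo) line search sketch on an illustrative 2-D quadratic. */
#include <stdio.h>

static double f(const double x[2]) {                /* f(x) = 0.5*x'Ax - b'x */
    return 0.5*(3.0*x[0]*x[0] + 2.0*x[0]*x[1] + 2.0*x[1]*x[1]) - x[0] - x[1];
}
static void grad(const double x[2], double g[2]) {  /* g = Ax - b */
    g[0] = 3.0*x[0] + x[1] - 1.0;
    g[1] = x[0] + 2.0*x[1] - 1.0;
}

static double backtrack(const double x[2], const double g[2],
                        const double d[2], double sigma, double beta) {
    double t = 1.0, gd = g[0]*d[0] + g[1]*d[1], fx = f(x);
    for (int i = 0; i < 50; ++i) {
        double xt[2] = {x[0] + t*d[0], x[1] + t*d[1]};
        if (f(xt) <= fx + sigma*t*gd) break;        /* Armijo condition satisfied */
        t *= beta;                                  /* otherwise shrink the step */
    }
    return t;
}

int main(void) {
    double x[2] = {5.0, -3.0}, g[2], d[2];
    grad(x, g);
    d[0] = -g[0]; d[1] = -g[1];                     /* a descent direction */
    printf("accepted step t = %g\n", backtrack(x, g, d, 1e-4, 0.8));
    return 0;
}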
In the second section we give an overview of some accelerated gradient methods and hybrid iterations. The derivation of the hybrid accelerated double direction method and its algorithm are presented in the third section of this paper. In the fourth section we give a convergence analysis of the proposed iteration. Numerical experiments and comparisons are presented in the last section of this paper.
2. Preliminaries: Accelerated Gradient Methods and Hybrid Iterations
The authors in [1] rightfully identified a class of accelerated gradient descent methods, defined by the general iterative scheme
$$ x_{k+1} = x_k - t_k \gamma_k^{-1} g_k. \qquad (6) $$
In the previous expression, $\gamma_k$ presents an iterative acceleration parameter which improves the performance of the relevant method. A common way to determine this parameter is through the features of the second-order Taylor series taken on the appropriate scheme (6). Acceleration parameters computed in such a way are applied in the methods described in [1–5]. According to the iteration form (6), we can conclude that the accelerated gradient methods are of the quasi-Newton type, in which the approximation of the Hessian, i.e., of its inverse, is obtained by the scalar matrix $\gamma_k I$, where $I$ is the identity matrix of the appropriate order and $\gamma_k$ is the matching acceleration parameter. The particular expressions which define the acceleration parameters of several accelerated gradient schemes, namely the SM method [1], the ADD method [4], the ADSS method [2], the TADSS method [5], and the HSM method [3], are derived in the respective papers.
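To make the quasi-Newton interpretation concrete, the following C sketch runs an iteration of the form $x_{k+1} = x_k - t_k \gamma_k^{-1} g_k$ on an illustrative quadratic, with the step size obtained by Armijo backtracking. The update of $\gamma_k$ shown here is only the generic second-order Taylor estimate $\gamma_{k+1} = 2[f(x_{k+1}) - f(x_k) - g_k^T(x_{k+1}-x_k)] / \lVert x_{k+1}-x_k \rVert^2$; the specific SM, ADD, ADSS, TADSS, and HSM formulas cited above refine this idea and are not reproduced here.

/* Accelerated gradient sketch: inverse Hessian replaced by (1/gamma)*I,
 * with gamma updated by a generic Taylor-based curvature estimate. */
#include <math.h>
#include <stdio.h>

static double f(const double x[2]) {
    return 0.5*(3.0*x[0]*x[0] + 2.0*x[0]*x[1] + 2.0*x[1]*x[1]) - x[0] - x[1];
}
static void grad(const double x[2], double g[2]) {
    g[0] = 3.0*x[0] + x[1] - 1.0;
    g[1] = x[0] + 2.0*x[1] - 1.0;
}

int main(void) {
    double x[2] = {5.0, -3.0}, g[2], gamma = 1.0;
    for (int k = 0; k < 500; ++k) {
        grad(x, g);                                    /* g_k */
        if (sqrt(g[0]*g[0] + g[1]*g[1]) < 1e-8) break;
        double d[2] = {-g[0]/gamma, -g[1]/gamma};      /* d_k = -(1/gamma_k) g_k */
        double t = 1.0, fx = f(x), gd = g[0]*d[0] + g[1]*d[1];
        while (t > 1e-12) {                            /* Armijo backtracking */
            double xt[2] = {x[0] + t*d[0], x[1] + t*d[1]};
            if (f(xt) <= fx + 1e-4*t*gd) break;
            t *= 0.8;
        }
        double xn[2] = {x[0] + t*d[0], x[1] + t*d[1]};
        double dx[2] = {xn[0] - x[0], xn[1] - x[1]};
        double den = dx[0]*dx[0] + dx[1]*dx[1];
        double num = f(xn) - fx - (g[0]*dx[0] + g[1]*dx[1]);
        gamma = (den > 0.0) ? 2.0*num/den : 1.0;       /* Taylor-based estimate */
        if (gamma <= 0.0) gamma = 1.0;                 /* keep gamma positive */
        x[0] = xn[0]; x[1] = xn[1];
    }
    printf("approximate minimizer: (%g, %g)\n", x[0], x[1]);
    return 0;
}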
An interesting concept of merging iterations through a hybrid expression was suggested in several research articles (see [6–8]). These representations are formulated for a mapping $T$ defined on a nonempty convex subset $C$ of a normed space $E$, where $\{x_k\}$ and $\{y_k\}$ denote the sequences generated by the proposed iterations and $\{\alpha_k\}$ and $\{\beta_k\}$ are the corresponding parameter sequences; the standard forms of these processes are recalled below.
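For context, the classical Picard, Mann, and Ishikawa processes referenced here have the following standard forms (a restatement of the usual definitions; the precise variants studied in [6–8] may differ in details such as the parameter ranges):

$$ x_{k+1} = T(x_k), \qquad \text{(Picard)} $$
$$ x_{k+1} = (1-\alpha_k)\,x_k + \alpha_k\,T(x_k), \qquad \text{(Mann)} $$
$$ x_{k+1} = (1-\alpha_k)\,x_k + \alpha_k\,T(y_k), \quad y_k = (1-\beta_k)\,x_k + \beta_k\,T(x_k). \qquad \text{(Ishikawa)} $$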
In [9] it was proved that the hybrid method (15), based on the Picard, Mann, and Ishikawa processes, upgrades the hybrid models mentioned above. The authors of [3] used the advantages of the hybrid model (15) and derived a hybrid version of the accelerated gradient SM method from [1], termed the HSM method and defined by the iteration (16). Numerical tests from [3] confirmed that the hybrid model (16) upgrades its forerunner, the SM iterative rule.
3. HADD Algorithm
We are motivated by the advantages confirmed in [3] when the scheme (15) was applied to the SM method. As a result, the hybrid SM model (called HSM) was defined and tested in [3]. Herein, we apply the same hybridization strategy to the accelerated double direction method (shortly, the ADD method) introduced in [4]. The derived scheme is based on the hybrid scheme (15) and thereby keeps the accelerated features of the ADD iterations.
In order to complete the presentation, we start from the ADD iteration (17), in which the step size is appropriately defined, the first direction vector is determined through the gradient and the acceleration parameter, and the second direction vector is computed by the procedure Second direction. That procedure was introduced in [4] and was derived as a practical instance of the more general procedure considered in [10]. The procedure Second direction is restated in Algorithm 1.
Algorithm 1: Second direction.
Remark 1. For further investigation within this topic, the second direction in the ADD iteration can be defined differently. For example, in [11] the authors proposed directional k-step Newton methods for solving a single nonlinear equation in $n$ variables. Accordingly, they established the semi-local convergence analysis of these models based on two different approaches: the first one is based on recurrence relations, while the other, more preferable one, is established using recurrence functions. Using one (or both) of the approaches from [11] to determine the second direction in the ADD method, as well as in its hybrid version, can be an interesting topic for further research.
Applying the hybrid scheme (15) to the iterative rule (17), we get the hybrid iterative scheme (18). After replacing the third expression from the set of equations (18) into the second one, the iterative rule (19) follows.
To simplify further calculations, we use a constant value of the hybrid parameter in (19), just as the authors did in [1, 9]. So, in (19), instead of the iterative values of this parameter we simply take one fixed value. Now we can state the hybrid ADD method, i.e., the HADD iterative scheme, given by (20).
Yet, we need to determine the iterative value of the acceleration parameter $\gamma_{k+1}$. As we mentioned previously, this parameter can be appropriately defined using Taylor's expansion of the proposed iteration (20) at two successive iterative points, which gives (21). The intermediate point in this expansion fulfills the condition (22), i.e., it lies on the segment determined by $x_k$ and $x_{k+1}$. Next, we substitute the Hessian value from (21) by the scalar matrix $\gamma_{k+1} I$, which leads to (23). From (23), it is possible to derive the approximation factor $\gamma_{k+1}$ of the HADD scheme, given by (24).
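The underlying pattern of the derivation (21)-(24) can be summarized as follows, writing $\Delta_k = x_{k+1} - x_k$; the concrete HADD expression (24) is obtained by substituting the update (20) for $\Delta_k$:

$$ f(x_{k+1}) \approx f(x_k) + g_k^T \Delta_k + \tfrac{1}{2}\,\Delta_k^T \left(\gamma_{k+1} I\right) \Delta_k \;\;\Longrightarrow\;\; \gamma_{k+1} = \frac{2\left[f(x_{k+1}) - f(x_k) - g_k^T \Delta_k\right]}{\lVert \Delta_k \rVert^2}. $$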
With the aim of preserving the second-order necessary and the second-order sufficient optimality conditions, we assume positivity of the acceleration parameter, $\gamma_k > 0$. In practical computations, it is possible that (24) generates a negative value of $\gamma_{k+1}$. We resolve this situation by taking $\gamma_{k+1} = 1$ in such cases. As a consequence, the first vector direction then becomes the negative gradient vector $-g_{k+1}$, and in this special case the next iterative point of the iteration (20) attains a correspondingly simplified form.
In order to present the main HADD algorithm, we need two additional auxiliary procedures. The first one is the previously displayed Algorithm 1, by which we calculate the second vector direction. The second one is the Backtracking line search procedure (Algorithm 2), by which we calculate the iterative step size value.
Algorithm 3 describes the main algorithm, termed the HADD algorithm.
Algorithm 2: Backtracking line search.
Algorithm 3: The HADD algorithm.
4. Convergence of the HADD Method
The convergence properties of the established HADD iterative method are considered on the sets of uniformly convex functions and strictly convex quadratic functions. In the case of uniformly convex functions, the statements are the same as those exposed in [1, 4]. For that reason, we only restate the following lemma, in which the decrease of the objective function in two successive points of the HADD scheme is estimated. Thereupon, the subsequent theorem confirms the linear convergence of our hybrid accelerated model.
Lemma 2. Suppose that the function $f$ is twice continuously differentiable and uniformly convex on $\mathbb{R}^n$, and let the sequence $\{x_k\}$ be generated by Algorithm 3. Then the decrease of the objective function in two successive iterative points obeys the same estimation as the one established in [1, 4].
Theorem 3. For the twice continuously differentiable and uniformly convex function $f$ on $\mathbb{R}^n$ and the sequence $\{x_k\}$ generated by Algorithm 3, it holds that
$$ \lim_{k \to \infty} \lVert g_k \rVert = 0. $$
Therewith, the sequence $\{x_k\}$ converges to the optimal solution at least linearly.
We now show that the iteration (20) is convergent on the set of strictly convex quadratic functions
$$ f(x) = \tfrac{1}{2}\, x^T A x - b^T x. \qquad (29) $$
In (29), it is assumed that $A$ is a real symmetric positive definite matrix and that the vector $b \in \mathbb{R}^n$ is given. The smallest and the largest eigenvalues of the matrix $A$ are denoted by $\lambda_1$ and $\lambda_n$, respectively.
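Two elementary facts about the quadratic model (29), used repeatedly in Lemma 4 and Theorem 5, are its gradient and the Rayleigh-quotient bounds of the symmetric positive definite matrix $A$:

$$ g(x) = \nabla f(x) = Ax - b, \qquad \lambda_1 \lVert y \rVert^2 \le y^T A\, y \le \lambda_n \lVert y \rVert^2 \quad \text{for every } y \in \mathbb{R}^n. $$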
Lemma 4. Let $f$ be the strictly convex quadratic function defined by (29), where $A$ is a symmetric positive definite matrix, and let $\lambda_1$ and $\lambda_n$ be the smallest and the largest eigenvalues of $A$. Then the inequalities (30) are valid for the hybrid accelerated gradient model (20).
Proof. Let us calculate the difference of the goal function (29) at two successive iterative points, which gives (31). Including the iteration (20), we continue the computation and obtain (32). Applying the equality $g_k = A x_k - b$, which holds for the quadratic (29), together with the symmetry of $A$, we get (33). The right-hand side of the previous expression can be further transformed as in (34). Replacing the corresponding quantity in (24) by the right-hand side of (34) leads to (35), and after some calculations we obtain (36). The previous expression confirms that $\gamma_{k+1}$ is the Rayleigh quotient of the real symmetric matrix $A$ at the vector $x_{k+1} - x_k$, which leads us to the conclusion
$$ \lambda_1 \le \gamma_{k+1} \le \lambda_n. $$
The left-hand side of the inequalities (30) follows from this fact. To prove the right-hand side of (30), we use the estimation from [3, eq. (3.8)], which implies (39). We can approximate the Lipschitz constant of the gradient by the largest eigenvalue $\lambda_n$ and thereby restate (39) accordingly. The estimation of the Lipschitz constant by the largest eigenvalue is certainly valid, since the matrix $A$ is symmetric and $\lVert g(x) - g(y) \rVert = \lVert A(x - y) \rVert \le \lambda_n \lVert x - y \rVert$ holds for all $x, y \in \mathbb{R}^n$. From these facts we conclude the right-hand side of (30), which completes the proof.
Theorem 5. Let the iterations (19) be applied to the strictly convex quadratic function given by the expression (29), and suppose that the condition (42) holds for the largest and the smallest eigenvalues of the symmetric positive definite matrix $A$. Then the estimations (43) are true, with the involved quantities defined by the accompanying relations (44) and (45). Therewith, the convergence of the iteration on the quadratic (29) follows.
Proof. Let us consider an orthonormal system of eigenvectors of the matrix $A$, and let the sequence $\{x_k\}$ be constructed by applying Algorithm 3 to the strictly convex quadratic function defined by (29). Then, for suitable constants, the representation (46) with respect to this eigenvector system holds for every $k$. Applying (20), we further conclude (47). Having in mind the representation (46), one can verify the subsequent relation. To prove the inequalities (43), we only need to verify the corresponding bound for the coefficients in (46), for all indices. There are two possibilities:
(1) The first case implies a set of inequalities from which, as a consequence, we conclude that the required bound holds.
(2) In the second case, one can verify analogous estimations. The representation (46) and the fact that the eigenvectors constitute an orthonormal system lead to the same conclusion. Now, knowing that under the condition (42) the involved parameter satisfies the required bound, we confirm that the final statement of the theorem is true.
Remark 6. The assumption (42) used in the previous theorem is required in order to prove that the HADD process is convergent for strictly convex quadratics. Therewith, the restriction which the range of the hybrid parameter imposes might point to the conclusion that Theorem 5 is applicable to very few cases. However, this is not entirely so, since we choose only one particular value of the hybrid parameter for the practical computations. Regarding this matter, the authors in [3] numerically confirmed that the optimal value of the hybrid parameter is the one close to the left limit of the corresponding interval, i.e., a value which is very close to 1. Therefore, we choose such a value for the numerical tests displayed in the next section. For similar values of the hybrid parameter, the condition (42) becomes very close to the condition used in [12], under which the Q-linear convergence rate of the preconditioned BB method was established.
5. Computational Tests and Comparisons
The performance of the C implementation of the derived HADD model is investigated on a set of 630 unconstrained optimization test problems taken from [13]. The tests are conducted on an Intel Celeron workstation. The same stopping criteria and the same values of the Backtracking parameters are used in all tests.
We compare the hybrid accelerated HADD method with its forerunner, the ADD scheme, as well as with the hybrid accelerated HSM method. The number of function evaluations is the performance measure in all tests. The dominance of the ADD method with respect to the number of iterations over the other comparative models was confirmed in [4]. However, that research gives no information about the behavior of the ADD method when the number of function evaluations is considered. With respect to this measure, the HSM scheme upgrades the accelerated SM method as well as Nesterov's line search algorithm; see [3]. For these reasons, our experimental goal is to numerically demonstrate the better performance of the HADD method, with respect to the number of function evaluations, when compared with the ADD and the HSM methods.
In Table 1, we display the number of problems, out of 630, for which each algorithm achieved the minimal number of function evaluations. In the same table, we also display the number of problems for which all three algorithms required an equal number of function evaluations. Based on the results displayed in this table, it is obvious that the HADD scheme convincingly outperforms the other two comparative models.
For a clearer visualization of the performance of the HADD algorithm versus the ADD and HSM algorithms, we display in Figure 1 the Dolan-Moré performance profiles with respect to the number of function evaluations. As can be seen, the HADD scheme is more robust and therewith more efficient than the other two methods.
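For completeness, the Dolan-Moré performance profile displayed in Figure 1 is defined as follows, where $\mathrm{nfe}(p,s)$ is the number of function evaluations required by solver $s$ on problem $p$ and $P$ is the set of all test problems:

$$ r_{p,s} = \frac{\mathrm{nfe}(p,s)}{\min_{s'} \mathrm{nfe}(p,s')}, \qquad \rho_s(\tau) = \frac{1}{|P|}\,\bigl|\{\, p \in P : r_{p,s} \le \tau \,\}\bigr|. $$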

The obtained numerical results confirm that the applied hybridization process is a good way to improve some important characteristics of the chosen accelerated methods. The favorable outcomes of the HADD scheme with respect to the analyzed measure come from the properly chosen value of the hybrid parameter, together with the derived acceleration parameter. The good convergence properties of the defined HADD process are a reason to apply the proposed hybridization to some other gradient and accelerated gradient models.
6. Conclusion
We present a hybrid accelerated double direction gradient method for solving unconstrained optimization problems. The HADD method is derived by combining the good properties of the hybrid representation introduced in [3] with the double direction optimization model with acceleration parameter presented in [4]. The convergence of the defined optimization model is established on the sets of uniformly convex functions and strictly convex quadratic functions.
The HADD scheme preserves the preferable features of both forerunner methods. Therewith, according to the conducted numerical experiments, it outperforms the ADD and HSM methods with respect to the required number of function evaluations. We evaluated the Dolan-Moré performance profiles of the comparative methods and showed that the HADD iteration is the most efficient among the three algorithms.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The first and the third authors acknowledge support from the internal research project IS01-17 supported by the Faculty of Sciences and Mathematics, University of Priština, Serbia. The second author gratefully acknowledges support from the project supported by the Ministry of Education and Science of the Republic of Serbia, Grant No. 174013.