Abstract
In a distributed online optimization problem with a convex constraint set over an undirected multiagent network, the local objective functions are convex and vary over time. Most existing methods for solving this problem are based on gradient descent. However, the convergence speed of these methods decreases as the number of iterations increases. To accelerate convergence, we present a distributed online conjugate gradient algorithm, which differs from a gradient method in that the search directions are a set of vectors that are conjugate to each other and the step sizes are obtained through an exact line search. We analyze the convergence of the algorithm theoretically and obtain a regret bound of $O(\sqrt{T})$, where T is the number of iterations. Finally, numerical experiments conducted on a sensor network demonstrate the performance of the proposed algorithm.
1. Introduction
Distributed optimization has received considerable interest in science and engineering and can be applied in numerous fields such as distributed tracking and localization [1], multiagent coordination [2], distributed estimation using sensor networks [3–5], and machine learning [6]. Such problems can be modeled as minimizing (or maximizing) the sum of local convex functions, where each local function relies only on local computation and communication in a distributed manner. With the increase in network size and data volume, more effective distributed algorithms have become a hot research topic. In recent years, many scholars have proposed various distributed optimization algorithms to solve such problems [7–15].
Most of the existing algorithms assume that the cost function at each agent is fixed. However, in practical problems, the environment of an agent is uncertain and the cost function of each agent changes over time, requiring us to solve such problems in an online setting. To be more precise, in distributed online optimization, the cost function of each agent changes at every step: at iteration t, the cost function of each agent is unknown before a decision is made; only after choosing a decision from the constraint set does the agent learn the cost function, and it simultaneously incurs a loss. This loss reflects the gap in objective cost between the current decision point and the best fixed decision in hindsight, which we call regret. Regret is an important criterion for evaluating a distributed online algorithm. A well-performing distributed online optimization algorithm should drive the average regret to zero over time.
Because online distributed optimization algorithms are more consistent with practical problems, many scholars have studied them and some effective algorithms have been proposed [16–24]. Yan et al. [20] introduced a distributed autonomous online learning algorithm, namely, a projected subgradient descent method, and derived regret bounds for both strongly convex and general convex local objective functions. The authors in [22] introduced an online distributed push-sum algorithm in which the search direction in each iteration is a negative subgradient, achieving a regret of $O((\log T)^2)$ when the local functions are strongly convex. For a time-varying directed network, Zhu et al. [25, 26] proposed a distributed online optimization algorithm in which a negative subgradient is randomly selected as the search direction during each iteration. The authors in [27] presented a distributed online algorithm based on primal-dual dynamic mirror descent for problems with time-varying coupling inequality constraints and obtained a dynamic regret bound. The authors in [28] proposed a distributed online conditional gradient algorithm for constrained distributed online optimization problems in the Internet of Things.
Existing distributed online optimization algorithms based on gradient methods are simple to compute and require little storage; however, to ensure convergence, the iterative step length usually needs to decrease as the number of iterations grows, which leads to a zigzag path at the end of the algorithm. That is, the algorithm carries out multiple iterations along the same or nearly the same direction, which greatly increases its computational time. The conjugate gradient algorithm also has the advantages of simple calculations and guaranteed convergence under certain conditions [29–31] but differs from the gradient method in that its search directions form a group of conjugate or approximately conjugate vectors, so during the later stage of the algorithm there are no repeated iterations along the same or nearly the same direction. Thus, the conjugate gradient method generally converges faster than the gradient descent method. In particular, for a quadratic objective function, the conjugate gradient method terminates in finitely many steps (quadratic termination). Based on these advantages, the conjugate gradient method has been used to solve numerous centralized offline optimization problems [32–35]. According to the existing literature, however, the conjugate gradient method has not been applied to distributed online optimization problems. To fill this gap, we present a distributed online conjugate gradient algorithm herein.
There are two main contributions of the present study. First, a new algorithm for the distributed online constrained convex optimization problem, namely, a distributed online conjugate gradient algorithm, is proposed. In our algorithm, a set of conjugate directions replaces the gradient directions used in a traditional gradient descent method, and the step size is obtained through an exact line search, thus effectively avoiding the slow convergence of a traditional gradient descent algorithm during the later stage. Second, we provide a careful analysis of the convergence of the proposed algorithm and obtain a regret bound of $O(\sqrt{T})$.
The remainder of this paper is organized as follows: in Section 2, we first briefly introduce the distributed online optimization model, followed by some necessary mathematical preliminaries and assumptions used in this study. We also provide a detailed statement of our algorithm in Section 3 and an analysis of the convergence of the algorithm in Section 4. The simulation results of our algorithm are then presented in Section 5. Finally, we provide some concluding remarks in Section 6. In addition, further detailed proofs of some of the lemmas applied can be found in the Appendix.
2. Preliminaries
In this section, we provide a brief background on the distributed online optimization and the conjugate gradient method. At the same time, some constructs used in this study and some relevant assumptions regarding our analysis are provided.
2.1. Distributed Online Optimization
Consider a network system with multiple agents; in this network, each agent i is associated with a time-varying convex cost function $f_t^i: \mathcal{X} \rightarrow \mathbb{R}$. All agents aim to solve the following general consensus problem cooperatively:
$$\min_{x \in \mathcal{X}} \; \sum_{t=1}^{T} \sum_{i=1}^{n} f_t^i(x). \tag{1}$$
During each round t ∈ {1, …, T}, the ith agent is required to generate a decision point $x_i(t)$ from a convex compact set $\mathcal{X} \subseteq \mathbb{R}^d$. Then, the adversary replies to each agent's decision with a cost function $f_t^i: \mathcal{X} \rightarrow \mathbb{R}$, and each agent simultaneously incurs a loss $f_t^i(x_i(t))$. The communication between agents is specified by a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, \ldots, n\}$ is the vertex set and $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ is the edge set. Each agent i can only communicate with its immediate neighbors $N(i) = \{j \in \mathcal{V} : (i, j) \in \mathcal{E}\}$. The goal of the agents is to seek a sequence of decision points such that the cumulative regret of each agent i with respect to any fixed decision in hindsight,
$$R_T = \sum_{t=1}^{T} \sum_{j=1}^{n} f_t^j(x_i(t)) - \min_{x \in \mathcal{X}} \sum_{t=1}^{T} \sum_{j=1}^{n} f_t^j(x),$$
is sublinear in T, that is, $\lim_{T \rightarrow \infty} R_T/T = 0$.
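Regret as defined above is straightforward to evaluate numerically once the losses have been revealed. The following minimal Python sketch computes the cumulative regret of one agent's decision sequence; the function names and the finite candidate grid standing in for the set $\mathcal{X}$ are our own illustrative assumptions, not part of the model.

```python
def cumulative_regret(losses, decisions, candidates):
    """Cumulative regret of one agent's decision sequence.

    losses[t][j] : callable, the cost f_t^j revealed at round t
    decisions[t] : the point x_i(t) the agent actually played
    candidates   : finite grid approximating the constraint set X
    """
    T = len(losses)
    # total network cost actually incurred at the agent's decisions
    incurred = sum(losses[t][j](decisions[t])
                   for t in range(T) for j in range(len(losses[t])))
    # cost of the best fixed decision chosen in hindsight
    best_fixed = min(sum(losses[t][j](x)
                         for t in range(T) for j in range(len(losses[t])))
                     for x in candidates)
    return incurred - best_fixed
```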
2.2. Conjugate Gradient Method
For the following optimization problem,
$$\min_{x \in \mathbb{R}^d} f(x),$$
where f(x) is twice continuously differentiable, the iterative form of the conjugate gradient (CG) method is usually designed as
$$x(k+1) = x(k) + \alpha_k d_k,$$
where x(k) is the point from the kth iteration, $\alpha_k > 0$ is the step length, and the search direction $d_k$ is defined as
$$d_k = \begin{cases} -g_k, & k = 1, \\ -g_k + \beta_k d_{k-1}, & k \geq 2, \end{cases}$$
in which $g_k = \nabla f(x(k))$ is the gradient of the objective function at the current iterate x(k), $\beta_k$ is a scalar, and different definitions of $\beta_k$ yield different conjugate gradient methods [27]. Well-known conjugate gradient methods include the Polak–Ribiere–Polyak (PRP) method and the Fletcher–Reeves (FR) method. In this study, we define the parameter $\beta_k$ using the PRP method, the specific form of which is as follows:
$$\beta_k^{PRP} = \frac{g_k^{\top}(g_k - g_{k-1})}{\|g_{k-1}\|^2}.$$
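For concreteness, the following Python sketch implements the PRP iteration above together with the restart rule $\beta_k \leftarrow \max\{\beta_k, 0\}$ used later in the paper; the Armijo backtracking line search and the quadratic test function are our illustrative substitutes for the exact line search assumed in the analysis.

```python
import numpy as np

def prp_cg(f, grad, x0, iters=200, tol=1e-8):
    """Polak-Ribiere-Polyak conjugate gradient with restarts."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                   # first direction: steepest descent
    for _ in range(iters):
        if np.linalg.norm(g) < tol:
            break
        alpha = 1.0                          # backtracking (Armijo) step size
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        beta = max(g_new.dot(g_new - g) / g.dot(g), 0.0)  # PRP, restart if <= 0
        d = -g_new + beta * d
        if g_new.dot(d) >= 0:                # safeguard: keep a descent direction
            d = -g_new
        x, g = x_new, g_new
    return x

# Example: strongly convex quadratic with minimizer (1, 2)
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
b = Q @ np.array([1.0, 2.0])
x_star = prp_cg(lambda x: 0.5 * x @ Q @ x - b @ x,
                lambda x: Q @ x - b, x0=np.zeros(2))
```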
Gilbert and Nocedal [36] proved that the CG method converges globally if the parameter βk is appropriately bounded in magnitude and the method satisfies the sufficient descent condition; we work under this hypothesis throughout.
To analyze the convergence of our algorithm, we provide the bound of the conjugate gradient as follows.
Lemma 1 (see [37]). Let f(x) be a twice continuously differentiable convex function, and let $\nabla^2 f(x)$ be the Hessian matrix of the function. For any $x \in \mathcal{C}$ and any $y \in \mathbb{R}^d$, there exist two positive numbers m and M such that
$$m\|y\|^2 \leq y^{\top} \nabla^2 f(x)\, y \leq M\|y\|^2.$$
Taking an initial point x(1) ∈ C, the iterates $x_k$, the directions $d_k$, and the parameters $\beta_k$ are all defined using the PRP method, in which $g_k = \nabla f(x(k))$.
2.3. Some Constructs and Assumptions
The following assumptions are given throughout this paper:
(i) Each cost function $f_t^i(x)$ is convex, twice continuously differentiable, and L-Lipschitz on the convex set $\mathcal{X}$.
(ii) The set $\mathcal{X}$ is compact and convex, and $0 \in \mathcal{X}$, where 0 denotes the vector with all entries equal to zero.
(iii) The Euclidean diameter of $\mathcal{X}$ is bounded by R; that is, $\sup_{x, y \in \mathcal{X}} \|x - y\| \leq R$.
As the Lipschitz condition in (i) implies, for any $x \in \mathcal{X}$ and any gradient $g = \nabla f_t^i(x)$, we have the following:
$$\|g\|_* \leq L,$$
where $\|\cdot\|_*$ denotes the dual norm.
The next definition is used throughout this paper.
Definition 1 (see [38]). Let f(x) be a differentiable function on an open set $\mathcal{C} \subseteq \mathbb{R}^d$, and let $\mathcal{X}$ be a convex subset of $\mathcal{C}$. Then, f(x) is convex on $\mathcal{X}$ if and only if
$$f(y) \geq f(x) + \langle \nabla f(x), y - x \rangle$$
for all $x, y \in \mathcal{X}$.
Now, we give an important inequality in [39] that is often used in optimization problems.
Let f(x) be a continuously differentiable function on the set $\mathcal{X}$ whose gradient satisfies the Lipschitz condition; then, for all $x, y \in \mathcal{X}$,
$$f(y) \leq f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2,$$
where L is the Lipschitz constant and $\|\cdot\|$ denotes the Euclidean norm.
3. Distributed Online Conjugate Gradient Algorithm
For the distributed online optimization problem (1), each local cost function $f_t^i(x)$ satisfies the assumptions in Section 2. The network topology among agents is specified by an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$; that is, if $(i, j) \in \mathcal{E}$, then $(j, i) \in \mathcal{E}$. Each agent i can only communicate with its immediate neighbors. The adjacency matrix of the undirected graph is a doubly stochastic symmetric matrix $P = [p_{ij}] \in \mathbb{R}^{n \times n}$ such that $p_{ij} > 0$ only if $(i, j) \in \mathcal{E}$ or $i = j$; otherwise, $p_{ij} = 0$. Moreover, $\sum_{j=1}^{n} p_{ij} = 1$ for all i and $\sum_{i=1}^{n} p_{ij} = 1$ for all j.
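One standard way to construct a weight matrix with these properties is the Metropolis rule; this particular construction is our illustration and is not prescribed by the paper.

```python
import numpy as np

def metropolis_weights(adj):
    """Symmetric doubly stochastic P from the 0/1 adjacency matrix of an
    undirected graph: p_ij = 1/(1 + max(deg_i, deg_j)) on edges, zero
    elsewhere off the diagonal, and p_ii chosen so each row sums to 1."""
    n = adj.shape[0]
    deg = adj.sum(axis=1)
    P = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i != j and adj[i, j]:
                P[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
        P[i, i] = 1.0 - P[i].sum()
    return P

# Example: a 4-node cycle graph; rows and columns of P each sum to 1
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]])
P = metropolis_weights(adj)
```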
To solve (1), we present a distributed online conjugate gradient algorithm. After making a decision based on the current information, we obtain the cost function $f_t^i(x)$ and compute the gradient $\nabla f_t^i(x_i(t))$. We can then calculate the value of $\beta_i(t)$ using the gradients at the current iteration point $x_i(t)$ and the previous iteration point $x_i(t-1)$. If $\beta_i(t) > 0$, a new search direction $d_i(t)$ is constructed through Gram–Schmidt conjugation of the gradient at the current iteration point $x_i(t)$ with the previous search direction $d_i(t-1)$. If $\beta_i(t) \leq 0$, we instead take the negative gradient as the new search direction, which is equivalent to restarting the distributed online conjugate gradient algorithm in the direction of steepest descent. The iteration step length $\alpha_i(t)$ is obtained through an exact line search, and the next iteration point $x_i(t+1)$ is obtained using the conjugate direction vector $d_i(t)$ and step $\alpha_i(t)$. The specific algorithm is summarized in Algorithm 1.
(Algorithm 1: The distributed online conjugate gradient algorithm (D-OCG).)
Here, we define the projection function used in this algorithm as follows:
$$\Pi_{\mathcal{X}}(z, \alpha) = \operatorname*{arg\,min}_{x \in \mathcal{X}} \left\{ \langle z, x \rangle + \frac{1}{\alpha} \varphi(x) \right\},$$
where $\varphi(x)$ is a strongly convex regularization function.
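To make the update concrete, the following Python sketch outlines one round of the procedure described above for a single agent, taking $\varphi(x) = \frac{1}{2}\|x\|^2$ and a box constraint so that the projection reduces to clipping $-\alpha z$. The dual update shown, which averages the neighbors' dual variables and then moves along the new search direction, follows the dual-averaging structure suggested by the analysis in Section 4; it is a sketch under these assumptions, not a verbatim transcription of Algorithm 1.

```python
import numpy as np

def project(z, alpha, lo=-1.0, hi=1.0):
    """Pi_X(z, alpha) with phi(x) = 0.5*||x||^2 and X = [lo, hi]^d:
    argmin_x <z, x> + ||x||^2 / (2 * alpha) = clip(-alpha * z)."""
    return np.clip(-alpha * z, lo, hi)

def docg_round(g, g_prev, d_prev, z_neighbors, weights, alpha):
    """One D-OCG round for agent i (illustrative sketch).

    g, g_prev   : gradients at x_i(t) and x_i(t-1)
    d_prev      : previous search direction d_i(t-1)
    z_neighbors : dual variables z_j(t) of the neighbors (including i)
    weights     : the matching entries p_ij of the weight matrix P
    alpha       : step-size parameter alpha(t)"""
    # PRP parameter from the current and previous gradients
    beta = g.dot(g - g_prev) / max(g_prev.dot(g_prev), 1e-12)
    if beta > 0:
        d = -g + beta * d_prev       # Gram-Schmidt conjugate direction
    else:
        d = -g                       # restart with steepest descent
    # consensus on the dual variable, then a move along the direction
    z_new = sum(w * zj for w, zj in zip(weights, z_neighbors)) - d
    x_new = project(z_new, alpha)    # next iterate x_i(t+1)
    return x_new, z_new, d
```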
4. Regret Bound Analysis
To analyze the regret bound for D-OCG, we provide some preliminary remarks and a few definitions. Using Algorithm 1, we can determine the following:
Now, we define the running average of the dual variables and its projection,
$$\bar{z}(t) = \frac{1}{n} \sum_{i=1}^{n} z_i(t), \qquad y(t) = \Pi_{\mathcal{X}}(\bar{z}(t), \alpha(t)),$$
and from the evolution of $z_i(t+1)$ in Algorithm 1, together with the double stochasticity of P, we can obtain the corresponding recursion for $\bar{z}(t+1)$.
Now, the main results in our paper can be stated.
Theorem 1. Consider the sequences of $x_i(t)$ and $z_i(t)$ generated by Algorithm 1 for all $t \in \{1, \ldots, T\}$, with the step-size parameters chosen as in the analysis below. The cumulative regret owing to the actions of agent i then satisfies a bound of order $O(\sqrt{T})$, where $\lambda = \max_{1 \leq i \leq n,\, 1 \leq t \leq T}\{\lambda_i(t)\}$, b and D are two nonnegative constants, M and m are as defined in Lemma 1, n is the number of agents, and $\sigma_2(P)$ is the second largest eigenvalue of the adjacency matrix P.
From Theorem 1, we obtain a regret bound for the proposed algorithm under local convexity that is sublinear in T; i.e., the average regret of the D-OCG algorithm approaches zero as T, the number of iterations, increases. It is evident that the value of the regret bound is related to the upper bound L on the gradients of the local objective functions and the diameter R of the constraint set $\mathcal{X}$. By Lemma 1, the regret bound is also related to the Hessian matrices of the local objective functions. Moreover, the value of the regret bound is related to the scale and topology of the network.
To prove Theorem 1, we now present the following lemmas.
Lemma 2. For any $x \in \mathcal{X}$ and $i \in \mathcal{V}$, we can obtain the following inequality:
Proof. Based on assumption (i), the function $f_t$ is L-Lipschitz continuous on the convex set $\mathcal{X}$; this yields inequalities (19) and (20). Combining equations (19) and (20), the proof of Lemma 2 is completed.
Now, we prove that the last term of inequality (20) has a particular bound.
Lemma 3. For any $i \in \mathcal{V}$ and $t \in \{1, \ldots, T\}$,
Proof. Because $\nabla f_t(x_i(t))$ is a gradient of $f_t(x)$ at $x_i(t)$, using the convexity of the function $f_t(x)$, we have the corresponding first-order inequality. Then, based on assumption (i), we know that $\|\nabla f_t(x_i(t))\|_* \leq L$, and we can then bound the resulting terms. Summing for t = 1, …, T over the average of the iterates yields the stated bound, and the proof of Lemma 3 is completed.
Now, we turn our attention to the term in equation (25). According to the definition of the conjugate gradient, we give the bound of equation (25) in Lemma 4.
Lemma 4. For any $i \in \mathcal{V}$ and $\beta_i(t) \leq b$ (where b is a nonnegative constant), the following bound holds:
Proof. Based on the definitions of $d_i(t)$ and $\Pi_{\mathcal{X}}$, the left-hand side of the above inequality can be split into two terms, as in equation (27). We first prove that the first term in equation (27) has a bound. For any function f(x), equation (28) holds, where dom f is the domain of the function f(x). Therefore, for the function in equation (29), we can obtain, for any admissible point, the relations (30)–(32). Based on the definition of the conjugate function [40] and the updates for $z_i(t)$, we have equation (33). Because $\alpha(t)$ is a nonincreasing sequence, based on the definition of the conjugate function, we can obtain, for all t, inequality (34), and thus we obtain (35). According to inequality (11), we know that inequality (36) holds; a detailed proof of inequality (36) is provided in Appendix B. The inequality (37) is then established. Summing both sides of this inequality from t = 1 to T, we obtain (38). Through equations (33) and (38), we can write (39). We then analyze the bound on the second term in inequality (27). Because $\beta_i(t) = \max\{0, \beta_i^{PRP}(t)\}$ and $\beta_i(t) \leq b$, we analyze the following two cases.
Case 1. If $\beta_i^{PRP}(t) \leq 0$, then $\beta_i(t) = 0$ and $d_i(t) = -\nabla f_t^i(x_i(t))$, and the conclusion therefore clearly holds.
Case 2. If $\beta_i^{PRP}(t) > 0$, then, without loss of generality, we assume that the stated relation holds for all t, so that the subsequent bound follows; summing for t = 1, 2, …, T yields (46). Next, we give the bound of the remaining supremum term.
Proceeding as in the proof of inequality (33), we can obtain a corresponding bound, and through inequality (11), we obtain a further one. In addition, by the definition of the function $\varphi^*$, when z = 0, the definition of the projection shows that the supremum in this expression is uniquely attained at $\Pi_{\mathcal{X}}(z, \alpha)$. Moreover, for every fixed z, taking x = 0 shows that the supremum can be restricted to a bounded subset of the feasible set. Because the set $\mathcal{X}$ is closed and φ(x) is strongly convex (for the definition of strong convexity, see [40]), the set described above is compact. On the other hand, we know that 〈z, x〉 is differentiable in z, and the supremum is unique, and thus we can obtain the derivative of $\varphi^*$.
Then, we derive the next two equations through a Taylor expansion, and thus obtain a further bound. Applying the stated conditions, we therefore obtain the next estimate. Summing both sides of this inequality from t = 1 to T, we obtain the summed bound, and combining equations (46)–(57), we obtain equation (58). Through equations (27), (39), and (58), we finalize the proof of Lemma 4.
Remark. (1) The definition of the conjugate function of $\varphi$ is as follows. (2) Through inequality (15) and step 2 in Algorithm 1, for all $i \in \mathcal{V}$ and $t \in \{1, \ldots, T\}$, we can obtain the corresponding bound. Next, we provide an important inequality, which will be used in the proof of Lemma 6.
Lemma 5 (α-Lipschitz continuity of the projections). For any pair $u, v \in \mathbb{R}^d$, we have the following:
$$\|\Pi_{\mathcal{X}}(u, \alpha) - \Pi_{\mathcal{X}}(v, \alpha)\| \leq \alpha \|u - v\|_*.$$
A detailed proof of this lemma can be found in Appendix A.
Now, we analyze a key quantity in the regret analysis, namely, $\|x_i(t) - y(t)\|$, in Lemma 6.
Lemma 6. For all $i \in \mathcal{V}$ and t ∈ {0, …, T}, the following inequality is true:
Proof. Because $x_i(t)$ and $y(t)$ are the projections of $z_i(t)$ and $\bar{z}(t)$ onto the set $\mathcal{X}$, through Lemma 5, we have the bound in (63). Now, considering the evolution of the sequence $\{z_i(t)\}$ in Algorithm 1, we obtain the corresponding expansion. Because $p_{ij}$ is an element of a doubly stochastic matrix, $\sum_{j=1}^{n} p_{ij} = 1$, we then have the next identity. Based on Algorithm 1 and the definition of $\bar{z}(t)$, we can determine the resulting difference, and, in addition, we can obtain a bound based on the definition of the 1-norm of a vector (see [41]). To obtain a more specific bound, we introduce a useful property of a stochastic matrix as follows [12]:
$$\left\| P^{t-r-1} e_i - \frac{1}{n}\mathbf{1} \right\|_1 \leq \sqrt{n}\, \sigma_2(P)^{t-r-1},$$
where $P^{t-r-1}$ denotes the (t − r − 1)th power of the matrix P, $e_i$ is the ith basis coordinate vector of the n-dimensional space $\mathbb{R}^n$, $\mathbf{1}$ denotes the vector with all entries equal to 1, and $\sigma_2(P)$ is the second largest eigenvalue of the stochastic matrix P, with $\sigma_2(P) \leq 1$. Through this property, we obtain inequality (70). Combining equations (63), (68), and (70) yields (71), and thus we complete the proof of Lemma 6.
Now, we can provide a brief proof of Theorem 1.
Proof of Theorem 1. Combining Lemmas 2–6 yields the following regret bound. By equation (71) and the stated choices of the parameters, we obtain the conclusion of Theorem 1.
5. Simulation Experiments
To verify the performance of the D-OCG, we consider a distributed sensor network problem [18] with n sensors whose goal is to estimate a random vector $x \in \mathbb{R}^d$. In this network, at each time t ∈ {1, 2, …, T}, each sensor i receives an observation vector $y_t^i$, which is time-varying owing to the effect of observation noise. Assume that each sensor i has a linear model $\phi_i(x) = A_i x$, where $A_i$ is the observation matrix of sensor i with $\|A_i\|_1 \leq \phi_{\max}$. The local cost function of sensor i is defined as $f_t^i(x) = \frac{1}{2}\|y_t^i - A_i x\|^2$, where $y_t^i = A_i x + \eta_t^i$, in which $\eta_t^i$ is white noise. The mathematical model of this problem is
$$\min_{x \in \mathcal{X}} \sum_{t=1}^{T} \sum_{i=1}^{n} \frac{1}{2}\left\| y_t^i - A_i x \right\|^2.$$
In the offline case, the cost function of each sensor i is fixed, and because all information about the cost functions is known in advance, the centralized optimal estimate for this problem can be obtained by
$$x^{\star} = \operatorname*{arg\,min}_{x \in \mathcal{X}} \sum_{t=1}^{T} \sum_{i=1}^{n} f_t^i(x).$$
In a practical problem, the characteristics of the white noise may be unknown, or some sensors might not work properly for a particular reason, and we therefore need to estimate the vector x using a distributed online algorithm. Here, we set d = 1, and sensor i observes $y_t^i = a_t^i x + b_t^i$, where $a_t^i \sim U(0, 1)$ and $b_t^i \sim U(0, 1)$ (in which $x \sim U(a, b)$ indicates a random variable uniformly distributed on (a, b)). Then, the cost function of sensor i at each time t is given by $f_t^i(x) = \frac{1}{2}(y_t^i - a_t^i x)^2$.
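A minimal Python sketch of this scalar experiment is given below, assuming the quadratic loss form stated above; the horizon, the random seed, and the true value of x are our own choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 9, 1000                 # sensors and rounds (our choices)
x_true = 0.5                   # unknown scalar to be estimated (assumed)

def make_round_losses():
    """One round of time-varying costs f_t^i(x) = 0.5 * (y - a * x)^2
    with a ~ U(0, 1) and additive noise b ~ U(0, 1)."""
    a = rng.uniform(0.0, 1.0, size=n)
    b = rng.uniform(0.0, 1.0, size=n)
    y = a * x_true + b          # noisy scalar observations
    return [lambda x, a=ai, y=yi: 0.5 * (y - a * x) ** 2
            for ai, yi in zip(a, y)]

losses = [make_round_losses() for _ in range(T)]   # losses[t][i] is f_t^i
```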
We verified the performance of the proposed algorithm based on the following three aspects:
(1) First, we determined how the number of nodes in the network affects the performance of the D-OCG. We can see from Figure 1 that the average regret decreases slowly when we increase the number of nodes, and the algorithm converges on networks of different scales. When n = 1, the problem is equivalent to a centralized optimization problem, and our distributed optimization algorithm achieves the same effect as a centralized algorithm.
(2) We then checked how the network topology influences the performance of the D-OCG. We implemented the algorithm on three types of graphs with nine nodes. In a complete graph, each node is connected to all remaining nodes; that is, all nodes can exchange information with each other. In a cycle graph, each node is only connected to the two nodes directly adjacent to it. The connectivity of a Watts–Strogatz graph lies between that of the complete graph and the cycle graph. From Figure 2, it can be seen that better connectivity leads to slightly faster convergence.
(3) We next compared our algorithm with the classic algorithm D-OGD in [20]. The parameters used in the two algorithms follow their theoretical analyses. The network topology among the nodes is complete, with n = 9 nodes and step sizes chosen accordingly. As shown in Figure 3, the convergence speeds of the two algorithms are initially close, but as the number of iterations increases, the D-OCG converges faster than the D-OGD, which fully reflects the excellent performance of the proposed algorithm.
(Figure 1: Average regret of the D-OCG for different numbers of network nodes.)
(Figure 2: Average regret of the D-OCG on the complete, Watts–Strogatz, and cycle graphs.)
(Figure 3: Comparison of the average regret of the D-OCG and the D-OGD.)
6. Conclusion
We proposed a distributed online conjugate gradient algorithm to solve the distributed optimization problem with a convex constraint over a network. In this algorithm, the conjugate gradient replaces the gradient or subgradient of a traditional gradient descent method. Because the search directions are mutually conjugate throughout the iteration process, we avoid the slow convergence that gradient descent suffers in its later stage. We also presented a detailed analysis of the convergence of the proposed algorithm and obtained a regret bound for the optimization problem; the regret bound is sublinear in T. We applied the proposed algorithm (D-OCG) to a distributed sensor estimation problem. The numerical results show that our algorithm is feasible and effective and that, under the same assumptions, the D-OCG has a better convergence rate than the traditional D-OGD gradient method.
Appendix
A. Proof of Lemma 5
Let $x = \Pi_{\mathcal{X}}(u, \alpha)$ and $\omega = \Pi_{\mathcal{X}}(v, \alpha)$. Based on the first-order optimality condition for convex optimization, for any $y \in \mathcal{X}$, we obtain the following two inequalities:
Through equation (A.1), we obtain the following, and thus:
In addition, because $\varphi$ is a strongly convex function, we obtain the following; namely,
We therefore have the next inequality, and because α(t) ≥ αi(t) ≥ 0, we obtain:
In addition, we have the following; that is,
Setting y = ω in equation (A.2) and y = x in equation (A.11) yields
Adding the above two inequalities, we obtain the following bound:
On the other hand, φ(x) is a strongly convex function, which implies that
Adding the above two inequalities, we have the following, and thus:
Combining equations (A.13) and (A.16), we can obtain the following; namely,
Thus, we have the next bound, and we therefore obtain
This completes the proof of the claim in Lemma 5.
B. Proof of Inequality (36)
Based on the definition of the conjugate function, we can obtain the following, where $\delta_{\mathcal{X}}(x)$ is the indicator function of the set $\mathcal{X}$; that is, $\delta_{\mathcal{X}}(x) = 0$ when $x \in \mathcal{X}$, and $\delta_{\mathcal{X}}(x) = +\infty$ otherwise. On the other hand, the set $\mathcal{X}$ is compact, and thus the supremum defining the conjugate function is uniquely attained at $\Pi_{\mathcal{X}}(z, \alpha)$. Here, $\langle z, x \rangle$ is differentiable in z.
Because the projection is Lipschitz continuous, we have the following:
Through Lemma 1.2.3 and Corollary 4.4.5 in [39], we can obtain
However, we also have the following, and thus:
Data Availability
No data were used to support this study.
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China (NSFC), under Grant nos. 11471102, 61976243 and 61871430, the basic research projects in the University of Henan Province, under Grant nos. 19zx010 and 20zx001, and the Science and Technology Development Programs of Henan Province, under Grant no. 192102210284.