Abstract

Model uncertainties are usually unavoidable in control systems; they are caused by imperfect system modeling, disturbances, and nonsmooth dynamics. This paper presents a novel method to address the robust control problem for uncertain systems. The original robust control problem of the uncertain system is first transformed into an optimal control problem of the nominal system by selecting an appropriate cost function. Then, we develop an adaptive critic learning algorithm to learn the optimal control solution online, where only the critic neural network (NN) is used and the actor NN widely used in existing methods is removed. Finally, the feasibility analysis of the control algorithm is given. Simulation results are provided to show the effectiveness of the presented control method.

1. Introduction

Optimal control design is the basis of intelligent optimization and decision-making using the adaptive dynamic programming (ADP) method. There are many mature methods for optimal regulation control of linear systems in the field of control theory and control engineering. For general nonlinear systems, however, optimal control design leads to the Hamilton–Jacobi–Bellman (HJB) equation. An analytical solution of the HJB equation is hard to obtain since it is inherently a partial differential equation. Recently, the optimal control design of such systems has attracted extensive attention. Among the available approaches, the successive approximation methods [1–3] overcome this difficulty by finding an approximate solution of the HJB equation, which is closely related to the ADP method. ADP is a method based on the idea of intelligent learning, which can provide effective optimal control solutions for complex dynamic systems [4, 5]. In the past two decades, ADP has been widely used to solve adaptive optimal control problems of discrete-time and continuous-time systems [6, 7]. Data-driven control design has now become a research hotspot in the field of control theory and control engineering [8, 9]. ADP methods can promote the research of data-based decision-making and optimal control and are conducive to the development of artificial intelligence and computational intelligence technology.

Most of the existing ADP results are obtained without considering the uncertainty of the controlled plant. However, a practical control system is always affected by model uncertainty, external disturbances, or other variations. These factors must be considered in the controller design to avoid the deterioration of the closed-loop performance and to improve the robustness of the controlled system. For robust control design, several alternative methodologies have been suggested in the control community. The work in [10] exploited the relationship between the robust control of an uncertain system and the optimal control of its nominal system subject to a specific value function. It indicates that one can design a robust control by solving an equivalent optimal control problem instead. Similarly, it was shown in [11] that the robust control design may be accomplished by addressing an H∞ control problem. Nevertheless, solving the derived optimal control equations online was not discussed in [10]; offline schemes were adopted instead to seek the solution of the derived optimal control equations. Recently, robust control design using the adaptive critic learning method has gradually become one of the research hotspots in the field of ADP, and many methods have been proposed [12–14]. These results show that the ADP method is suitable for robust control design of complex nonlinear systems in uncertain environments. Since many previous ADP works do not focus on the robust performance of the controller, the emergence of robust adaptive critic control greatly expands the application scope of the ADP method. Generally, a controller based on robust ADP can not only stabilize the original uncertain system but also render the system optimal in the absence of dynamic uncertainty. Thus, adaptive critic learning-based robust control covers the discussion of system stability, convergence, optimality, and robustness. It plays an important role in the field of intelligent learning control of complex systems in uncertain environments.

Based on the above facts, we develop an adaptive critic learning algorithm to solve the robust control problem of uncertain systems. To this end, we establish an equivalence between the robust control problem and an optimal control problem by selecting an appropriate cost function; then, a single critic NN is used to reformulate the cost function. To obtain the optimal control solution, we design an adaptive critic learning algorithm; since it has strong convergence, the actor NN widely used in existing ADP results is removed. The feasibility analysis of the control algorithm is also given in the paper. Simulations are provided to indicate the validity of the developed method.

The major contributions of this paper include:
(1) To address the robust control problem, we transform the robust control problem of uncertain systems into an optimal control problem of the nominal system. This provides a new approach to address the robust control problem.
(2) Different from [13], the uncertainty in the input matrix is considered in this paper, and the proposed control method is then applied to robotic systems. This helps to apply the proposed control algorithm to practical industrial robotic systems in the future.
(3) A newly designed adaptation algorithm driven by the NN weight errors is used to learn the critic NN weights online. Different from [15], the convergence of the estimated NN weights to their true values can be retained.

This paper is organized as follows. In Section 2, we introduce the robust control problem and transform the robust control problem into an optimal control problem. In Section 3, a single critic NN is used to reformulate the optimal cost function, and then, an adaptive critic learning method is proposed to address the derived optimal control problem. Section 4 gives some simulation results to illustrate the effectiveness of the proposed method. Some conclusions are stated in Section 5.

2. Preliminaries and Problem Formulation

A continuous-time (CT) uncertain system can be written as

\dot{x}(t) = f(x(t)) + \Delta f(x(t)) + [g(x(t)) + \Delta g(x(t))]u(t), \quad (1)

where x \in \mathbb{R}^n and u \in \mathbb{R}^m are the system state and the control action, respectively; f(x) with f(0) = 0 and g(x) are the known nonlinear functions; \Delta f(x) and \Delta g(x) are the uncertainties. The purpose of this paper is to design a controller that makes system (1) asymptotically stable under the uncertainties \Delta f(x) and \Delta g(x). To this end, we give the following assumptions.

Assumption 1. \Delta g(x) is the uncertainty in the input matrix. This uncertainty is bounded, i.e., there exists a nonnegative function \bar{g}(x) such that \|\Delta g(x)\| \le \bar{g}(x).
To design a robust controller for a linear system, a linear matrix inequality (LMI) approach has been proposed [16], while for the nonlinear system (1), this is not easy. Inspired by [10, 12], an equivalence is built between the robust control problem of the uncertain system and the optimal control of the nominal system by selecting an appropriate cost function. Thus, we define the nominal system of the uncertain system (1) as

\dot{x}(t) = f(x(t)) + g(x(t))u(t). \quad (2)

For system (2), a control action u should be found to minimize the following cost function [17]:

J(x_0) = \int_0^{\infty} \left[ \rho^2(x) + x^{\top} Q x + u^{\top} R u \right] \mathrm{d}\tau, \quad (3)

where \rho(x) is a nonnegative function reflecting the bound of the uncertainty, and Q and R are positive definite weight matrices. Hence, based on the optimality principle, we can obtain the Lyapunov equation of the cost function (3) as

0 = \rho^2(x) + x^{\top} Q x + u^{\top} R u + (\nabla V)^{\top} [f(x) + g(x)u], \quad (4)

where \nabla V is the derivative of V with respect to x.
Therefore, we can get the optimal cost function as

V^{*}(x) = \min_{u} \int_t^{\infty} \left[ \rho^2(x) + x^{\top} Q x + u^{\top} R u \right] \mathrm{d}\tau, \quad (5)

and its corresponding HJB equation can be given as

0 = \min_{u} \left[ \rho^2(x) + x^{\top} Q x + u^{\top} R u + (\nabla V^{*})^{\top} (f(x) + g(x)u) \right]. \quad (6)

By solving (6), we have the optimal control action as

u^{*} = -\frac{1}{2} R^{-1} g^{\top}(x) \nabla V^{*}. \quad (7)

The following lemma explains that the robust control problem of system (1) can be transformed into an optimal control problem of system (2) by constructing the cost function (3).
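As a concrete illustration, the closed-form optimal control (7) can be evaluated numerically once the value-function gradient is available. The sketch below uses placeholder values for \nabla V^{*}, g(x), and R; it is not the paper's example.

```python
import numpy as np

# Sketch of the optimal-control formula u* = -(1/2) R^{-1} g(x)^T (dV*/dx),
# assuming the value-function gradient is known at the current state.
def optimal_control(grad_V, g, R):
    """Return u* = -0.5 * R^{-1} g^T grad_V."""
    return -0.5 * np.linalg.solve(R, g.T @ grad_V)

# Hypothetical 2-state, 1-input evaluation point (illustrative values only).
grad_V = np.array([2.0, 4.0])   # dV*/dx at the current state
g = np.array([[0.0], [1.0]])    # input matrix g(x)
R = np.array([[1.0]])           # control weight
u = optimal_control(grad_V, g, R)
```

Using `np.linalg.solve` instead of forming `R^{-1}` explicitly is the usual numerically preferable choice when R is a matrix.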

Lemma 1 (see [11, 18]). Assume that the optimal control solution u^{*} can be obtained from the optimal control problem of system (2) with cost function (3), and that \rho(x) bounds the uncertainties. Then this solution makes the uncertain system (1) asymptotically stable, which means that the optimal control solution u^{*} is the solution of the robust control problem for system (1).

Proof. Because V^{*}(x) > 0 for x \neq 0 and V^{*}(0) = 0 for V^{*} given in (5), we can consider V^{*} as a Lyapunov function. Based on (6) and (7), its derivative along the trajectories of system (1) is

\dot{V}^{*} = (\nabla V^{*})^{\top} \left[ f(x) + \Delta f(x) + (g(x) + \Delta g(x))u^{*} \right].

According to the condition given in Lemma 1, we obtain \dot{V}^{*} < 0 for all x \neq 0, and the uncertain system (1) is asymptotically stable for any uncertainties \Delta f(x) and \Delta g(x). According to the above facts, the optimal solution u^{*} is the robust control solution of the uncertain system (1). This completes the proof.
From Lemma 1, we have that if we select R = kI, where I is the identity matrix and k > 0 is chosen appropriately, then the condition of Lemma 1 holds.

Remark 1. Lemma 1 shows that the robust control problem of the original uncertain system can be equivalent to the optimal control problem of the nominal system, and then, the solution of the robust control problem can be obtained indirectly by solving the optimal control problem. Therefore, this equivalence relationship can be used to develop a new robust control design method and solve it by using ADP method, as described in the following section.

Remark 2. It is well known that H∞ control belongs to robust control. Although many H∞ control design techniques have been proposed, it should be noted that, as explained in Section 8.5 of [18], H∞ control differs from the optimal control method proposed in this paper. In the optimal control method, we start from the uncertainty bounds and then design the controller according to these bounds. Hence, if such a controller exists, we can say the uncertain system is robustly stable.

3. Solving the Robust Control Problem via Adaptive Critic Learning

To obtain the optimal control solution (7), the unknown cost function (5) should be determined. However, it is quite difficult to address the cost function (5) directly; hence, a critic NN is proposed in this section to approximate the cost function (5). This allows us to develop an adaptive learning method to update the NN weights online, where the convergence of the NN weights can be retained. Because of this strong convergence, the actor NN widely used in ADP schemes is removed. The proposed control system structure is given in Figure 1.

This section proposes an adaptive critic learning method to obtain the solution of the derived optimal control problem. To this end, a critic NN is trained to estimate the cost function V^{*}(x), where the cost function is assumed continuous; hence, we have the following NN representation [13]:

V^{*}(x) = W^{\top} \varphi(x) + \varepsilon(x), \quad (10)

where W \in \mathbb{R}^{l} is the ideal critic NN weight, \varphi(x) \in \mathbb{R}^{l} is the regressor vector, l is the number of neurons, and \varepsilon(x) is the approximation error of the NN.

Then, we have the partial derivative of the cost function as

\nabla V^{*} = \nabla \varphi^{\top}(x) W + \nabla \varepsilon(x), \quad (11)

where \nabla \varphi(x) \in \mathbb{R}^{l \times n} is the regressor matrix and \nabla \varepsilon(x) is the gradient of the NN approximation error.
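The critic approximation and its gradient can be sketched as follows, using a hand-picked quadratic polynomial basis and hypothetical weights; the particular basis is an assumption for illustration, not the paper's choice.

```python
import numpy as np

# Critic-NN sketch: V(x) ≈ W^T phi(x), with gradient dV/dx ≈ grad_phi(x)^T W.
# Basis phi(x) = [x1^2, x1*x2, x2^2] and weights W are illustrative only.
def phi(x):
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def phi_grad(x):
    # Jacobian d(phi)/dx, shape (l, n): row i is the gradient of phi_i.
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

W = np.array([1.0, 0.5, 2.0])        # hypothetical critic weights
x = np.array([1.0, -1.0])
V_hat = W @ phi(x)                   # approximate cost-to-go
grad_V_hat = phi_grad(x).T @ W       # approximate gradient dV/dx
```

For this basis, the approximation is exactly the quadratic form x_1^2 + 0.5 x_1 x_2 + 2 x_2^2, so the computed gradient can be checked against the analytic one.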

Without loss of generality, the following assumption, as in [13], is imposed.

Assumption 2. The NN weight W, the regressor vector \varphi(x), the regressor matrix \nabla \varphi(x), and the approximation errors \varepsilon and \nabla \varepsilon are all bounded, i.e., \|W\| \le W_M, \|\varphi\| \le \varphi_M, \|\nabla \varphi\| \le \varphi_{dM}, \|\varepsilon\| \le \varepsilon_M, and \|\nabla \varepsilon\| \le \varepsilon_{dM} for positive constants W_M, \varphi_M, \varphi_{dM}, \varepsilon_M, and \varepsilon_{dM}.
In fact, the ideal NN weight W is unknown; hence, its estimate \hat{W} is updated online, and the practical cost function can be written as

\hat{V}(x) = \hat{W}^{\top} \varphi(x). \quad (12)

Hence, the practical estimate of the cost function gradient is

\nabla \hat{V} = \nabla \varphi^{\top}(x) \hat{W}. \quad (13)

According to (10) and (11), the ideal optimal control (7) can be written as

u^{*} = -\frac{1}{2} R^{-1} g^{\top}(x) \left[ \nabla \varphi^{\top}(x) W + \nabla \varepsilon(x) \right], \quad (14)

and its practical version is given as

\hat{u} = -\frac{1}{2} R^{-1} g^{\top}(x) \nabla \varphi^{\top}(x) \hat{W}. \quad (15)

The problem to be solved next is finding the unknown weight \hat{W}, which should guarantee the stability of the controlled system and converge to the ideal value W. Most existing ADP methods can only achieve uniform ultimate boundedness (UUB) of the approximated NN weights rather than their convergence. In this paper, a novel adaptive critic learning method is introduced to guarantee the convergence of \hat{W} to W. This strong convergence property makes it possible to avoid the use of an actor NN, and the optimal control approximated via the critic NN converges to its ideal optimal solution.
Substituting (11) into (4), we can rewrite the HJB equation as

W^{\top} \nabla \varphi(x) [f(x) + g(x)u] + \rho^2(x) + x^{\top} Q x + u^{\top} R u = \varepsilon_{H}, \quad (16)

where \varepsilon_{H} is the residual error determined by the approximation error \nabla \varepsilon.
To develop an adaptive critic learning law to estimate the critic NN weight W, the known terms in (16) can be defined as

\Theta = \nabla \varphi(x) [f(x) + g(x)u], \quad y = -\left[ \rho^2(x) + x^{\top} Q x + u^{\top} R u \right]. \quad (17)

Then, the HJB equation (16) with (17) can be given as

\Theta^{\top} W = y + \varepsilon_{H}. \quad (18)

According to (18), only the NN weight W is unknown in this parameterized formulation. Hence, it can be estimated by using a recently proposed learning algorithm [19, 20], which is driven by the derived estimation error.
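The regression pair for the parameterized HJB form can be computed pointwise from the state, control, and nominal dynamics. The sketch below assumes the linear-in-the-weights structure \Theta^{\top} W \approx y; the dynamics, basis, and uncertainty bound are illustrative placeholders, not the paper's example.

```python
import numpy as np

# Pointwise HJB regression pair: Theta = grad_phi(x) (f(x) + g(x) u) and
# y = -(x^T Q x + u^T R u + rho(x)^2), so that Theta^T W ≈ y.
def hjb_regression_pair(x, u, f, g, phi_grad, Qm, Rm, rho):
    theta = phi_grad(x) @ (f(x) + g(x) @ u)        # regressor, shape (l,)
    y = -(x @ Qm @ x + u @ Rm @ u + rho(x) ** 2)   # known scalar term
    return theta, y

# Toy instantiation: stable linear drift, quadratic basis, zero bound.
f = lambda x: -x
g = lambda x: np.eye(2)
phi_grad = lambda x: np.array([[2 * x[0], 0.0],
                               [x[1], x[0]],
                               [0.0, 2 * x[1]]])
rho = lambda x: 0.0
theta, y = hjb_regression_pair(np.array([1.0, -1.0]), np.zeros(2),
                               f, g, phi_grad, np.eye(2), np.eye(2), rho)
```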
To this end, the filtered regressor matrices P and Q can be defined as [19, 20]

\dot{P} = -kP + \Theta \Theta^{\top}, \quad P(0) = 0,
\dot{Q} = -kQ + \Theta y, \quad Q(0) = 0, \quad (19)

where k > 0 is a positive filter parameter. Hence, the solution of (19) can be derived as

P(t) = \int_0^{t} e^{-k(t-\tau)} \Theta(\tau) \Theta^{\top}(\tau) \mathrm{d}\tau, \quad Q(t) = \int_0^{t} e^{-k(t-\tau)} \Theta(\tau) y(\tau) \mathrm{d}\tau, \quad (20)

which can be calculated online based on the system state x.
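The filters above can be integrated numerically, for example with a simple Euler scheme. In the sketch below the regressor signal and the "true" weights are synthetic placeholders, used only to check that P and Q accumulate enough information to recover W; the filter structure \dot{P} = -kP + \Theta\Theta^{\top}, \dot{Q} = -kQ + \Theta y is the assumed form.

```python
import numpy as np

# Euler integration of the filtered regressor matrices.
def update_filters(P, Q, theta, y, k, dt):
    P = P + dt * (-k * P + np.outer(theta, theta))
    Q = Q + dt * (-k * Q + theta * y)
    return P, Q

l = 3
P, Q = np.zeros((l, l)), np.zeros(l)
k, dt = 1.0, 0.01
rng = np.random.default_rng(0)
W_true = np.array([1.0, -2.0, 0.5])      # synthetic "true" weights
for _ in range(500):
    theta = rng.standard_normal(l)       # persistently exciting regressor (synthetic)
    y = theta @ W_true                   # noiseless parameterization Theta^T W = y
    P, Q = update_filters(P, Q, theta, y, k, dt)

# With exciting data, P becomes positive definite and Q = P W, so the
# weights can be recovered by a least-squares-like solve.
W_hat = np.linalg.solve(P, Q)
```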
With P and Q in (20), an auxiliary vector H can be defined as

H = P \hat{W} - Q. \quad (21)

From (18) and (20), we have Q = PW - \psi, with \psi = \int_0^{t} e^{-k(t-\tau)} \Theta(\tau) \varepsilon_{H}(\tau) \mathrm{d}\tau a bounded variable, i.e., \|\psi\| \le \psi_M for a positive constant \psi_M. Then, we can obtain from (19)–(21) that

H = -P \tilde{W} + \psi, \quad (22)

with \tilde{W} = W - \hat{W} being the NN weight estimation error.
The estimation error H used in the adaptive learning algorithm helps to guarantee the convergence of the estimate, as shown in [13]. Hence, we can design the following adaptive law to calculate \hat{W} online:

\dot{\hat{W}} = -\Gamma H, \quad (23)

with \Gamma > 0 being the adaptive learning gain.
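A minimal sketch of such an estimation-error-driven update is given below, using fixed synthetic P and Q that satisfy Q = PW for a known "true" W (i.e., the residual term vanishes); the gain, matrix, and weight values are illustrative, not the paper's.

```python
import numpy as np

# Euler discretization of dW_hat/dt = -Gamma * (P W_hat - Q).
W_true = np.array([1.0, -2.0, 0.5])
P = np.array([[2.0, 0.1, 0.0],
              [0.1, 1.5, 0.2],
              [0.0, 0.2, 1.0]])       # positive definite: PE is satisfied
Q = P @ W_true                        # noiseless case, residual = 0
Gamma, dt = 5.0, 0.001
W_hat = np.zeros(3)
for _ in range(20000):
    H = P @ W_hat - Q                 # estimation error, computable online
    W_hat = W_hat + dt * (-Gamma * H)
```

Since the error obeys d(W - W_hat)/dt = -Gamma P (W - W_hat) with P positive definite, the estimate decays exponentially toward W_true, illustrating the convergence mechanism behind the adaptive law.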

Remark 3. The adaptive law (23) is driven by the estimation error H. The purpose of this new learning algorithm is to guarantee the convergence of the estimate \hat{W} to the unknown weight W. Therefore, the learning algorithm given in this paper is different from those used in existing ADP methods, e.g., [3, 21], which employ gradient-based methods [22] to guarantee only the boundedness of \hat{W}.
To illustrate the convergence of the proposed learning algorithm, the positive definiteness of the matrix P defined in (20) is first established:

Lemma 2. When the regressor \Theta in (18) fulfills the persistent excitation (PE) condition, the matrix P defined in (20) is positive definite.
The convergence of the proposed learning algorithm can be summarized as follows.

Theorem 1. For the adopted critic NN with adaptive law (23), if the regressor vector \Theta in (18) satisfies the PE condition, the critic NN weight error \tilde{W} exponentially converges to a small bounded set around zero.

Proof. By Lemma 2, the matrix P is positive definite when the regressor \Theta satisfies the PE condition, i.e., its minimum eigenvalue satisfies \sigma = \lambda_{\min}(P) > 0. Hence, a Lyapunov function can be chosen as L = \frac{1}{2} \tilde{W}^{\top} \Gamma^{-1} \tilde{W}, and its derivative along (23) can be derived as

\dot{L} = \tilde{W}^{\top} \Gamma^{-1} \dot{\tilde{W}} = -\tilde{W}^{\top} P \tilde{W} + \tilde{W}^{\top} \psi, \quad (24)

which further implies

\dot{L} \le -\sigma \|\tilde{W}\|^2 + \psi_M \|\tilde{W}\|. \quad (25)

Thus, the NN weight estimation error \tilde{W} converges to the compact set \Omega = \{\tilde{W} : \|\tilde{W}\| \le \psi_M / \sigma\}, whose size depends on the approximation error bound \psi_M and the excitation level \sigma. According to the NN approximation property, this error can be made arbitrarily small for a sufficient number of NN nodes, i.e., \varepsilon \to 0 as l \to \infty. Therefore, \tilde{W} converges to \Omega. In the ideal case, i.e., \varepsilon_H = 0 and \psi = 0, the estimation errors of the weights converge to zero exponentially.

For system (2) with the practical optimal control (15) and adaptive law (23), if the regressor \Theta satisfies the PE condition, the error \tilde{W} converges to a small set around zero. Moreover, the practical control \hat{u} in (15) converges to a small region around the optimal solution u^{*} in (14). Hence, the original robust control problem is resolved.

4. Simulation

4.1. Numerical Simulation

Consider an uncertain system of form (1), where x = [x_1, x_2]^{\top} is the system state, u is the control input, and the terms \Delta f(x) and \Delta g(x) denote the uncertainties.

Since the uncertain terms \Delta f(x) and \Delta g(x) are bounded by \rho(x), the robust control problem can be transformed into the optimal control problem of the corresponding nominal system with a cost function of form (3). As given in [18], the optimal cost function V^{*}(x) is available in closed form for this example, and the optimal control solution u^{*} then follows from (7). A critic NN is used to approximate the cost function V^{*}(x), with the activation function \varphi(x) chosen accordingly.

To carry out the simulation, we set the learning parameters k and \Gamma, the initial system state x(0), the initial critic NN weight \hat{W}(0), and the weight matrices Q and R.
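The overall learning loop for such an example can be sketched end-to-end as follows, on a hypothetical linear nominal system with assumed matrices and bound \rho^2(x) = x^{\top}x; this is not the paper's example, only an illustration of how the pieces fit together.

```python
import numpy as np

# End-to-end sketch: simulate the state, form the HJB regression pair,
# filter it into (P, Q), update the critic weights with the
# estimation-error-driven law, and apply the learned control plus a small
# probing signal to keep the regressor persistently exciting.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])   # assumed nominal drift
B = np.array([[0.0], [1.0]])               # assumed input matrix
Qm, Rm = np.eye(2), np.eye(1)
rho2 = lambda x: x @ x                     # assumed uncertainty bound rho(x)^2
phi_grad = lambda x: np.array([[2 * x[0], 0.0],
                               [x[1], x[0]],
                               [0.0, 2 * x[1]]])

dt, k, Gamma = 0.002, 1.0, 10.0
x = np.array([1.0, -1.0])
W_hat = np.zeros(3)
P, Qv = np.zeros((3, 3)), np.zeros(3)
for i in range(20000):
    # practical control u = -0.5 R^{-1} B^T grad_phi^T W_hat, plus probing
    u = -0.5 * np.linalg.solve(Rm, B.T @ (phi_grad(x).T @ W_hat))
    u = u + 0.5 * np.sin(7.0 * i * dt) + 0.3 * np.cos(13.0 * i * dt)
    theta = phi_grad(x) @ (A @ x + B @ u)
    y = -(x @ Qm @ x + u @ Rm @ u + rho2(x))
    P += dt * (-k * P + np.outer(theta, theta))
    Qv += dt * (-k * Qv + theta * y)
    W_hat += dt * (-Gamma * (P @ W_hat - Qv))
    x = x + dt * (A @ x + B @ u)           # Euler step of the nominal system
```

The probing sinusoids play the role of an excitation signal; in practice they are removed once the weights have converged.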

Figure 2 shows the estimated values of the critic NN weights. It can be seen from Figure 2 that the estimated NN weights converge to certain values. This result verifies the convergence claimed in Theorem 1 and the effectiveness of the proposed learning algorithm, indicating that the estimated critic NN weight converges to its true value, i.e., \hat{W} \to W. To further display the performance of the proposed learning method, the error between the ideal cost function and the practical cost function is given in Figure 3, which shows fairly satisfactory approximation performance. Similar simulation results can be found in [13]; different from [13], this paper considers the uncertainties involved in the control input. Figure 4 shows the evolution of the state of the controlled system under the derived optimal control, which shows that the closed-loop system is asymptotically stable. The corresponding control input, shown in Figure 5, is bounded and smooth.

4.2. Application to Robotic Systems

This section develops a simulation based on a 2-DOF robot [18, 23]. The robotic system model can be defined as

M(q)\ddot{q} + V_m(q, \dot{q}) + F(\dot{q}) + G(q) = \tau,

where q is the vector of joint variables, \tau is the vector of generalized forces, M(q) denotes the inertia matrix, V_m(q, \dot{q}) is the centripetal vector, F(\dot{q}) is the friction vector, and G(q) defines the gravity vector. In this section, we denote x = [q^{\top}, \dot{q}^{\top}]^{\top}. There are uncertainties in the dynamics due to the unknown load on the manipulator and unmodeled frictions.

The entries of the inertia matrix M(q), the centripetal vector V_m(q, \dot{q}), the friction vector F(\dot{q}), and the gravity vector G(q) are computed from the physical parameters of the robot, as detailed in [18, 23].

The model parameters take the values given in [18]. With the above system dynamics, the state equation of the system can be written in form (1) [18], where x = [q_1, q_2, \dot{q}_1, \dot{q}_2]^{\top}, the nominal dynamics correspond to the unloaded manipulator, and the uncertainties arise from the unknown load.

In this simulation, we set the initial critic NN weight \hat{W}(0), the learning parameters k and \Gamma, the weight matrices Q and R, and the initial states, with the load chosen as in [18].

Figure 6 shows the estimated critic NN weights. It can be seen from Figure 6 that the estimated NN weights converge to certain values. This result verifies the convergence claimed in Theorem 1 and the effectiveness of the proposed learning algorithm. Figure 7 shows the evolution of the controlled system state under the derived optimal control for the given load condition, which shows that the closed-loop system is asymptotically stable. The corresponding control input is shown in Figure 8. Although it exhibits some jitter at first, it becomes smooth as the system stabilizes.

From the above simulation results, we conclude that the proposed learning method and control technique are effective.

5. Conclusion

The purpose of this paper is to address the robust control problem of uncertain systems by developing an adaptive critic learning method. To this end, the robust control problem of the uncertain system is transformed into an optimal control problem of the nominal system by selecting an appropriate cost function. Then, a single NN is used to reformulate the cost function, so that the unknown cost function can be represented in terms of known quantities; an adaptive critic learning method based on the adaptive parameter estimation technique is then presented to obtain the optimal cost function such that the optimal control problem can be solved. Simulations are given to show the effectiveness of the proposed learning algorithm and control method. Future work will focus on the robust tracking control of uncertain systems.

Data Availability

Data were curated by the authors and are available upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.