Abstract
In this work, we introduce a novel numerical algorithm, called RaBVItG (Radial Basis Value Iteration Game), to approximate feedback-Nash equilibria for deterministic differential games. More precisely, RaBVItG is an algorithm based on value iteration schemes in a meshfree context. It is used to approximate optimal feedback-Nash policies for multiplayer games, trying to tackle the high dimensionality that this type of problem generally involves. Moreover, RaBVItG also implements a game iteration structure that computes the game equilibrium at every value iteration step, in order to increase the accuracy of the solutions. Finally, with the purpose of validating our method, we apply this algorithm to a set of benchmark problems and compare the obtained results with the ones returned by another algorithm found in the literature. When comparing the numerical solutions, we observe that our algorithm is less computationally expensive and, in general, reports lower errors.
1. Introduction
From a general point of view, differential games (DG) are a topic involving game theory and controlled systems of differential equations. Here, we focus on conflict problems in which players control a system with an associated cost function per player. There exist two main types of control for these problems: closed-loop (or feedback) controls, which are functions of the state variables of the system; and open-loop controls, which are only functions of time. Focusing on differential games, feedback controls are more robust than open-loop controls because they allow the players to react to sudden changes in the values of the state variables of their opponents while they are playing the game. Based on the behavior of the players, it is possible to classify the game and its corresponding equilibrium solutions as cooperative and noncooperative. The first class is used when the players make coordinated decisions trying to optimize a joint criterion, while the second class is used when the players compete with each other, each trying to optimize their own criterion while taking into account the strategies of the remaining players. We note that there also exist intermediate possibilities.
In this work, we focus on Nash (i.e., noncooperative) feedback (i.e., closed-loop) solutions. Our aim is twofold. Firstly, we develop an algorithm to solve numerically a deterministic multiplayer feedback-Nash differential game (FNDG). The algorithm is called RaBVItG (Radial Basis Value Iteration Game) and is based on radial basis functions, value iteration, and game iteration. Secondly, we apply the algorithm to some benchmark problems found in the literature of differential games (generally, linear or linear-quadratic examples with explicit solutions) in order to compare the performance of RaBVItG with a previous algorithm (see [1] for more details).
Focusing on recent works, we highlight the fact that, in a recent survey (see [2]), the authors report that the existing numerical (computational) methods in the literature of DG focus on some tractable subclasses of games, such as two-person zero-sum games (see [3]). These games do not strictly match our purpose (dealing with N-player nonzero-sum games), but they are interesting from an analytical point of view because they tackle the lack of differentiability of the value function by using viscosity solutions, as this class of games can be reduced to a one-dimensional problem. Viscosity solutions in higher-dimensional games are, in general, complex to obtain. Additionally, other advances are in the line of linear-quadratic models (see [4]), but this framework does not deal with interesting nonlinearities in the dynamic system or the running cost functions. According to those authors, computational differential games is an area that still needs to grow.
In this context, RaBVItG is an algorithm based on some recent reinforcement learning schemes (see, e.g., [5]), which are discrete-time methods closely related to dynamic programming. More precisely, we consider methods from reinforcement learning that are mainly used to simulate and approximate the behavior (in their usual terminology) of a set of agents that take actions in order to obtain a cumulative reward. In particular, we focus on value iteration (VI), which is a general technique used in reinforcement learning to iterate over the value function of the problem in order to obtain a fixed-point solution. In our case, we have a coupled system of value functions (one per player) involved in an FNDG. We also use in RaBVItG the concept of game iteration (GI), meaning that, at each value iteration step, we iterate again to find the corresponding game equilibrium (Nash, in this case) associated with the current set of value functions. This inner loop simulates the way in which the players make decisions (for instance, in a Nash context, fixing the opponents' strategies to obtain the optimal strategy for a player). Finally, we also use function approximation techniques that allow us to approximate, using mesh-free methods, the value function of each player (see, for instance, [6, 7]).
This numerical method has been designed for solving a coupled system of N-player Hamilton-Jacobi-Bellman (HJB) equations. HJB equations provide the value function, which gives optimal policies for a given dynamic system with an associated cost function (see, for example, [8]). They are the key for finding a feedback solution. In order to discretize our problem (in time and space), we use the techniques developed in [9], which introduce a semi-Lagrangian discretization framework for HJB equations and prove the convergence of the scheme based on the viscosity solution (see, e.g., [10]).
In order to validate our algorithm, we apply it to solve a set of benchmark problems found in the literature (see, [1, 4]). We compare the obtained CPU time and error with the ones returned by another numerical algorithm found in [1].
This paper is organized as follows. In Section 2, we present the theoretical model by describing the relevant variables involved in the game, the coupled optimization problems, and the basic Nash equilibrium concepts. In Section 3, we explain the numerical implementation of our method, based on a semi-Lagrangian discretization, value iteration, and radial basis interpolation. In Section 4, we show the performance of the method by solving some benchmark problems.
2. Materials and Methods
This section deals with the explanation of the considered deterministic theoretical model. Firstly, we introduce the differential game of interest. Secondly, we define the considered feedback Nash equilibrium and the Hamilton-Jacobi-Bellman equations.
2.1. Deterministic Differential Game
We first define the class of differential games we are dealing with in this work.
Let us consider a set of N players. Each player i has a payoff functional J_i, given by (1), subject to the state equation (2) with initial condition (3), where we define the following. (1) u = (u_1, ..., u_N) is the control function; we denote by u_i the control associated with the i-th player, taking values in a given subset of admissible controls. (2) x is a function such that x(t) denotes the state of the system at time t, belonging to the set of admissible states. The evolution of those state variables is driven by (2), called the state equation, with dynamics f. We assume that f is continuous and satisfies a Lipschitz-type condition in the state variable; those conditions ensure that the system has a unique solution (this can be proved by using the Carathéodory theorem; see [9]). (3) J_i is the payoff functional of player i, given by the time integral of the instantaneous payoff of player i for the chosen state and controls. Note that the integral is affected by an exponential discounting parameter that discounts the value of the payoff over time (see, for instance, [4]). The presence of the discount factor ensures that integral (1) is finite whenever the instantaneous payoff is bounded.
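Since the displayed formulas (1)-(3) could not be recovered in this version of the text, the following is a hedged reconstruction of a standard form consistent with the description above; the symbols J_i, L_i, f, rho, and U_i are our own notation and may differ from the original displays:
\[ J_i(x_0, u_1, \dots, u_N) = \int_0^{\infty} e^{-\rho t}\, L_i\big(x(t), u_1(t), \dots, u_N(t)\big)\, dt, \qquad i = 1, \dots, N, \]
\[ \dot{x}(t) = f\big(x(t), u_1(t), \dots, u_N(t)\big), \qquad x(0) = x_0, \]
with u_i(t) belonging to the admissible control set U_i for almost every t >= 0.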
2.2. Feedback Optimization Problem
In this section, we recall the synthesis procedure detailed, e.g., in [11].
Let V be a continuously differentiable mapping, called the value function, defined by
For each initial condition, we assume the existence of at least one optimal control such that the value function is attained along an admissible trajectory satisfying (2)-(3). We denote this trajectory as the optimal trajectory.
According to [11], the function given by the expression above is nonincreasing for every admissible control, and it is constant if and only if the control is optimal for the given initial condition. Thus, considering the optimal control and differentiating with respect to time, we obtain
Furthermore, for , we have that
We note that, for any other admissible constant control, as the corresponding function is nonincreasing, the analogous computation yields an inequality. So, under the previous hypotheses, we conclude that, for all admissible states, the following HJB equation is satisfied:
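Since the display is missing here, a hedged reconstruction of the stationary, discounted HJB equation described above (for a single decision-maker with value function V, in the notation introduced earlier) reads
\[ \rho\, V(x) = \max_{u \in U} \Big\{ L(x, u) + \nabla V(x) \cdot f(x, u) \Big\}, \]
where the exact symbols may differ from those of the original display.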
and is such that
We remark that, in general, if the value function is only continuous, this equation needs to be interpreted in terms of the viscosity solution of Problem (11). However, when applied to differential games, there is, so far, no general theorem on the existence or uniqueness of solutions (see [12]).
Now, we define the so-called feedback map per player, such that
where We consider an optimal control defined by
for almost every
The abovementioned synthesis procedure consists of obtaining, using the feedback map, an optimal decision related to the corresponding optimal trajectory, by solving
and
So, provided an initial position, the resulting control-trajectory pair is optimal; since this construction holds for every initial condition, it corresponds to the optimal feedback policy we are trying to estimate.
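As a hedged reconstruction of the missing displays above, with a feedback map denoted here by \(\varphi\), the synthesis procedure amounts to solving the closed-loop system
\[ \dot{x}^{*}(t) = f\big(x^{*}(t), \varphi(x^{*}(t))\big), \qquad x^{*}(0) = x_0, \]
and then recovering the optimal control as \(u^{*}(t) = \varphi(x^{*}(t))\) for almost every \(t \ge 0\); the notation is ours and may differ from the original.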
2.3. Feedback-Nash Equilibrium
Now, we adapt the previous expressions in order to get a feedback-Nash equilibrium.
To do so, we define a feedback-Nash map per player i such that, for all admissible states, the array of controls denotes the pair formed by the control of player i and the controls associated with the rest of the players (denoted below with the usual -i subscript).
Considering this feedback-Nash map, we apply the synthesis procedure described previously. To do so, we define a feedback N-tuple and a feedback (N-1)-tuple. Then, the N-tuple is a feedback-Nash equilibrium (FNE) if inequality (18) holds, where the comparison is made against any vector of controls obtained by replacing the i-th component of the N-tuple by another admissible strategy.
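In our own notation, a hedged restatement of this equilibrium condition reads
\[ V_i\big(x; \varphi_i^{*}, \varphi_{-i}^{*}\big) \;\ge\; V_i\big(x; \varphi_i, \varphi_{-i}^{*}\big), \qquad i = 1, \dots, N, \]
for every admissible feedback strategy \(\varphi_i\) of player i and every admissible state x, where \(\varphi_{-i}^{*}\) collects the equilibrium strategies of the remaining players.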
Assuming the strategies of the remaining players are fixed, finding the best response of player i by maximizing its value (i.e., solving (18)) can be done by using dynamic programming (see, for instance, [7, 13]).
Finally, we define the value function of player i at the equilibrium, which is the solution of the following HJB equation (see [3]):
where the notation means that we are finding a Nash equilibrium for player i by fixing the optimal strategies of the remaining players. We note that, regarding the current literature, apart from zero-sum games, in order to find relevant cases where (20) is well posed, we need to focus on games in one spatial dimension. For instance, in [14], an existence theorem for Nash equilibria in feedback form, valid for one-space-dimension noncooperative games, is given. However, as far as we know, there are no general existence theorems for feedback-Nash equilibria in state spaces of dimension greater than one.
3. The RaBVItG Algorithm for Solving Deterministic Differential Games
In this section, we describe the numerical implementation of the RaBVItG algorithm used to solve the deterministic differential game presented in Section 2. To do so, we propose a semi-Lagrangian discretization scheme for the HJB equation (see [9]). Then, we describe the general structure of the algorithm used to solve this problem. Finally, we introduce a particular implementation for the case of a feedback-Nash N-player differential game.
3.1. Model Discretization
Here, we propose a particular discrete version of (1)-(3).
Let h > 0 be a time step and consider the discrete times t_k = kh, for k = 0, 1, 2, .... First, we aim to approximate the value functions. To do so, given an initial state, we consider the following discrete approximation of (1)-(2), under the assumption that the controls are constant on each interval [t_k, t_{k+1}), with each player's discrete control sequence taken in its set of admissible controls.
Then, starting from the initial state, we use a first-order explicit Euler scheme for the state equation:
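A hedged reconstruction of this scheme, in the notation introduced above, is
\[ x_{k+1} = x_k + h\, f\big(x_k, u_{1,k}, \dots, u_{N,k}\big), \qquad k = 0, 1, 2, \dots, \]
with x_0 given.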
Next, we discretize the HJB equation (20). To this aim, given the time step h, we approximate (19) by considering
Following [9], we obtain a first-order discrete-time HJB equation for player i (i.e., a discrete version of (20)) of the form shown below. We point out that the factor 1 − ρh corresponds to the first-order Taylor expansion of the exponential discount e^{−ρh}, which is typically used as a discounting factor in discrete dynamic programming (see, e.g., [15]).
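A hedged reconstruction of this discrete HJB equation, following the semi-Lagrangian scheme of [9] and the notation used in our earlier reconstructions, is
\[ V_i(x) = \max_{u_i \in U_i} \Big\{ h\, L_i\big(x, u_i, u_{-i}\big) + (1 - \rho h)\, V_i\big(x + h\, f(x, u_i, u_{-i})\big) \Big\}, \]
where u_{-i} collects the (fixed) controls of the remaining players; the exact notation in the original display may differ.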
Now, focusing on the synthesis procedure, we define the discrete-time versions of the feedback N-tuple and of the feedback (N-1)-tuple. Note that the discrete feedback-Nash equilibrium satisfies the following condition for each player and each admissible point:
Furthermore, according to the definition of the feedback-map,
Thus, we obtain the following expression, where the involved quantities are defined for each player and each time step.
However, determining the exact value function satisfying (24) is still not always feasible. Thus, we aim to obtain an approximation by considering a spatial discretization of (20). To do so, we consider a set of arbitrary points in a closed subset of the state space. Next, we approximate the value function of each player at each of those points. To this aim, consider the points reached by the discrete dynamics (which are not necessarily in the original set of nodes). For those points, the HJB equation (24) can be approximated as shown below, where the right-hand side uses the approximation of the value function at those points computed by a collocation algorithm (see, for instance, [6]) using a "mesh-free" method based on scattered nodes.
More precisely, the approximation is a linear combination of radial basis functions of the form shown below, where the norm is the Euclidean norm and φ is a real-valued radial basis function (see, for instance, [16]). Here, we use the Gaussian RBF given by φ(r) = exp(−(εr)²), with ε > 0. In order to determine the coefficients of the combination, we impose the collocation (interpolation) conditions at the scattered nodes.
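As an illustration of this collocation step, the following is a minimal sketch (in Python with NumPy) of Gaussian-RBF interpolation on scattered nodes; the node set, the shape parameter eps, and the test data are illustrative assumptions and not values taken from the paper.

import numpy as np

def rbf_matrix(centers, points, eps):
    # Gaussian RBF kernel matrix, phi(r) = exp(-(eps * r)^2)
    r = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(eps * r) ** 2)

def rbf_fit(centers, values, eps):
    # Solve the square collocation system A w = values for the weights w
    A = rbf_matrix(centers, centers, eps)
    return np.linalg.solve(A, values)

def rbf_eval(centers, weights, points, eps):
    # Evaluate the interpolant at arbitrary (possibly off-node) points
    return rbf_matrix(centers, points, eps) @ weights

# Usage example on a toy one-dimensional "value function"
rng = np.random.default_rng(0)
centers = rng.uniform(-2.0, 2.0, size=(25, 1))   # scattered nodes (hypothetical)
values = np.sin(centers[:, 0])                   # data to interpolate
w = rbf_fit(centers, values, eps=1.0)
print(rbf_eval(centers, w, np.array([[0.3], [1.1]]), eps=1.0))

In RaBVItG, the interpolated data would be the nodal values of each player's value function, so that the interpolant can be evaluated at the points reached by the discrete dynamics, which are generally off the node set.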
3.2. Algorithm Structure
In this section, we present the general structure of the algorithm used to solve the problem defined by (1)-(3) and (18), using the discrete HJB equation (29). The algorithm is based on two main nested loops. It combines a main loop, called value iteration (see [13]), with an inner loop, called game iteration (GI), consisting of a relaxation algorithm that finds the proper (convergent) Nash equilibrium for an approximated value (until reaching convergence of this value).
Firstly, before presenting the algorithm, we introduce some useful notation. Let V denote the array of values of all the players evaluated at all points of the original set of nodes. We also define a matrix that stores the controls of each player, belonging to the set of real-valued matrices of the appropriate dimension. Additionally, let a vector quantify the instantaneous cost of each player at every data point. Next, we introduce two operators acting on these arrays; in the corresponding expression, the interpolation block vector is built from the quantities defined in (31).
Secondly, we aim to solve the following fixed-point schemes, whose parameters depend on the spatiotemporal discretization. More precisely, we define the following processes: (i) Game iteration: first, for each player i, we generate a candidate optimal policy at step s+1 by averaging the current policy and the best response, with a weight coefficient (see, e.g., [17]). We iterate this process until reaching convergence. Once convergence is reached, we take the resulting policies as the candidate for the feedback-Nash equilibrium. Doing so, we obtain a true (in the sense of convergence) Nash equilibrium for a false value function (until our value function also converges). (ii) Value iteration: once we have obtained a candidate for the Nash equilibrium (i.e., the previous game iteration loop has ended), we update the value function at step r+1 as follows:
From a general point of view, our numerical scheme consists of a coupled fixed-point system in the values and the controls. It can be summarized as
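To fix ideas, the following is a minimal illustrative sketch (in Python with NumPy) of the two nested loops for a scalar state. Here f, L, and V are lists of callables (one per player), and the best response is computed by a crude grid search instead of the gradient method used by RaBVItG; all names and parameter values are placeholders, not the paper's own code.

import numpy as np

def best_response(i, x, u, V, f, L, h, rho, control_grid):
    # Player i's best response at state x, with the other players' controls frozen
    best_u, best_val = u[i], -np.inf
    for ui in control_grid:
        u_try = u.copy()
        u_try[i] = ui
        val = h * L[i](x, u_try) + (1.0 - rho * h) * V[i](x + h * f(x, u_try))
        if val > best_val:
            best_u, best_val = ui, val
    return best_u

def game_iteration(x, u0, V, f, L, h, rho, control_grid, lam=0.5, tol=1e-6, it_max=200):
    # Relaxed fixed-point iteration on the controls (inner "game iteration" loop)
    u = np.array(u0, dtype=float)
    for _ in range(it_max):
        u_best = np.array([best_response(i, x, u, V, f, L, h, rho, control_grid)
                           for i in range(len(u))])
        u_new = (1.0 - lam) * u + lam * u_best   # relaxation with weight lam
        if np.max(np.abs(u_new - u)) < tol:
            return u_new
        u = u_new
    return u

def value_iteration_sweep(X, V, f, L, h, rho, control_grid):
    # One outer value-iteration sweep over the scattered nodes X (scalar state)
    n_players = len(V)
    V_new = np.zeros((n_players, len(X)))
    policies = np.zeros((n_players, len(X)))
    for j, x in enumerate(X):
        u_star = game_iteration(x, np.zeros(n_players), V, f, L, h, rho, control_grid)
        policies[:, j] = u_star
        for i in range(n_players):
            V_new[i, j] = h * L[i](x, u_star) + (1.0 - rho * h) * V[i](x + h * f(x, u_star))
    return V_new, policies

In RaBVItG itself, the per-player callables V[i] would be the RBF interpolants rebuilt by collocation from V_new after each outer sweep, and the sweeps would be repeated until the values converge.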
Remark 1. As done in Section 4, we compare our algorithm with a competitive one found in the literature (see [1]), and we briefly recall that algorithm's main structure below. The main difference between both methods is the existence of the step called game iteration.
3.3. Algorithm Pseudocode
Now, we present in detail a pseudocode version of the algorithm introduced in the previous section.
(i) Initialize all parameters and counters:
(a) Set: the number of players; the time step; the discount parameter; the update parameter for the relaxation algorithm in the game iteration. Fix the maximum number of iterations for the two while loops defined below.
(b) Initialize: the value iteration counter; the game iteration counter; the tolerance values for the stopping criteria of the value iteration and the game iteration, respectively.
(c) Define bounds for controls and state variables according to the experimental data.
(ii) Build an initial approximation:
(a) Generate a set of random scattered points, with a uniform distribution, for each player.
(b) Generate a set of convenient initial values and controls. If there is no a priori information, they are set to zero.
(iii) Define the value function approximations using radial basis functions (see (30)):
(a) Determine the RBF coefficients for each player by collocation (see (31)).
WHILE (the value iteration has not converged) and (the maximum number of value iterations has not been reached)
WHILE (the game iteration has not converged) and (the maximum number of game iterations has not been reached)
(iv) For each scattered point, perform:
(a) Set the current state.
(b) Set the current controls.
(c) Apply the game iteration process, for each player:
(1) Obtain a state variable (one Euler step of the dynamics).
(2) Get the corresponding interpolated values.
(3) Compute the optimal control (best response) for the player.
(4) Set the relaxed control update.
(d) Compute the stopping criterion for the game iteration:
(1) Set the candidate feedback-Nash controls.
END WHILE
(v) Define the converged game iteration controls.
(vi) Set the corresponding states.
(vii) Get the interpolated values at those states.
(viii) Approximate the value function.
(ix) Update the value function (value iteration process).
(x) Update the RBF coefficients by collocation.
(xi) Compute the stopping criterion for the value iteration.
(xii) Increase the value iteration counter.
(xiii) Reset the game iteration counter.
END WHILE
(xiv) In cases for which the algorithm converges, we return the equilibrium policies and their associated optimal values, both depending on the scattered nodes.
(xv) By using RBF, we extend the obtained feedback optimal controls to all values of the state variables lying in the bounded state space defined during the initialization step.
Remark 2. We note that, in the context of the FNDG problems tackled in this work, it is not a trivial task to prove the convergence of the considered semi-Lagrangian discretization to the viscosity solution of the HJB equations (which can be seen as a weak solution). As shown in [11], the definition of viscosity solution is mainly based on comparison principles with upper and lower solutions. However, those techniques are not adapted to the case of the vector-valued functions considered here (see, e.g., [18]). Additionally, there exist some results in the literature for systems of Hamilton-Jacobi equations (see, for instance, [19]) studying the limits of vanishing-viscosity solutions. In practice, it is quite a complex task to prove the existence and uniqueness of those limits. Therefore, in our case, the convergence of RaBVItG is only studied through the convergence observed in the considered numerical experiments. A theoretical analysis of the convergence properties of the proposed method should be performed as future work.
4. Numerical Experiments
In the previous sections, we have presented and developed the RaBVItG algorithm to solve differential games involving N players pursuing a feedback-Nash equilibrium. Now, in the current section, we apply the algorithm to several benchmark problems in order to evaluate its main properties.
Remark 3. In a recent survey (see [2]), the existing numerical methods in the literature focus on some tractable subclasses of games, compared with the ones we are dealing with in this work: two-person zero-sum games (see [3]), which can deal with the lack of differentiability of the value function by using the viscosity solution, since this class of games can be reduced to a one-dimensional game. Additionally, the other advances are in the line of linear-quadratic models (see [4]), avoiding the interesting nonlinearities in the dynamic system or the running cost. According to those authors, computational differential games is an area that still needs to grow.
Here, we solve different test problems, selected from [1, 4].
In order to check the efficiency of our approach, we compute (i) the error of the method with respect to the analytic solution; and (ii) the CPU time and the number of iterations. Furthermore, we also evaluate our algorithm's performance by comparing its results to those returned by the method proposed in [1], which is called CCF in this work (i.e., from the initials of the authors). To our knowledge, at this moment, it is the only algorithm found in the literature designed to solve similar problems. Both algorithms are based on a semi-Lagrangian discretization of the problem and use value iteration as the main loop. However, they exhibit several relevant differences, such as the following: (i) RaBVItG is a meshfree algorithm and, thus, (1) the value functions are only evaluated at some points of the state space, (2) it uses an RBF method to approximate functions, and (3) it does not require a discretization of the control space, since we use a gradient algorithm to find the critical points at each algorithm step. (ii) RaBVItG incorporates an inner iteration loop, called game iteration, where players alternately update their strategies by averaging their current strategy with the best response to the other players' current strategies. The main advantage is that it finds the proper Nash equilibrium for an approximated value until reaching convergence in the value iteration loop. Our hypothesis is that this step increases the accuracy and, thus, reduces the approximation error.
4.1. Numerical Test 1 (LQ, Scalar)
This example is introduced in [4]. In this case, we consider a two-player scalar differential game, where each player minimizes the following quadratic cost functional, subject to the following linear dynamic system:
According to [4], the analytic feedback-Nash equilibria for this game are defined by linear expressions whose coefficients are the equilibria of a system of Riccati ordinary differential equations, derived from the solution of the corresponding HJB equations. Furthermore, an additional condition for a stable equilibrium is to satisfy inequality (56), which divides the plane into "stable" and "unstable" regions. So, all the feedback-Nash equilibria are obtained as the intersection points of both hyperbolas in the "stable" region.
Now, we consider the particular parameter values given in [4], p. 396, so that equation (55) yields the equilibrium values. According to Theorem 8.22 in [4], since we have here three feedback-Nash equilibria, two of them are stable nodes and one is a saddle point. We note that one of the obtained pairs does not satisfy condition (56). In Figure 1, we show the phase diagram with the three equilibria: two stable nodes and one saddle point; the remaining pair is unstable.

Next, we analyze the performance of our algorithm on this problem and compare it with the CCF algorithm mentioned previously. To do so, we choose three different starting points lying in the basins of attraction of the different stable feedback-Nash equilibria. As can be seen in Figure 1, two of the equilibria behave symmetrically with respect to the bisector and, so, the results remain the same; thus, we only analyze two of the three equilibria. In order to analyze the performance of both algorithms, we run them three times starting from the following initial points: the corresponding equilibrium itself and points in the attraction domain of each equilibrium (see Figure 1).
We define our state variable on a bounded interval and discretize it using a set of points. The control space is bounded by considering values in a prescribed interval. RaBVItG does not need any grid in the control space. In the case of CCF, we have generated a uniform grid of 20 elements on the defined control space.
In Table 1, we summarize the results obtained with both algorithms (i.e., RaBVItG and CCF) and their performance in reaching the two equilibria of interest. For each equilibrium, each player, and each starting point, we report the error values between the vector of numerical solutions returned by the algorithms and the targeted equilibrium, starting from the different initial values.
As can be seen in Table 1, both algorithms show a similar qualitative pattern: the smaller the time step, the smaller the error and the higher the number of iterations needed. It is interesting to remark that our algorithm, RaBVItG, obtains lower errors but requires, in general, more iterations than CCF. However, as we point out in the next experiments, the computational time of RaBVItG is dramatically lower than that of the CCF algorithm. This tends to show that RaBVItG seems to be more efficient and accurate than CCF.
On the other hand, Figure 2 shows how both algorithms converge to different equilibria, depending on the starting point. To perform this experiment, we have run both algorithms using a mesh of initial conditions taken from the considered state region. As we can see in Figure 2, in each plot the white region represents the initial points that converge to the corresponding equilibrium (i.e., they satisfy the tolerance criterion of convergence with respect to the point represented by a circle). Additionally, the grey region is composed of the rest of the initial points, which do not satisfy this tolerance criterion. Thus, the white region can be interpreted as a numerical approximation of the basin of attraction of each equilibrium.

Comparing both algorithms, RaBVItG has fewer problems than CCF in identifying the correct equilibrium in the considered area. Focusing on the saddle equilibrium, the stable manifold is situated on the bisector, so any trajectory starting off this bisector should converge to the closest stable node (i.e., (-2,-1) or (-1,-2)). As can be seen in Figure 2, CCF exhibits more wrong equilibria than RaBVItG. When starting from the attraction region of a stable equilibrium, for some initial points, both algorithms converge to the saddle point. However, RaBVItG performs better than CCF, since the convergence band around this equilibrium is narrower.
4.2. Numerical Test 2 (Non-LQ, Scalar)
The next experiment was proposed in [1]. Again, we present a two-player scalar differential game, with the following cost functional to be minimized, subject to the dynamic system given below.
According to [1], the analytical value functions per player associated with the feedback-Nash equilibrium are given by the following expressions:
Here, we consider particular parameter values. It is straightforward to deduce, once we know the true value functions and using the corresponding HJB equations, that the optimal feedback-Nash policies remain constant for all state values:
Focusing on the RaBVItG and CCF configurations, we use the following specifications: the admissible values for the state variable lie in a bounded interval, which we discretize using equispaced points, and the set of admissible controls is also bounded. Additionally, only in the case of CCF, we discretize the control variable using a grid of equispaced points. We choose several grid sizes in order to compare the effect of improving the control space discretization. We use an interpolator based on splines to approximate the value function in the corresponding algorithm step.
In Figure 3, we depict the errors, in infinity norm, between the numerical feedback-Nash policies and the exact ones (65). As we can see in this figure, the finer the control discretization is, the better the approximation to the analytical solution is. Indeed, in one of the considered cases, CCF produces slightly better results than RaBVItG. However, there is an important trade-off between improving the error measures and the number of points that CCF needs. Finally, as reported in Table 2, our algorithm is clearly more time efficient and requires approximately the same number of value iterations as CCF.

4.3. Numerical Test 3: (LQ, Scalar, Three Players)
We consider the following three-player minimization differential game, subject to the following dynamic system. Additionally, we fix particular values for all the model parameters.
According to [4], this model has only one feedback-Nash equilibrium, which consists in
We set our admissible values for the state variable in and we discretize it by using points, so . Regarding the control space, we consider values in For the CCF algorithm, again we discretize the control variable using a grid of equally spaced points. As previously, we choose , and we use an interpolator based on splines.
As we can see in Table 3, RaBVItG again exhibits better computational times, showing the greatest efficiency for all the different time discretization values. Furthermore, in Figure 4, we observe that, even with fewer points, RaBVItG produces smaller errors than CCF. The difference with respect to Test 2 is that the solution, in this case, is not constant but a linear function.

4.4. Numerical Test 4: (Nonscalar, Two Players)
This last minimization differential game is interesting as it is not a scalar one: its state variable has two dimensions. The running cost per player is given by the following expression, subject to the dynamic system
This test case was introduced in [1] but, unfortunately, the authors do not provide any analytical solution to this differential game to compare with. However, they show a figure with a value function comparable to the ones reported in Figure 5. We run CCF and RaBVItG using different mesh sizes (in the case of CCF) and different numbers of meshfree points (in the case of RaBVItG). In Figure 5, we represent the different value functions for Player 1 (Player 2 has the same result, since it is a symmetric game). The fact that CCF requires a mesh for the state variables (and controls) clearly affects the CPU time required to achieve convergence. Indeed, in order to compare the solution of RaBVItG with that of CCF, once we obtain the value function per player (as a multidimensional array), we interpolate our solution onto the mesh defined in CCF using, again, RBF methods. Both algorithms converge to the same solution as the mesh size increases. However, as shown in Table 4, the CCF method needs, in general, fewer value iterations but is much more computationally expensive than RaBVItG (around 10 times greater).

5. Conclusions
In this work, we have developed a novel numerical algorithm (RaBVItG) for solving multiplayer feedback-Nash differential games. RaBVItG is an algorithm based on radial basis function approximators (known to work properly without a mesh), value iteration (a classical approach in dynamic programming), reinforcement learning (to solve a class of Hamilton-Jacobi-Bellman equations), and game iteration (to obtain, at each value iteration step, the corresponding feedback-Nash equilibrium).
The general purpose of this algorithm is to deal with multiplayer problems, with two or more players, and to allow the dimension of the control space of each player to be greater than one. For this reason, we have designed an algorithm that is meshfree in both the state and control spaces. Additionally, we have selected the semi-Lagrangian discretization for the HJB equation due to the good results generally reported in the literature for problems similar to the differential games studied in this paper.
We have validated RaBVItG by comparing the obtained results with the ones returned by another algorithm published in the literature (here called CCF; see [1]). This particular algorithm is based on meshes in the state and control spaces, and the way it obtains the Nash solution is relatively different from our approach. Indeed, CCF performs value iteration steps for a particular player, fixing at every step the controls of the remaining players, until reaching convergence. In our case, RaBVItG alternates value iteration steps with a new step called game iteration. During this step, fixing the value for each player, the algorithm iterates until reaching convergence to a Nash equilibrium. So, we find the true (in the sense of convergence) Nash equilibrium for an approximated value (again, in the sense of convergence). As shown in our numerical experiments, our approach seems to exhibit better accuracy when approximating the true equilibrium. Indeed, the different experiments performed in order to compare both algorithms tend to show that RaBVItG is, on average, 10 times faster than CCF and, in some of the experiments, obtains smaller errors with a smaller quantity of data points.
Additionally, RaBVItG can be implemented in real problems with more than two players and more than two controls per player, which are problems difficult to solve with CCF.
Future lines of work on RaBVItG include the study of a stochastic version (e.g., including a multivariate diffusion term in the state equations) and of different game iteration schemes in order to find other typical equilibria studied in the literature (e.g., Stackelberg or Pareto). Furthermore, as discussed in Remark 2, a study of the theoretical convergence properties of the proposed algorithm should be performed.
Data Availability
All the parameters and data used to support the findings of this study are included within the article. Furthermore, the programs are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was carried out thanks to the financial support of the Spanish “Ministry of Economy and Competitiveness” under Project MTM2015-64865-P; the research group MOMAT (Ref. 910480) supported by “Banco Santander” and “Universidad Complutense de Madrid”; and the “Junta de Andalucía” and the European Regional Development Fund through Project P12-TIC301.