Abstract
The main goal of this paper is to explore the strategic decision-making of multiuser power control in wireless networks from the perspective of mean field game theory. First, we formulate the multiuser power control problem as an n-person noncooperative game and prove the existence of a Nash equilibrium for this game. Then, we design a new mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm by introducing a mean field term into the gradient ascent algorithm with the WoLF criterion. Subsequently, we establish a sufficient condition for the convergence of the proposed WoLF-MFGA algorithm. Furthermore, the WoLF-MFGA algorithm is used to compute the Nash equilibrium of the multiuser power control game, and simulation results are reported. Finally, to analyze the convergence of the proposed algorithm, we investigate the sensitivity of the WoLF-MFGA algorithm to its parameters.
1. Introduction
One of the key issues in wireless network systems is multiuser power control. Its main purpose is to have each user transmit just enough power to achieve the required quality of service without causing unnecessary interference to other users. In recent years, the multiuser power control problem has attracted attention in many application fields. MacKenzie and Wicker [1] showed that game theory is an appropriate tool for addressing a variety of problems in wireless communication systems. Yu et al. [2] investigated the multiuser power control problem in a frequency-selective interference channel and proposed an iterative water-filling algorithm to efficiently reach its Nash equilibria; furthermore, the convergence analysis of the iterative water-filling process showed that the Nash equilibrium is unique for the two-player matrix game. Yamashita and Luo [3] formulated the multiuser power control problem for digital subscriber lines as a nonlinear complementarity problem, which makes it possible to use Newton-type smoothing (NS) methods to efficiently compute a Nash equilibrium solution, and showed that the NS algorithm is much more robust to strong interference than the existing synchronous water-filling algorithm. In addition, Meshkati et al. [4] proposed a game-theoretic model for studying power control in multicarrier code-division multiple-access systems, in which the power control problem is modeled as a noncooperative game where each user decides how much power to transmit over each carrier to maximize its own utility. Hao et al. [5] designed a joint channel allocation and power control optimization algorithm based on a noncooperative game to reduce wireless sensor network interference and balance network energy consumption, and further proved that the algorithm converges to a Nash equilibrium.
Hence, we formulate the multiuser power control problem as an n-person noncooperative game and prove the existence of its Nash equilibrium.
When dealing with a large number of players, traditional analytical methods may run into difficulties; mean field game theory, however, provides an efficient method for analyzing the behavior of a large population by encapsulating the players’ behavior in mean field terms. Mean field games were introduced independently by Lasry and Lions [6] and Huang et al. [7]. Kizilkale et al. [8] modeled the power market as a dynamic large-population game and applied the mean field method to study the limiting behavior of a large population of agents; furthermore, an efficient decentralized algorithm between suppliers and consumers was provided. Couillet et al. [9] proposed a game-theoretic framework to model the behavior of electric vehicle and hybrid electricity-oil vehicle owners in a Cournot market when the number of selfish players is large. Hanif et al. [10] formulated last-level cache sharing problems in large-scale cloud networks as a resource-sharing game with finite and infinite numbers of players by introducing a myopic mean field response and showed that its Nash equilibrium strategy converges to a mean field equilibrium, meaning that the optimal strategies of the resource-sharing problem converge to the optimal price of the mean field game. Wu et al. [11] considered uplink power control in wireless communication when a large number of users compete over channel resources and investigated the performance of mean field transmission power control in a dense network where the users interact in a noncooperative manner. Taghizadeh et al. [12] proposed the mean field gradient ascent (MFGA) learning algorithm to solve the Nash equilibrium of a mining game in a blockchain network and provided a sufficient condition for the convergence of the proposed MFGA algorithm. For more research on applications of mean field games, refer to [13–17].
Infinitesimal gradient ascent (IGA) [18] is a common game-learning algorithm in which each agent updates its strategy in the direction of the gradient of its expected reward; it was demonstrated that the IGA algorithm converges in self-play either to Nash equilibrium strategies or to Nash equilibrium payoffs (when the strategies themselves do not converge). The main idea of the gradient ascent algorithm is that each agent alters its strategy based on the other agents’ forecast strategies rather than their current strategies. Later, Bowling and Veloso [19] presented IGA with the win or learn fast criterion (WoLF-IGA) by introducing the WoLF learning criterion into the IGA algorithm and proved that the algorithm converges to a Nash equilibrium in all bimatrix games. Zinkevich [20] presented a generalized IGA algorithm, which extends IGA to games with more than two actions or two strategies and is universally consistent. Banerjee and Peng [21] proposed policy-dynamics-based win or learn fast (PDWoLF) for bimatrix games and general-sum stochastic games. Zhang et al. [22] studied social-aware IGA (IGA-SA), which introduces social awareness into the strategy update process, and showed that if agents adopt rational learning strategies in a repeated game, their strategies converge to the socially optimal outcomes of symmetric bimatrix games. In recent years, many scholars have designed different algorithms to analyze the power control game. Goodman and Mandayam [23] studied power control in wireless data transmission networks and presented an algorithm with a price function proportional to transmitter power so that each user individually maximizes its utility. Luo and Pang [24] provided a convergence analysis of the iterative water-filling algorithm for multiuser power control in digital subscriber lines, in more realistic channel settings and with an arbitrary number of users. He et al.
[25] presented a projection neural network to solve the Nash equilibrium of the multiuser power control optimization problem in modern digital subscriber lines. Tao et al. [26] presented a centralized power control algorithm based on projected gradient design for solving the multiuser power control game in cognitive radio networks. A power control algorithm based on noncooperative game theory was proposed in [27]. Gulzar et al. [28] presented an adaptive power control algorithm for the power control problem in cognitive radio networks.
Motivated and inspired by the work above, we consider the strategic decision-making of multiuser power control in wireless networks. We model the multiuser power control problem as an n-person noncooperative game. To achieve the Nash equilibrium of the game, we propose a new mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm by introducing the mean field method into the gradient ascent with win or learn fast algorithm. Furthermore, the use of a variable gradient based on the decision-making behavior of users ensures that low-cost power control among users can be reached. This is new and provides an efficient approach for the equilibrium analysis of power control games in wireless networks. The main contributions of this paper can be summarized as follows. (1) The multiuser power control problem is formulated as an n-person noncooperative game, and we prove the existence of a Nash equilibrium for this game using existing analytical tools; i.e., we obtain that there exists at least one mixed strategy Nash equilibrium for the game over the strategy space. (2) The WoLF-MFGA algorithm is designed by incorporating a mean field term into the gradient ascent with win or learn fast algorithm. Furthermore, we provide a sufficient condition for the convergence of the proposed WoLF-MFGA algorithm using the Banach fixed point theorem. (3) To achieve the Nash equilibrium of the multiuser power control game, simulation results show that the WoLF-MFGA algorithm can closely approximate the best response. To analyze the convergence of the proposed algorithm, we investigate the sensitivity of the WoLF-MFGA algorithm to its parameters.
The rest of the paper is organized as follows. In Section 2, related preliminaries and the model of the multiuser power control game in the literature are reviewed; furthermore, the multiuser power control problem is formulated as an n-person noncooperative game, and the existence of a Nash equilibrium for the game is established by means of a nonlinear analysis method. In Section 3, the WoLF-MFGA algorithm is designed to solve the Nash equilibrium of the multiuser power control game. In Section 4, a sufficient condition for the convergence of the WoLF-MFGA algorithm is proven. In Section 5, numerical simulations are used to support our theoretical results, and sensitivity analyses of the WoLF-MFGA algorithm are used to confirm its convergence. In Section 6, we draw brief conclusions.
2. Preliminary Knowledge
In this section, we introduce the multiuser power control problem in wireless networks and formulate it as an n-person noncooperative game. The multiuser power control game procedure is shown in Figure 1.

2.1. Problem Description
Throughout this paper, we assume that all users are homogeneous and that the unit price for wireless electric usage is the same for each user; the main purpose of each user is to maximize its own total utility. The chance of winning for a user in each period equals the proportion of its transmit power to the total wireless electric power capacity, where is the set of all users, denotes the number of users, is the probability distribution of successfully distributed power for the -th user, and is user ’s transmit power over the carrier terminal. Let us suppose that , where , and .
According to references [10, 12], we give a reasonable assumption as follows.
Assumption 1. Suppose that there exist at least two active users in the multiuser power control game; the -th user is called active if its strategy .
It is clear that a higher signal-to-interference ratio (SIR) directly reduces the probability of transmission errors, and a high ratio generally increases the system throughput, i.e., yields a lower bit-error rate. However, achieving a high SIR level requires the user terminal to transmit at high power, which consumes more battery life [29]. Thus, the capacity of the -th user is defined as where and are the throughput and transmit power of the -th user, respectively. Throughput is the rate of reception of correct data, and it can be expressed as where is the number of information bits and is the total number of bits in a packet. and are the transmission rate and the SIR of the -th user, respectively. is the efficiency function expressing the packet success rate (PSR) and is assumed to be increasing, continuous, and S-shaped with and .
By combining formulas (2) and (3), the expected utility of the -th user can be stated as follows: where denotes the set of users in the same wireless network as the -th user. Therefore, the payoff function of the -th user can be defined as where is the unit price of wireless electric power usage for the -th user, e.g., the electricity consumption price and storage charges. In a multiuser system, each user may be affected by other users’ states, and it is hard for an individual user to collect all other users’ state information given its limited perception ability. This leads us to establish a uniform price, calculated by the terminal, to determine how much power to transmit, where the price keeps the same unit of measurement as the utility function. By equation (5), a user’s utility depends on all other users’ electric power requirements. Furthermore, the multiuser power control problem can be formalized as an n-person noncooperative game, in which terminals choose how much power to transmit through the wireless network to maximize each user’s utility. Hence, the competitive relationship between users can be modeled as a “multiuser power control game” specified by the set of users, the strategies, and the payoff functions, respectively.
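Since the displayed formulas (2)–(5) are not reproduced in this extraction, the following Python sketch illustrates one common instantiation of the utility and pricing structure described above: throughput is taken as (L/M)·R·f(γ) with the widely used S-shaped efficiency function f(γ) = (1 − e^(−γ))^M, and the payoff is the throughput-per-watt capacity minus a linear pricing term. The functional forms and parameter values (rate, L, M, price) are illustrative assumptions, not the paper's exact definitions.

```python
import math

def efficiency(gamma, M=80):
    # A common S-shaped packet-success-rate function in the power-control
    # literature: f(gamma) = (1 - exp(-gamma))**M, with f(0) = 0 and
    # f(gamma) -> 1 as gamma grows (M: packet length in bits).
    return (1.0 - math.exp(-gamma)) ** M

def utility(power, gamma, rate=1e4, L=64, M=80, price=1e-3):
    # Throughput: rate of correct data reception, (L/M) * R * f(gamma).
    throughput = (L / M) * rate * efficiency(gamma, M)
    # Capacity-style utility (throughput per unit transmit power)
    # minus a pricing term proportional to the transmit power.
    return throughput / power - price * power
```

A higher SIR γ raises the success probability and hence the utility for a fixed transmit power, matching the qualitative discussion above.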
2.2. Multiuser Power Control Game
Suppose that denotes a multiuser power control game, where for any , denotes the pure strategy of the -th user and is the number of available strategies for the -th user; represents the Cartesian product of the pure strategies of all users, called a pure strategy profile; is the set of mixed strategies of the -th user, where denotes the probability that the -th user selects the strategy , ; denotes the Cartesian product of the mixed strategies of all users, called a mixed strategy profile; and is the expected payoff function of the -th user.
Next, we give the definition of Nash equilibrium for as follows.
Definition 2 (the Nash equilibrium of ). Let be a multiuser power control game. A strategy is a Nash equilibrium if, for any and for any , it satisfies where . Then, is a Nash equilibrium of , which means that no user has an incentive to deviate from its equilibrium strategy, since no user can unilaterally increase its own payoff while the other users keep their strategies unchanged.
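As a minimal illustration of Definition 2, the sketch below enumerates unilateral deviations in a two-user pure-strategy game whose payoff follows the proportional-winning model of Section 2.1 (a user succeeds with probability equal to its share of the total transmit power, minus a linear power cost); the reward v and unit price c are hypothetical values.

```python
import itertools

def payoff(i, profile, v=10.0, c=1.0):
    # Contest-style payoff: user i succeeds with probability equal to its
    # share of the total transmit power (cf. Section 2.1), minus a linear
    # power cost. v (reward) and c (unit price) are illustrative values.
    total = sum(profile)
    return v * profile[i] / total - c * profile[i]

def is_nash(profile, actions):
    # Definition 2: no user can raise its own payoff by a unilateral
    # deviation while the other users keep their strategies fixed.
    for i in range(len(profile)):
        for a in actions:
            deviated = profile[:i] + (a,) + profile[i + 1:]
            if payoff(i, deviated) > payoff(i, profile) + 1e-12:
                return False
    return True

actions = (1.0, 2.0)
equilibria = [p for p in itertools.product(actions, repeat=2)
              if is_nash(p, actions)]
```

With these illustrative values, the unique pure equilibrium is the profile in which both users transmit at the higher power level, reflecting the competitive over-transmission discussed later in the simulations.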
Next, we present the existence of a Nash equilibrium for the multiuser power control game .
Theorem 3. There exists at least one mixed strategy Nash equilibrium for multiuser power control game over the strategy space .
Proof. Rosen [30] gave a sufficient condition for the existence of a Nash equilibrium of every n-person noncooperative game: the joint strategy space is a nonempty, closed, bounded, convex subset of , and for each , the payoff function is continuous on , and is convex. Next, we verify these conditions for the multiuser power control game.
(1) Since is the Cartesian product of the mixed strategy sets of the users, it is a nonempty, closed, bounded, convex subset of .
(2) The payoff function is clearly continuous on , since it is linear in the variable by formula (5).
(3) By Assumption 1, there are at least two active users in the multiuser power control game, and then . Next, we compute the first-order derivative of the payoff function with respect to :
Taking the second-order derivative of equation (8), we obtain
It is easy to see that , and thus the payoff function is convex in .
Hence, the multiuser power control game satisfies the above sufficient conditions, and therefore there exists at least one mixed strategy Nash equilibrium for .
In particular, the best response (BR) mapping of the -th user [31] is
where denotes the best response of the -th user to the other users. In the multiuser power control game, the best response mapping of the -th user is defined by solving in equation (5):
The equilibrium solution can be projected onto the feasible set by using the constraints , and then we obtain the best response of the -th user as follows:
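Under the same illustrative contest-style payoff used earlier (hypothetical reward v and unit price c), the projected best response described above can be sketched as follows: the first-order condition is solved in closed form and then clipped to the feasible power interval.

```python
import math

def best_response(others_total, v=10.0, c=1.0, p_min=0.0, p_max=5.0):
    # For the contest-style payoff v*p/(p + Q) - c*p, the first-order
    # condition in p gives p* = sqrt(v*Q/c) - Q, where Q is the total
    # power of the other users. The unconstrained solution is then
    # projected onto the feasible interval [p_min, p_max], mirroring the
    # projection step described in the text.
    p_star = math.sqrt(v * others_total / c) - others_total
    return min(max(p_star, p_min), p_max)
```

Note that when the rivals' total power is already large, the unconstrained solution turns negative and the projection returns the lower bound, i.e., the user stops transmitting.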
3. Mean Field Gradient Ascent with Win or Learn Fast Algorithm
In this section, the mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm is designed by introducing a mean field term into the gradient ascent with win or learn fast algorithm.
3.1. Gradient Ascent
The gradient ascent (GA) algorithm was introduced by Singh et al. [18] to study the dynamics of gradient ascent in two-person general-sum games. The payoffs of a bimatrix game are defined by a pair of matrices:
Each player chooses a strategy from the set that determines the player’s rewards. If the row (or column) player chooses action (or ), then the row (or column) player receives the payoff (or ), respectively.
Let player 1 and player 2 adopt mixed strategies, i.e., randomize over their actions. Let represent the probability of the row player choosing action 1; then the probability of the row player adopting action 2 is . Suppose that denotes the probability of the column player choosing action 1; then the probability of the column player adopting action 2 is . The expected payoffs of player 1 and player 2 are defined as follows: where
Now, we can consider the influence of users altering their strategy on their expected payoff. This can be obtained by solving the partial derivative of their expected payoff function corresponding to their strategies:
In the GA algorithm, the users adjust their strategies after each iteration in order to increase their total utility, which means that the users move their strategies in the direction of the current gradient with a certain step size . If denotes the strategy at the -th iteration, then the new strategy is updated by the GA rule for all users as follows: where denotes the step size. However, the GA algorithm is not guaranteed to converge, as shown in [18]. Singh et al. examined the users’ dynamics with an infinitesimal step size , which is called the infinitesimal gradient ascent (IGA) algorithm.
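The GA update above can be sketched for a 2x2 bimatrix game as follows. The payoff matrices and step size are illustrative, and the projection onto [0, 1] keeps each strategy a valid probability.

```python
import numpy as np

def iga(R, C, alpha=0.5, beta=0.5, eta=0.01, steps=2000):
    # Gradient ascent for a 2x2 bimatrix game (R: row player's payoffs,
    # C: column player's payoffs). alpha and beta are the probabilities
    # of each player's first action; every step moves each strategy along
    # the gradient of that player's own expected payoff, then projects
    # back onto [0, 1].
    for _ in range(steps):
        d_alpha = beta * (R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]) + R[0, 1] - R[1, 1]
        d_beta = alpha * (C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]) + C[1, 0] - C[1, 1]
        alpha = min(max(alpha + eta * d_alpha, 0.0), 1.0)
        beta = min(max(beta + eta * d_beta, 0.0), 1.0)
    return alpha, beta
```

On a game with dominant strategies the iterates settle at the pure equilibrium; on cyclic games such as matching pennies, the strategies orbit rather than converge, which is exactly the deficiency the WoLF criterion addresses below.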
Although rationality and convergence do not encompass all the properties required of a learning method, it is, interestingly, very difficult to achieve even these two at the same time. The main idea of this paper is to use a variable learning rate to obtain an algorithm that is both rational and convergent, which we introduce next.
3.2. The WoLF-MFGA Algorithm
When faced with a large number of users, it is impractical to compute the Nash equilibrium of the multiuser power control game directly. Traditional analytical methods then suffer from high computational complexity and may even become intractable, but mean field game theory helps analyze the behavior of a large number of users by encapsulating their behavior in mean field terms. Accordingly, the mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm is designed by introducing a mean field term into the variable-learning-rate gradient ascent algorithm from the perspective of game theory.
The step size of the basic gradient ascent algorithm is constant. In [19], the learning rate is allowed to vary over time, and the update rules are as follows: where , is the iteration step, and is a constant. At the -th iteration, the algorithm takes a step of size in the gradient direction.
The essence of the win or learn fast (WoLF) principle is to learn rapidly when losing and cautiously when winning. The intuition is that a learner should adapt quickly when its performance is worse than expected; when it performs better than expected, it should remain cautious, because the other users will probably change their strategies. A key issue in the WoLF-MFGA algorithm is how to judge whether a user wins or loses; this paper adopts the criterion that the solution of the algorithm should approximate the best response (BR) in the actual multiuser power control game as closely as possible. Meanwhile, each user chooses a Nash equilibrium and compares its expected payoff with the payoff obtained by following the chosen equilibrium strategy. Note that the users are not required to select the same equilibrium (i.e., the joint strategy may not be a Nash equilibrium), and the restriction on requires it to be strictly positive and bounded. Then, the update formula of is as follows:
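A minimal sketch of this variable-learning-rate idea (WoLF on top of IGA, not the full WoLF-MFGA) is given below for matching pennies. Each player compares its current expected payoff with that of a reference equilibrium strategy against the opponent's current play, learning cautiously when winning and fast when losing; the reference strategy 0.5 and the two learning rates are illustrative assumptions.

```python
import numpy as np

def expected_payoff(M, a, b):
    # Expected payoff of a 2x2 game with payoff matrix M when the "row"
    # player plays action 1 with probability a and the opponent with b.
    x = np.array([a, 1.0 - a])
    y = np.array([b, 1.0 - b])
    return x @ M @ y

def wolf_iga(R, C, a=0.9, b=0.9, d_win=0.002, d_lose=0.008, steps=40000):
    # WoLF-IGA sketch: gradient ascent with a variable learning rate.
    # "Winning" means the current strategy beats the reference equilibrium
    # strategy (0.5 for matching pennies) against the opponent's current
    # play; then learn slowly (d_win), otherwise fast (d_lose > d_win).
    for _ in range(steps):
        da = b * (R[0, 0] - R[0, 1] - R[1, 0] + R[1, 1]) + R[0, 1] - R[1, 1]
        db = a * (C[0, 0] - C[0, 1] - C[1, 0] + C[1, 1]) + C[1, 0] - C[1, 1]
        lr_a = d_win if expected_payoff(R, a, b) > expected_payoff(R, 0.5, b) else d_lose
        lr_b = d_win if expected_payoff(C.T, b, a) > expected_payoff(C.T, 0.5, a) else d_lose
        a = min(max(a + lr_a * da, 0.0), 1.0)
        b = min(max(b + lr_b * db, 0.0), 1.0)
    return a, b
```

Whereas plain IGA orbits around the mixed equilibrium of matching pennies, the asymmetric learning rates make the orbit spiral inward toward (0.5, 0.5), which is the behavior proved for WoLF-IGA in [19].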
Similarly, we consider the case of an infinitesimally small step size , which yields the infinitesimal WoLF-MFGA algorithm. Furthermore, the effect of the WoLF-MFGA parameter adjustment on the convergence of the algorithm is of particular interest. However, in order to compute the derivative of the payoff function in equation (18), each user would need to know the strategies of all other users, which is almost impossible when the number of users is sufficiently large. A feasible solution is the mean field game method, in which each individual user does not observe the other users’ behavior but only the distribution of that behavior.
Next, the mean field term, obtained from the aggregative term, is introduced: where is the mean field term at the -th iteration. The probability distribution of the -th user’s power allocation is as follows:
By equation (21), the payoff function of the -th user is influenced by the strategies of all other users only through a probability distribution. In the multiuser power control game, a change in an individual user’s strategy does not significantly change the payoff value of the mean field game. Each user estimates the electric power of the entire wireless network at each iteration. Hence, by equation (5), the payoff function of the -th user with the mean field term can be expressed as follows:
By substituting equation (22) into equation (18), we obtain
As a result, the updated strategy of each user is obtained as a function of the mean field term. Thus, the update equation (20) of the mean field term can be rewritten as the following fixed-point iteration:
In addition, the update step formula of best response is as follows:
In conclusion, the WoLF-MFGA algorithm is designed by introducing a mean field term into the gradient ascent with win or learn fast algorithm. The implementation is shown in Algorithm 1, which alternately updates the mean field term of multiuser power control and the strategies of the users.
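Since Algorithm 1's pseudocode is not reproduced in this extraction, the following sketch shows the alternating structure on the illustrative contest-style game used above: a gradient step per user with a WoLF-switched learning rate, followed by a mean field (fixed-point) refresh. All parameter values, the payoff form, and the equilibrium reference strategy are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def wolf_mfga(n=50, v=10.0, c=1.0, lr_win=0.01, lr_lose=0.04, iters=2000, seed=0):
    # WoLF-MFGA sketch on the contest-style power game. Each user observes
    # only the mean field term (the average transmit power), ascends the
    # gradient of v*p/(p + (n-1)*m) - c*p, and the mean field term is then
    # refreshed as the population average (the fixed-point step).
    rng = np.random.default_rng(seed)
    p = rng.uniform(0.5, 2.0, n)          # initial transmit powers
    ref = v * (n - 1) / (c * n * n)       # symmetric equilibrium reference
    for _ in range(iters):
        others = (n - 1) * p.mean()       # mean-field estimate of rivals' power
        grad = v * others / (p + others) ** 2 - c
        # WoLF criterion: learn cautiously (lr_win) when the current payoff
        # beats that of the reference equilibrium strategy, fast otherwise.
        payoff = v * p / (p + others) - c * p
        ref_payoff = v * ref / (ref + others) - c * ref
        lr = np.where(payoff >= ref_payoff, lr_win, lr_lose)
        p = np.clip(p + lr * grad, 1e-6, None)
    return p, p.mean()
```

For this symmetric game the fixed point is m* = v(n−1)/(cn²): every user's power and the mean field term settle at that value, which is the alternating-update behavior Algorithm 1 describes.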
In the multiuser power control game, each user encapsulates the influence of the other users in the mean field term and ignores its own influence on that term; hence, the mean field equilibrium differs from the best response (BR) equilibrium.
4. Convergence
In this section, a sufficient condition for the convergence of the mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm is established.
Theorem 4. A sufficient condition for the convergence of Algorithm 1 is as follows: where .
Proof. By the Banach contraction mapping principle [32], the fixed-point iteration (24) converges if is a contraction operator, where we equip with the Euclidean norm. That is, for any , the following inequality must hold: where , , and can be calculated from equation (24). Then, there exists a unique such that For equation (28), the left-hand side is equivalent to Since , ; then, equation (29) is transformed into In order to satisfy the contraction principle, we need to choose a step size such that Thus, In particular, condition (31) can be abbreviated as This means that the gradient ascent step needs to remain within an appropriate range to guarantee the convergence of the proposed WoLF-MFGA algorithm (Algorithm 1).
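The contraction argument can be checked numerically. The sketch below runs a Banach fixed-point iteration for a hypothetical symmetric mean field update T (derived from the contest-style game used in the earlier sketches) and estimates its Lipschitz constant on an interval; T, n, v, c, and eta are illustrative, not the paper's exact map.

```python
def fixed_point(T, m0, tol=1e-12, max_iter=100000):
    # Banach fixed-point iteration: if T is a contraction (Lipschitz
    # constant L < 1) on a complete metric space, the iterates
    # m_{k+1} = T(m_k) converge to the unique fixed point from any start.
    m = m0
    for _ in range(max_iter):
        m_new = T(m)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

# Hypothetical symmetric mean-field update: a gradient step of size eta on
# v*m/(n*m)... aggregated over n users; eta plays the role of the step
# size that Theorem 4's condition bounds.
n, v, c, eta = 50, 10.0, 1.0, 0.02
T = lambda m: m + eta * (v * (n - 1) * m / (n * m) ** 2 - c)

# Numerically estimate the Lipschitz constant of T on an interval: it must
# stay below 1 for the contraction argument to apply.
grid = [0.05 + 0.01 * k for k in range(200)]
L = max(abs(T(g + 1e-6) - T(g)) / 1e-6 for g in grid)
m_star = fixed_point(T, 1.0)   # converges to v*(n-1)/(c*n**2) = 0.196
```

If eta were chosen too large, |T'| would exceed 1 near the fixed point and the iteration would diverge, which is precisely the step-size restriction the theorem expresses.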
5. Simulation Experiments
In this section, simulation experiments are used to compare the mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm and the best response (BR) algorithm. The total utility is defined as the sum of all users’ utilities.
The best response of each user is calculated according to formula (12). For simplicity, unless otherwise mentioned, all simulation parameters are as shown in Table 1. The parameters are selected for the case of multiuser power control in a wireless network. denotes the utility of the -th user for successful power transmission. Taking into account accessibility, ease of expenditure, and the energy efficiency values of the electrical control system, we assume that the utility is uniformly distributed in the range of 70,000 to 90,000. denotes the unit price of wireless power. Each simulation is run 6 times and the averaged results are reported.
5.1. The WoLF-MFGA Algorithm vs. Best Response Algorithm
The total utility is defined as the sum of the payoffs of all users. In all experiments, and are chosen small enough to ensure the convergence of the proposed WoLF-MFGA algorithm. As the number of users increases, the wireless network becomes more competitive due to the competitive behavior of the users, the surplus per user decreases, and the total utility of the wireless network decreases.
As shown in Figure 2, the total utilities of the WoLF-MFGA algorithm and the best response algorithm are determined by the decisions of all users. We set the variable learning rate parameters to 0.3, 0.5, and 0.8, respectively. Since best response learning is the theoretically optimal solution, it is likely to achieve a higher utility than the proposed WoLF-MFGA algorithm. When is small, WoLF-MFGA learning closely approximates best response learning; when is large, it has little effect on the total utility of the WoLF-MFGA algorithm, but the best response fluctuates more, which means that the best response learning method is unstable. Nevertheless, the mean field solution is very close to the best response solution even though each user has only incomplete information about its rivals. The main reason for the fluctuations is the randomness of the chosen parameters. The total utility of the WoLF-MFGA algorithm is stable regardless of the value of the variable learning parameters, i.e., the WoLF-MFGA algorithm is insensitive to these parameters and has better stability. In conclusion, the equilibrium solution of the WoLF-MFGA algorithm is as close as possible to the optimal solution of the multiuser power control game with conflicts of interest. Moreover, as the number of users increases, the competitive behavior of the users in the network becomes more intense, which reduces the total utility.

Figure 3 compares the total utility for different ranges of . The intervals are [20000, 40000], [40000, 60000], [60000, 80000], [80000, 100000], [100000, 120000], and [120000, 140000], respectively. We assume that, in each case, the user’s reward is uniformly distributed over the given range. As increases, the -th user successfully transmits power and obtains a larger reward, resulting in a higher total utility for the entire wireless network. Since the relationship between a user’s payoff and is linear, the total utility grows almost linearly with .

Finally, Figure 4 shows the total utility of the wireless network under the WoLF-MFGA algorithm and the best response algorithm for different ranges of . In each case, is uniformly distributed within the given range. Since corresponds to the unit price of wireless power control, the total utility clearly decreases as increases. Meanwhile, the total utility behaves reciprocally with respect to the range of .

5.2. The Sensitivity Analyses of the Proposed WoLF-MFGA Algorithm
To analyze the convergence of the proposed algorithm, we investigate sensitivity results on some parameters , , and . In all simulations, the range of iterations is chosen large enough to display the convergence of the proposed WoLF-MFGA algorithm.
Figure 5 tracks small changes in the variable learning rate parameters. The smaller the learning rate, the slower the convergence of the proposed WoLF-MFGA algorithm; conversely, a larger variable learning rate means smaller fluctuations in the users’ total utility and faster convergence of the WoLF-MFGA algorithm.

Figure 6 shows that, for different numbers of users, the WoLF-MFGA algorithm tends to a stable point in the wireless network as the number of iterations increases. The greater the number of users, the higher the total utility; conversely, the smaller the number of users, the lower the total utility.

In Figure 7, we assume that , and . The total utility converges after about 100 iterations. To analyze the behavior of the proposed WoLF-MFGA algorithm as the parameter changes, is increased by 60% at iteration 400. This sudden change in the parameter value causes the system to move to a new equilibrium.

6. Conclusions
In this paper, we have studied the strategic decision-making of the multiuser power control problem in a wireless network from the perspective of mean field game theory. The multiuser power control problem can be regarded as an n-person noncooperative game, and we have proved the existence of a Nash equilibrium for this game. Furthermore, we have proposed a new mean field gradient ascent with win or learn fast (WoLF-MFGA) algorithm by introducing a mean field term into the gradient ascent with win or learn fast algorithm. In addition, we have given a sufficient condition for the convergence of the WoLF-MFGA algorithm, and simulation experiments have shown that, with appropriately adjusted parameters, the WoLF-MFGA algorithm provides a good approximation of the exact game; therefore, the WoLF-MFGA algorithm can closely approximate the best response algorithm. To investigate the convergence of the proposed algorithm, we have given sensitivity analyses of the WoLF-MFGA algorithm. The approaches provided in this paper can be extended to other scenarios, such as resource sharing, multiagent cooperation, blockchains, and network economics.
Data Availability
The data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare no conflict of interest.
Authors’ Contributions
All authors read and approved the final manuscript.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant Nos. 12061020 and 71961003), the Science and Technology Foundation of Guizhou Province (Grant Nos. 20201Y284, 20205016, 2021088, and 20215640), and the Foundation of Guizhou University (Grant Nos. 201405 and 201811). The authors gratefully acknowledge this support.