Abstract
A mathematical model for -type queueing networks with multiple user applications and limited resources is established. The goal is to develop a dynamic distributed algorithm for this model that supports all data traffic as efficiently as possible and makes optimally fair decisions about how to minimize the network performance cost. An online policy gradient optimization algorithm based on a single sample path is provided to avoid the “curse of dimensionality”. The asymptotic convergence properties of this algorithm are proved. Numerical examples provide valuable insights for bridging mathematical theory with engineering practice.
1. Introduction
In the past decades, great efforts have been devoted to modeling and optimizing network-based communication systems, driven by increasing transmission demands and sophisticated performance criteria. However, technical challenges abound in designing such systems due to limited network resources and stochastic network characteristics. Queueing theory is well established as one of the primary tools for traffic engineering problems over both wired and wireless packet networks [1–3]. In queueing models, the factors affecting the performance of network systems include the arrival rates (or the interarrival time distributions), the service rates (or the interservice time distributions), and the queue discipline. In this paper, we concentrate on how to optimally and efficiently allocate the service rates to all concurrent user queues in each network element according to the arrival rates, so that the lowest possible performance cost is achieved.
Suppose that the arrival rate parameter for each user is uncontrollable and that the interarrival times are exponentially distributed. Let the queue discipline be first-come first-served (FCFS). Under this model, the decision parameter is the service time, namely, the service rates allocated to each user. Without loss of generality, consider that the service times of each user are independent and identically distributed with . Thus, the user queues are modeled as multiple concurrent queues, for which many techniques can be found in the literature [4–7]. More importantly, the optimal resource allocation problem then translates into a resource-constrained Markov decision problem (MDP). The objective of the MDP is to find a resource allocation policy that minimizes the overall performance cost by observing and analyzing the system behavior. Several solution methods for such mathematically tractable MDP models were proposed in [8, 9]. However, these methods typically suffer from a “curse of dimensionality” [10]. In addition, if no structural information about the system is available, we cannot explicitly compare the performance cost under different policies by observing and analyzing the network system behavior. Hence, a crucial question arises: how can we achieve performance cost minimization using as little computational effort and system structure information as possible? Along this direction, we propose a sensitivity-based optimization algorithm tailored to the bandwidth-constrained and backlog-constrained queueing system.
With the above issues addressed, we now confront a key question, namely, how to quantify the network system cost in terms of performance metrics. Note that the performance metrics may change not only as actual network scenarios vary but also as the influencing factors change; that is, different network environments lead to different definitions of performance metrics. For instance, a flexible cost function was proposed in [11], which takes into account the power, interference, backlog, and other factors in ultrawideband (UWB) communication networks. In [12], a bandwidth-related cost function was presented for wide area overlay networks. Another type of performance metric, considering the waiting time and the energy consumption for serving jobs in a processor sharing (PS) queue, was provided in [13]. Therefore, throughout this paper, we use the word “cost” to refer to both backlog-related and service-rate-related performance cost criteria in a broad sense.
In summary, the contributions of this paper are as follows. Firstly, we present a distributed resource-constrained -type queueing model for supporting multiple user traffic over communication networks. By translating the multiple queues of each relay network node into continuous-time semi-Markov processes, we further formulate network system performance cost minimization as a stochastic optimization problem in the Markov system. Secondly, within this optimization framework, we explicitly define the network system performance cost measure, based on which an online policy scheme combining performance sensitivity analysis and MDP is proposed. This scheme is cost-benefit: during the data transmission process, the performance cost is significantly reduced by choosing the optimal policy, while the computational complexity is kept low because the scheme is based on a single sample path, that is, a trajectory of each user queue. Thirdly, with the estimation of the performance gradients, a resource allocation algorithm is developed. From the performance comparison formula, we directly obtain the optimality condition. Moreover, the asymptotic convergence properties of this algorithm are proved. Last but not least, the established model can easily be extended to the more general situation in which the states of the -type queueing system are partially observable, that is, to partially observable Markov decision problems (POMDPs).
The rest of the paper is organized as follows. Section 2 presents the -type queueing system model. Section 3 then proposes the cost-benefit resource allocation optimization algorithm. To evaluate the performance of the proposed algorithm, numerical examples are provided in Section 4. Finally, the paper concludes with a short discussion in Section 5.
2. System Model
In this section, we model the queueing system according to the dynamic transmission procedure of each network user. With the formulation of the user queues, we first derive the steady-state Markov transition probability matrix. We then define the performance cost measure, based on which the objective function of the optimization algorithm is presented. Before digging into the details, we summarize the notation used in Table 1.
Consider a network modeled by a topology graph, that is, , where denotes the set of network elements (nodes) and represents the set of links. Note that for each link , there exists a work-conserving server with time-invariant service capacity , which serves packets and transfers them from the source element to the end element of . More precisely, consider that each element keeps a separate queue for every user traffic flow going through it, as illustrated in Figure 1. For simplicity of exposition, let be the number of user queues served at any given time . Without loss of generality, consider that the capacity of each user queue is upper bounded by a constant , that is, a queueing model with limited backlog capacity .

Denote the service time allocated to user queue as a general distribution for any given time . Since the users arrive at and depart from each network element randomly, we should allocate the service rates dynamically for all so that the performance cost at the element is minimized.
Then, we have where represents the mean service rate of user queue at time , satisfying , and is the long-term average rate of the th user's arrivals, namely, the intensity of the Poisson arrival process. For ease of presentation, hereafter, and will be used interchangeably. Let be the feasible region of the service rates allocated to user queue . To be more precise, we introduce some definitions to accurately model the th user queue.
Definition 2.1 (semi-Markov queueing process). A semi-Markov process characterizes the th user queue's behavior on the state space , where represents the number of packets in the queue after the latest packet left a network element at time .
Remark 2.2. The semi-Markov kernel of can be further represented as where
Let be the embedded Markov chain of , where is interpreted as the number of packets in the th user queue when the th packet has been served. It is essential to note that is positive recurrent, irreducible, and aperiodic under the condition of since is. Moreover, has both the same steady-state probability vector and the same steady-state performance cost measure (discussed later) as . According to [3], has standard transfer probabilities , and for any with respect to converges consistently to a constant for as . In this state, the transition probability matrix of can be derived as follows where
Remark 2.3. Note that the symbol represents the probability of packets arriving during the time interval in which a packet of the th user queue is being served. The balance equation of each concurrent user queue can be further expressed as where is a -dimensional column vector whose components are all 1's, and the superscript “” denotes transpose.
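To make the balance equation concrete, the following sketch solves the stationary distribution of a small finite embedded chain numerically. The transition matrix here is a hypothetical example for illustration, not the kernel derived above; for the paper's model its entries would be the arrival probabilities during one service, but the solve step is identical.

```python
import numpy as np

def stationary_distribution(P):
    """Solve the balance equations pi P = pi with pi 1 = 1
    for a finite irreducible transition matrix P."""
    n = P.shape[0]
    # Replace the last (redundant) balance equation with the normalization.
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

# Hypothetical 3-state embedded chain (states = queue lengths 0, 1, 2).
P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.6, 0.4]])
pi = stationary_distribution(P)
```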
In principle, the performance cost measure is based on the definition of a performance cost function. Note that performance cost is a commonly used term whose meaning changes with the network environment. Since backlog space in each network element is limited, the more backlog is occupied, the higher the cost paid. Accordingly, as the service rates increase, the backlog-related cost can be reduced. However, in many practical networks, especially in various wireless environments, the transmission cost cannot be neglected. In general, the transmission power is a convex increasing function of the service rates. Thus, the design of the performance cost function should trade off the backlog-related and the service-rate-related costs. Conceptually, we associate with each user queue a performance cost function defined as follows.
Definition 2.4 (performance cost function). Consider a general performance cost function associated to the th user queue, which is the sum of the backlog-related cost and the service rates-related cost , that is,
Suppose that the performance cost function is differentiable with respect to the service rates on . For ease of notation, hereafter, the terms and are used interchangeably throughout the paper. It is now imperative to define the performance cost measure as our objective function for each user queue. Motivated by [3, 14], the definition is as follows.
Definition 2.5 (performance cost measure). The performance cost measure with respect to the service rates for each user queue is denoted as where is a -dimensional column vector, and denotes the expectation with respect to the steady-state probability of the semi-Markov process in Definition 2.1.
Since the state space of is finite, we should note that for each nonnegative bounded performance cost function, there is
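Numerically, the measure in Definition 2.5 is simply the steady-state expectation of the cost function. A minimal sketch, with a hypothetical stationary vector and a cost function taken as linear in the backlog plus a quadratic service-rate term (illustrative forms only, since the paper's exact expressions are not shown here):

```python
import numpy as np

mu = 3.0                                  # hypothetical service rate
pi = np.array([0.4, 0.3, 0.2, 0.1])       # hypothetical stationary probabilities
# f(n) = backlog-related cost n plus a convex service-rate-related term.
f = np.array([n + 0.1 * mu ** 2 for n in range(4)])
eta = float(pi @ f)                       # steady-state performance cost measure, ~ 1.9
```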
Remark 2.6. Each user queue has been modeled as a semi-Markov queueing process , based on which the transition probability matrix has been derived. Suppose that it is differentiable with respect to the service rates . Denote by an infinitesimal generator of under the service rates , where and for are elements of and satisfy and for , . Thus, the infinitesimal generator is differentiable with respect to . Note that the elements of represent the transition rates of the packet number in each user queue. More importantly, the cost-benefit optimization algorithm is developed for all -type user queues in Section 3 by changing the allocated service rates so that the corresponding infinitesimal generator of each user queue is modified.
3. Resource Allocation Algorithm
In this section, we take a fresh look at the problem of resource allocation from the perspective of system performance cost and explore a cost-benefit gradient algorithm that minimizes the performance cost for all concurrent -type queues, subject to the bandwidth constraint.
3.1. Problem Formulation and Optimality Criterion
Since we focus on a stochastic dynamic queueing system, the estimation of its statistical properties is essential. Such estimation needs not only to be accurate but, more importantly, to be efficient, given the delay sensitivity of real-time network applications. The main tenet of perturbation analysis (PA) is that a great deal of information is contained in the sample paths of a dynamic system, beyond the usual statistics collected such as the means and variances of various variables [15]. Thus, in essence, we can estimate the performance gradient with respect to the service rates and further minimize each user queue's performance cost measure from a single sample path with PA.
In particular, several PA approaches have been introduced for solving network problems (see, e.g., [16, 17]). However, a general approach that supports a wide range of stochastic optimization problems has yet to be proposed. A new approach was proposed in [18] to analyze a number of Markov systems based on a single sample path. Moreover, optimization formulations for Markov [14, 19, 20], semi-Markov [21], and partially observable Markov [22] systems were proposed, and in [3], the theory was successfully extended to queueing systems. The structure of the PA-based queueing system is shown in Figure 2. For each feasible resource allocation policy, a set of service rates is allocated to the user queues. Each change of one user queue's service rate generates a perturbation on that queue's sample path, which affects the system performance cost.

As illustrated in Figure 3, in an -type user queue , such a perturbation can be regarded as a “jump” among its states and affects the performance cost . Thus, we need to measure the effect of every state on the performance cost before discussing the performance cost optimization. We briefly introduce a concept called the performance potential that is useful in this paper [18].

Definition 3.1 (performance potential). Denote by the th user queue's performance potential of state under service rates with respect to the performance cost function . It measures the effect of state on the performance cost and can be written as
Remark 3.2. The performance potential vector of user queue with respect to the performance cost function is denoted as . In essence, the performance potential vector is the solution of the Poisson equation, which has been studied extensively in the literature [23]: Furthermore, if is strongly ergodic, all of the performance potential vectors can be calculated by , where is the group inverse (for details, see [18]) of 's infinitesimal generator under service rates .
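For a finite ergodic chain, the potentials can be computed directly. The sketch below uses the discrete-time fundamental matrix (I - P + 1·pi)^{-1} in place of the group inverse of the generator; both yield a valid solution of the Poisson equation up to an additive constant. The chain and cost vector are hypothetical examples:

```python
import numpy as np

def performance_potential(P, f):
    """Solve the Poisson equation (I - P) g = f - eta*1 for the potentials g,
    via the fundamental matrix Z = (I - P + 1 pi)^{-1}."""
    n = P.shape[0]
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    pi = np.linalg.solve(A, b)            # stationary distribution
    eta = pi @ f                          # steady-state cost measure
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return Z @ f, eta, pi

P = np.array([[0.5, 0.5, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.6, 0.4]])           # hypothetical embedded chain
f = np.array([0.0, 1.0, 2.0])             # hypothetical per-state cost
g, eta, pi = performance_potential(P, f)
```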
Now, we need to assign a feasible service-rate region to each active user queue. It is well known from queueing theory that an -type user queue is stable if , that is, . Moreover, the network element is stable if and only if all individual user queues are stable. Suppose that the lower-bound service rates of each user queue at epoch are set to , and . We then denote the leftover service capacity as . We further assign the th user queue a parameter called the sharing weight Clearly, we have for any given . Thus, at epoch , the service rates of user queue are upper bounded by Therefore, the service-rate policy space in Figure 2 is defined as follows.
Definition 3.3 (feasible policy space). The policy space for all user queues at epoch can be denoted as a compact set , where , and .
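One natural reading of this feasible-region construction is that each queue keeps a stability-guaranteeing minimum rate and the leftover link capacity is divided among the queues by sharing weights that sum to one. A sketch under that assumption (all numbers hypothetical):

```python
C = 200.0                              # link service capacity
mu_min = [20.0, 30.0, 25.0, 25.0]      # per-queue minimum (stability) rates
w = [0.4, 0.3, 0.2, 0.1]               # sharing weights, summing to 1
leftover = C - sum(mu_min)             # capacity left after the minimum rates
# Upper bound for each queue: its minimum rate plus its weighted share.
mu_max = [m + wi * leftover for m, wi in zip(mu_min, w)]
```

Under this construction the upper bounds exhaust the link capacity exactly, so each queue's feasible interval is nonempty and the element remains within its service capacity.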
Next, we analyze the optimality criterion for the performance cost optimization.
Theorem 3.4 (optimality criterion). A cost-benefit resource allocation policy is optimal for any given initial policy if and only if for each user queue , one has
Proof. Note that is a compact set and is componentwise continuous on . Thus, there must exist at least one cost-optimal service rate vector in .
According to [20], the service rates for the th user queue are cost optimal if and only if
where the symbol denotes vector inequality, or componentwise inequality, in , and is the number of states of each user queue.
Note that a better cost-benefit service rate can be found by comparison with the current service rate. Besides, from (3.2), we have .
Thus, we can conclude that if is the cost-optimal service rate, then the following equation:
holds, and vice versa.
3.2. Gradient-Based Policy Optimization
Our purpose now is to develop an efficient and practical policy algorithm that minimizes all the user queues' performance costs based on the corresponding sample paths. In essence, the objective function for each user queue in Definition 2.5 represents the time-average performance measure of the queue. Developing a global optimization algorithm for such a performance measure would greatly increase the complexity. More importantly, such an approach is impractical given the delay sensitivity of user applications and the dynamic changes of the network element. Hence, a fast gradient-based optimization for the stochastic system is considered. A gradient algorithm finds only a local minimum of the objective function; however, it can be fairly efficient, especially when the interval for each user queue is not very large. To begin with, the performance gradient formula for each user queue is derived as follows.
Theorem 3.5 (performance gradient). For any given resource allocation policy at each event time , the gradient of the performance cost generated by the th user queue is obtained by
Proof. Taking the gradient of the performance measure (2.8), we have From [14, 20], there must be a particular solution to (3.2) such that ; thus, we obtain With , we further have Left-multiplying both sides of (3.11) by , it follows that Recalling that , we have Hence, using (3.12), it suffices to show that Combining (3.9) with (3.14), the result then follows.
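The gradient formula follows the potential-based sensitivity framework of [14, 18, 20], in which the derivative of the steady-state measure takes the form d(eta)/d(theta) = pi (dP/dtheta) g + pi (df/dtheta). The sketch below checks this identity against a central finite difference on a toy parametrized two-state chain; the chain and parameter are hypothetical, chosen only so the algebra can be verified:

```python
import numpy as np

def make_P(theta):
    # Toy parametrized chain: theta controls the 0 -> 1 transition.
    return np.array([[1.0 - theta, theta],
                     [0.25, 0.75]])

def stationary(P):
    n = P.shape[0]
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def eta_of(theta, f):
    return stationary(make_P(theta)) @ f

theta, f = 0.3, np.array([0.0, 1.0])
P = make_P(theta)
pi = stationary(P)
Z = np.linalg.inv(np.eye(2) - P + np.outer(np.ones(2), pi))
g = Z @ f                                    # performance potentials
dP = np.array([[-1.0, 1.0], [0.0, 0.0]])     # dP/dtheta, exact for this chain
grad = pi @ dP @ g                           # f does not depend on theta here
h = 1e-6
fd = (eta_of(theta + h, f) - eta_of(theta - h, f)) / (2 * h)
```

For this chain eta(theta) = 4 theta / (1 + 4 theta), so the exact derivative 4 / (1 + 4 theta)^2 can be compared with both computed values.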
We now describe the process flow of the policy gradient algorithm, shown in Figure 4; the procedure is given in Algorithm 1. Note that in Algorithm 1, is defined as the span seminorm on .
Moreover, the construction of the algorithm is as follows. The algorithm begins by choosing an arbitrary feasible policy for all user queues at a given time . Then, under the current service rates, the corresponding performance gradient is calculated by analyzing the sample path of each user queue. A line search along the gradient yields an appropriate step size, so that an improved policy can be obtained for each user queue. Iterating until the stopping criterion is met finally yields the optimal cost-benefit resource allocation policy.
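The loop just described, projected onto a per-queue feasible interval and with a backtracking line search standing in for the paper's step-size rule, can be sketched as follows. The cost and gradient in the example are a hypothetical single-queue trade-off (a backlog term that falls with the rate plus a linear rate term), not the paper's measure:

```python
def policy_gradient_descent(cost, grad, mu0, lo, hi, eps=1e-3, max_iter=200):
    """Projected gradient descent on [lo, hi] with backtracking line search."""
    clip = lambda m: min(max(m, lo), hi)
    mu = mu0
    for _ in range(max_iter):
        g = grad(mu)
        step = 1.0
        # Shrink the step until the candidate point improves the cost.
        while step > 1e-8 and cost(clip(mu - step * g)) >= cost(mu):
            step *= 0.5
        new = clip(mu - step * g)
        if abs(new - mu) < eps:            # stopping criterion
            return new
        mu = new
    return mu

# Hypothetical trade-off: backlog cost 80/mu plus rate cost mu,
# minimized at mu = sqrt(80) ~ 8.94, interior to the feasible region.
mu_opt = policy_gradient_descent(lambda m: 80.0 / m + m,
                                 lambda m: 1.0 - 80.0 / m ** 2,
                                 mu0=2.0, lo=1.0, hi=20.0)
```

Note that, consistent with observation (iii) in Section 4, the minimizer here lies strictly inside the feasible interval rather than on its boundary.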
Without loss of generality, suppose that every gradient iteration of each user queue improves the performance cost, that is, where represents the iteration index in Algorithm 1. The convergence property of the policy gradient algorithm is analyzed as follows.
Theorem 3.6 (convergence property). Suppose is a performance-improving resource allocation policy sequence at a given time ; then one has: (a) ; (b) there must exist an optimal cost-benefit resource allocation policy, denoted as , such that
Proof. For part (a): since is a performance-improving resource allocation policy sequence, we conclude that for each user queue , is a monotonically decreasing and bounded performance cost sequence, with lower bound .
By the continuity of , we obtain service rates for each user queue satisfying and .
Thus, it follows that
which is equivalent to
Recall that
by taking limit as , we further have
For part (b): note that for , there exists such that holds when .
To show the second relation of the theorem, consider ; it follows that
Since in Algorithm 1 is the best choice along the gradient found by the line search, we have
Subtracting (3.24) from (3.23), for , it follows that
Thus, can be regarded as an -optimal service rate, and we can write by the arbitrariness of .
Finally, by the uniqueness of limits, we conclude that , which is equivalent to saying that is an optimal cost-benefit service rate. This leads immediately to the results that and .
Remark 3.7. By proving the convergence property of the policy gradient algorithm, we have established the cost-benefit resource allocation optimization approach for the user queues. In a broad sense, the service time allocated to each user queue has been treated as a general distribution . However, the mathematical expression of the performance gradient is not unique; it depends on the application scenario.
3.3. Performance Gradient Analysis of Application Scenarios
To make the analysis more tractable, we now present two application scenarios for which the performance gradient can be explicitly derived.
Deterministic inter-service time.
This is the simplest practical case, in which the inter-service times for each user queue are deterministic, that is, . Without loss of generality, we denote each inter-service time by a constant . Consider now that . Let be the transfer matrix of the embedded Markov chain. In this scenario, for all , the element of the transfer matrix equals
implying that the element of the steady-state probability vector satisfies
Hence, according to (3.26), we have
Corollary 3.8. Considering the th user queue in the steady state, from Theorem 3.5, the performance gradient with respect to the service rates for the case of is where , , and .
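In this deterministic-service scenario, the number of Poisson arrivals during one service time D is Poisson distributed with mean lambda*D, which is what fills the rows of the embedded transfer matrix. A small sketch of those entries (the rates used are illustrative):

```python
import math

def arrivals_during_deterministic_service(lam, D, kmax):
    """P{k Poisson(lam) arrivals during a deterministic service of length D}:
    the Poisson pmf with mean lam * D."""
    mean = lam * D
    return [math.exp(-mean) * mean ** k / math.factorial(k)
            for k in range(kmax + 1)]

# Illustrative rates: lam = 2 arrivals per unit time, service time D = 0.5.
pmf = arrivals_during_deterministic_service(lam=2.0, D=0.5, kmax=30)
```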
Exponential inter-service time.
Having considered the straightforward scenario of deterministic service rates, we now investigate the case in which the inter-service times are independent and exponentially distributed, that is, memoryless. For ease of notation, we use the same parameters as in the first scenario. Similarly, in this case, we derive the element of the transfer matrix as follows:
from which it follows that
Corollary 3.9. Considering the th user queue in the steady state, from Theorem 3.5, the performance gradient with respect to the service rates for the case of is where
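For exponential service, a standard identity makes the transfer-matrix entries explicit: the number of Poisson(lambda) arrivals during one Exp(mu) service is geometric, P{k arrivals} = (mu/(lambda+mu)) (lambda/(lambda+mu))^k. A sketch (rates illustrative):

```python
def arrivals_during_exponential_service(lam, mu, kmax):
    """P{k Poisson(lam) arrivals during one Exp(mu) service}: geometric law
    with success probability mu / (lam + mu)."""
    p = lam / (lam + mu)     # probability the next event is an arrival
    return [(1.0 - p) * p ** k for k in range(kmax + 1)]

pmf = arrivals_during_exponential_service(lam=1.0, mu=3.0, kmax=200)
```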
Remark 3.10. Note that the essential feature behind cost-benefit resource allocation is the performance gradient with respect to the service rates. Based on this gradient, the policy gradient algorithm is executed for each user queue until the performance cost of the network system is minimized.
4. Performance Evaluation
In this section, we investigate the performance of the -type queueing system. Before proceeding, we first present the simulation model, namely, the sample path-based simulation scheme.
4.1. Simulation Model
To evaluate the performance of Algorithm 1 for each user queue , it is imperative to calculate both the steady-state probability vector and the performance potential under any feasible service rates in every iteration .
According to the Borel property [24], the steady-state probability of state in the embedded Markov chain has an unbiased estimate as follows: where and denotes the number of state transitions of the queue, set to 10000 in this simulation. Once the steady-state probability vector has been estimated, the estimate of the performance cost for user queue can be derived by .
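The frequency estimator just described can be sketched as follows: simulate one sample path and count visits to each state. The two-state chain below is a hypothetical stand-in whose stationary vector (1/3, 2/3) is known in closed form, so the estimate can be checked:

```python
import random

def estimate_stationary(step, x0, n_steps, n_states, seed=0):
    """Estimate the stationary distribution from the visit frequencies
    along a single simulated sample path."""
    rng = random.Random(seed)
    counts = [0] * n_states
    x = x0
    for _ in range(n_steps):
        x = step(x, rng)
        counts[x] += 1
    return [c / n_steps for c in counts]

def step(x, rng):
    # Hypothetical chain: P = [[0.5, 0.5], [0.25, 0.75]], so pi = (1/3, 2/3).
    if x == 0:
        return 1 if rng.random() < 0.5 else 0
    return 0 if rng.random() < 0.25 else 1

pi_hat = estimate_stationary(step, x0=0, n_steps=100_000, n_states=2)
```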
However, solving the Poisson equation in (3.2) to obtain entails significant computational complexity, which is impractical for online cost-benefit optimization. Therefore, a sample-path-based estimate is essential. Note that the performance potential can be estimated as where is called the realization matrix, and the first passage time from state to state is expressed as .
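Similarly, the relative potential g(i) - g(ref) equals the expected accumulated centered cost f(X_t) - eta collected from state i until the path first hits a reference state, so it can be estimated from simulated excursions. The sketch below uses a hypothetical two-state chain for which the exact value is 4/3, and (for brevity) computes eta exactly rather than estimating it from the same path:

```python
import numpy as np

def potential_from_sample_paths(P, f, i, ref=0, n_rep=20_000, seed=1):
    """Monte Carlo estimate of g(i) - g(ref): the mean accumulated
    centered cost f(X_t) - eta from state i until first hitting ref."""
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    A = np.vstack([(P.T - np.eye(n))[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    eta = np.linalg.solve(A, b) @ f      # exact eta (estimated online in practice)
    total = 0.0
    for _ in range(n_rep):
        x, acc = i, 0.0
        while True:
            acc += f[x] - eta
            x = rng.choice(n, p=P[x])
            if x == ref:
                break
        total += acc
    return total / n_rep

P = np.array([[0.5, 0.5], [0.25, 0.75]])   # hypothetical chain
f = np.array([0.0, 1.0])
g_rel = potential_from_sample_paths(P, f, i=1)
```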
Thus by Theorem 3.5, the gradient corresponding to the service rates allocated in the th iteration for the user queue can be efficiently estimated by
4.2. Numerical Results
The following numerical examples illustrate the analytical results derived in the previous sections. Consider a given time at which there are four active user queues in the network element. We limit our experimental tests to the simulation parameter values given in Table 2. Note that the values of , , and can be interpreted as numbers of packets transmitted per unit time.
Besides, the backlog capacity of every user queue is set to 100 packets. The performance cost function of the sharing user queues is taken as , where are constants; in this experiment, they are set to 80 and 1, respectively. Moreover, the stopping criterion in Algorithm 1 is set to 0.001, and the link service capacity is set to 200 packets per unit time. Choosing the initial resource allocation policy as the minimum service rates for all user queues, the simulation results are summarized in Table 3. Finally, the iteration processes of the four user queues are shown in Figure 5.

From Figure 5, we can conclude the following. (i) Given the stopping criterion and each feasible region , all iterations for the user queues converge within 15 steps. (ii) During the iterative process, the corresponding performance cost is reduced by , and , respectively, for each user queue. (iii) The optimal service rates for each user queue (e.g., ) need not be achieved on the boundary (i.e., or ) of the feasible region shown in Table 2.
Remark 4.1. Note that the convergence rate of the proposed algorithm is closely related to the estimation of the performance gradients: the algorithm converges faster with more accurate estimates. More precisely, according to Theorem 3.5, the performance gradient can be calculated via the steady-state probability vector and the performance potential in every iteration . Thus, we can increase the number of state transitions of the queue, thereby improving the estimation accuracy of the steady-state probability vector and the performance potential, or equivalently, of the performance gradient. In addition, the optimal service rates lie where they do because there is a trade-off between the backlog-related and service-rate-related performance costs, which can be adjusted through the definition of the performance cost function.
5. Conclusions
In this paper, performance optimization problems of communication networks with stochastic characteristics are studied. To describe the complex dynamic process of the system behavior, all user queues in each network element are represented by multiple concurrent -type Markov processes, from which the system model is constructed. Furthermore, an efficient algorithm is developed for the optimization of the system performance cost using a sensitivity analysis approach. In every iteration, the proposed algorithm estimates the derivative of the performance measure and the performance potential by analyzing a single sample path of each user queue, which makes it computationally efficient. The asymptotic convergence analysis, combined with the numerical examples, paves the way for designing cost-aware computer communication systems.
Acknowledgments
This work was supported by National Natural Science Foundation of China under Grant nos. 60904021 and 60774038 and National High-Tech Research and Development Program of China (863 Program) under Grant no. 2008AA01A317.