Abstract
As memory has become larger and cheaper, buffers have been added to network devices to reduce the number of lost packets and improve network performance. However, large buffers lead to long queues at network bottlenecks and to throughput saturation, which the research community has recently identified as the bufferbloat phenomenon. To address these issues, in this article we design a forward-backward optimal control queue algorithm based on an indirect approach with parametric optimization. The cost function to be minimized represents a trade-off between queue length and packet loss rate. By integrating an indirect approach with parametric optimization, our proposal offers better scalability and accuracy than direct approaches, while still maintaining good throughput and a shorter queue than several existing queue management algorithms. Numerical analysis, ns-2 simulation, and experimental results are provided to support the efficiency of our proposal. In detailed comparisons with conventional algorithms, the proposed procedure runs much faster than direct collocation methods while maintaining a desired short queue (≈40 packets in simulation and (ms) in experiment test).
1. Introduction
Modern computer networks are incredibly complex, and we rely on them to transport huge quantities of data across the globe in seconds. Although this works well, some foreseeable issues need to be tackled. As bandwidth-heavy applications such as peer-to-peer networks and websites relying on user-generated content have become more prevalent, links, especially the relatively slow residential broadband links, have been used at full capacity, and interruptions in connectivity have become more common [1]. Recent research has shown that the culprit is the buffers built into network equipment [2]. Accordingly, the term bufferbloat has been used to describe the issues that arise whenever these buffers misbehave and produce unnecessary latency [3].
In order to efficiently manage the queues that build up due to the bufferbloat phenomenon, active queue management (AQM) algorithms have been recommended for use in network equipment [4]. Most existing approaches to AQM design exploit feedback control theory with the linearized TCP core model proposed by Hollot et al. [5]. Well-known AQMs that monitor the average queue size and drop (or mark) packets based on statistical probabilities are Random Early Detection (RED) [6] and Random Exponential Marking (REM) [7]. If the buffer is empty, all incoming packets are accepted. When the buffer is full, the probability reaches 1 and all incoming packets are dropped. While the queue is growing, the probability grows according to a piecewise linear function, and RED (or REM) drops (or marks) packets using the updated probability. The main drawback of RED and REM is that they are sensitive to changes in network parameters and require careful tuning to provide good performance across scenarios. Recently, in [8], Proportional Integral controller Enhanced (PIE) was proposed as a lightweight AQM that needs no extra per-packet processing. Such PI-type controllers are known to provide queue control with zero offset (the mean queue length converges to the target value) but are consequently less stable and slower to react. Several other control-theoretic solutions based on the feedback fluid-flow model have been proposed in [9–12] to improve stability and robustness, but none of them tackles system optimality in the sense of minimizing both queue length and packet dropping rate and searching for an optimal control trajectory.
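The piecewise linear dropping rule described for RED can be illustrated with a minimal Python sketch. The parameter names `min_th`, `max_th`, and `max_p` follow the classic RED description and are assumptions here; they are not values or symbols taken from this paper.

```python
import random


def red_drop_probability(avg_queue, min_th, max_th, max_p):
    """Piecewise linear drop probability as a function of the average queue.

    min_th, max_th, max_p are illustrative parameters (assumed, not from the paper).
    """
    if avg_queue < min_th:
        return 0.0  # short queue: accept every packet
    if avg_queue >= max_th:
        return 1.0  # long queue: drop every packet
    # in between, the probability grows linearly from 0 up to max_p
    return max_p * (avg_queue - min_th) / (max_th - min_th)


def should_drop(avg_queue, min_th, max_th, max_p, rng=random.random):
    """Randomized drop decision using the probability above."""
    return rng() < red_drop_probability(avg_queue, min_th, max_th, max_p)
```

With hypothetical thresholds of 5 and 15 packets and `max_p = 0.1`, an average queue of 10 packets gives a drop probability of 0.05.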
To address such issues, a promising design direction is to reformulate the network queuing problem as an optimal control queue (OCQ) problem [13], where the main state variable is the queue length and the control variable is the actual input rate that forms the queue [14] or the packet dropping rate. One approach to solving OCQ is to first discretize the governing ordinary differential equations (ODEs) and the integral terms in the cost function or constraint functions, thereby replacing the infinite-dimensional optimal control problem with a large nonlinear optimization problem (NOP). This is known in the literature as the direct method for solving OCQ. This approach is typically easier to use, especially for OCQ with state equality or inequality constraints. The main difference among direct approaches is how they handle the constraints corresponding to the system dynamics. The three most common direct approaches are direct single shooting, direct multiple shooting, and direct collocation. Direct methods are discussed in [15–18].
Alternatively, one can first form the optimality conditions, using the calculus of variations and Pontryagin's minimum principle, and then solve the resulting boundary value problem. This is known as the indirect method for solving OCQ. References [15, 16, 18, 19] are just a small sample of the work that discusses or applies indirect methods for the solution of optimal control problems. In rare cases, the solution can be obtained in closed form from the optimality conditions, but, in general, approximation methods are used to solve the problem numerically. The optimality conditions of these problems generally take the form of differential algebraic equations (DAEs) with boundary conditions (BCs). An approximate solution to the OCQ can be obtained by using a boundary value problem (BVP) solver; perhaps the most popular BVP methods are multiple shooting and collocation. More recently, combinations of direct and indirect methods have been proposed, leading to hybrid methods [16].
In this paper, we first derive methods to solve OCQ using both direct and indirect approaches. We show how to apply the direct collocation approach to the OCQ problem so that it can be solved with popular optimization solvers such as JModelica [20] and GAlib [21]. As an alternative, we design an indirect forward-backward method for the optimal control queue problem (FB-OCQ). Our key difference from existing works is a novel method to update the control step by solving a parametric optimization subproblem. This method is scalable, meaning that it can be extended to larger problems with more variables. The numerical results for both direct and indirect approaches are discussed in Section 5.1 to motivate our choice of integrating an indirect method with parametric optimization for active queue management design, which proves more efficient (i.e., faster reacting and more stable) than direct methods. Finally, we evaluate the proposed algorithm using the network simulator ns-2 and compare it with other AQMs including RED, REM, and PI. The dropping behavior of the proposed algorithm shortens and stabilizes the average queue length compared with the others, while throughput is only slightly reduced.
Our Contributions
(i) We present both indirect and direct methods for an optimal control problem applied to queue management (Section 3). The step update of the control variable is calculated by solving a parametric optimization subproblem, which has the advantage of scalability. Numerical results show that all methods give nearly the same results, but the indirect method (FB-OCQ) is much faster than the other solvers and gives the best cost function value (Section 5.1).
(ii) We evaluate FB-OCQ in simulation and show that a desired small queue length value at packets can be obtained. Nevertheless, we cannot avoid the trade-off between queue length and throughput. FB-OCQ’s throughput is slightly smaller than RED’s ( versus Mbps); however, it is acceptable when compared to REM’s and PI’s (Section 5.2).
(iii) We implement FB-OCQ in the Linux kernel (Ubuntu 16.04) and test it in the worst case using the Realtime Response Under Load (RRUL) test suite. The experimental result shows that our algorithm yields lower ping latency than the other existing algorithms in the Linux kernel (DropTail and RED).
2. Problem Formulation
We consider an optimal control model of queue management problem, named , in [14] (Figure 1). The main idea is to minimize the cost function which implies a trade-off between queue length and dropping rate, subject to a dynamic constraint of queue length along time to as follows:where is cost function; ; is weight on dropping rate; is final time; is service rate (bandwidth capacity); and is parameter for different types of queuing model; for example, when , we obtain an M/M/1 queue.

3. Numerical Methods for OCQ
3.1. Direct Method: Collocation Method
In this section, we present how to apply the direct collocation [18] for OCQ. In this method, we discretize the time interval into elements. The state and the total number of packets and control variables at each node are and and , such that the state, control, and packets variables at the nodes are defined as nonlinear programming (NLP) variables:The controls are chosen as piecewise linear interpolating functions between and for as follows:The value of the control variables at the center is given by
The piecewise linear interpolation is used to prepare for the possibility of discontinuous solutions in control. Similarly, we can derive the approximate of the total number of the packets and as above. The state variable is approximated by a continuously differentiable and piecewise Hermite-Simpson cubic polynomial between and on the interval of length :whereThe value of the state variables at the center point of the cubic approximation isand the derivative isIn addition, the chosen interpolating polynomial for the state and control variables must satisfy the midpoint conditions for the differential equations as follows:Equations can now be defined as a discretized problem as follows: where , , and are the approximations of the state, the control, and the total number of packets, constituting in (9). This above discretization problem (9)-(10) can be solved using the following:(i)JModelica, which is a package for simulation and optimization of Modelica models (for more details see [20]);(ii)GAlib, which is C++ library of genetic algorithm (for more details see [21]).
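The midpoint construction described in this section can be sketched in Python as follows. This is a minimal illustration using the textbook Hermite-Simpson formulas for a scalar state; the dynamics function `f(x, u)` and all variable names are hypothetical and are not the queue model of this paper.

```python
def collocation_defect(f, x_k, x_k1, u_k, u_k1, h):
    """Midpoint defect of a cubic Hermite segment for dx/dt = f(x, u).

    f is a hypothetical scalar dynamics function; h is the segment length.
    A direct collocation solver drives this defect to zero on every segment.
    """
    f_k = f(x_k, u_k)
    f_k1 = f(x_k1, u_k1)
    # state at the segment center, from the cubic Hermite interpolant
    x_c = 0.5 * (x_k + x_k1) + (h / 8.0) * (f_k - f_k1)
    # control at the center, from piecewise linear interpolation
    u_c = 0.5 * (u_k + u_k1)
    # slope of the cubic interpolant at the center
    dx_c = -1.5 * (x_k - x_k1) / h - 0.25 * (f_k + f_k1)
    # the dynamics must match the interpolant slope at the midpoint
    return f(x_c, u_c) - dx_c
```

As a sanity check, for the toy dynamics `f(x, u) = u` with a control that is linear in time, the exact trajectory is quadratic, so the cubic interpolant reproduces it and the defect vanishes.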
3.2. Indirect Method: Forward-Backward Sweeping
We solve OCQ by using indirect method approach:(i)forming optimality conditions;(ii)solving BVP by First-Order Sweeping algorithm.Let be the adjoint variable. At time , let denote the optimum controls and let and denote the state and adjoint evaluated at the optimum. Using Pontryagin minimum principle we get the following equations.
Hamiltonian Function
Adjoint Equations
Transversality Condition
Hamiltonian Minimization Condition. Derivative of the Hamiltonian is evaluated to zero at interior points; henceIf is optimal in , then it must satisfy the minimum condition:
Furthermore, we assume the following:(H1)the Hamiltonian is strictly convex with respect to control variable ;(H2) is continuous on ;(H3)
Note that is determined in a unique way for each since is convex and is convex. Now we consider the problem of minimizing the Hamiltonian with respect to :Usually in the literature, is found explicitly as a function and, after substituting it into the system, the problem reduces to the boundary value problem.
Theorem 1. Assume that conditions (H1)–(H3) hold and problem has an optimal solution . Then for a given there exists a finite discretizationand an approximate solution , such that
Proof. Let be an optimal solution of problem. Then satisfies the conditions: whereand satisfies the minimum principle:
(a) First-Order Sweeping Method. We linearize the Hamiltonian around a reference solution and obtain the following equation for variation :
With the strong Legendre-Clebsch condition, one can approximate the above equation as whereis the augmented Hamiltonian function and is a penalty term. This is really parametric optimization, with initially; if is a convex set and is a convex function with respect to , then is a descent direction. If does not yield a reduction, then we set and then double it repeatedly until the objective is really reduced.
Remark 2. An alternative way to ensure descent is to apply Backtracking Line-Search Procedure.
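A backtracking line search of the kind mentioned in Remark 2 might look like the following Python sketch. It uses the standard Armijo sufficient-decrease test; the function names, the shrink factor, and the constant `c` are illustrative assumptions rather than the settings used in this paper.

```python
def backtracking_line_search(cost, u, direction, slope,
                             alpha=1.0, shrink=0.5, c=1e-4, max_steps=50):
    """Shrink the step until the Armijo sufficient-decrease test holds.

    cost      : callable mapping a list of control values to a float
    u         : current control values (list of floats)
    direction : search direction (list of floats)
    slope     : directional derivative of cost along direction (negative)
    """
    base = cost(u)
    for _ in range(max_steps):
        candidate = [ui + alpha * di for ui, di in zip(u, direction)]
        if cost(candidate) <= base + c * alpha * slope:
            return alpha, candidate  # accepted step
        alpha *= shrink  # step too long: shrink and retry
    return 0.0, list(u)  # no acceptable step found
```

For example, with `cost(u) = sum(v * v for v in u)`, starting point `[1.0]`, direction `[-2.0]` (the negative gradient), and `slope = -4.0`, the full step overshoots and the search accepts the halved step.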
(b) Parametric Optimization to OCQ. Now we consider problem (23) as one parametric minimization problem. Since is twice differentiable in and assumptions (H1)–(H3) hold, we can apply Theorem 1 to the problem. Then as a result, it generates a discretization, , and corresponding points such thatwhich proves the assertion.
Remark 3. Parametric optimization also can be applied in finding nominal optimal control given in [22]. It is easy to see that, at each iteration , the Hamiltonian function is a scalar function of and ; that is,The latter states that must be a minimizer of the following problem:which is a problem of parametric optimization as formulated in various papers from [23], where the independent variable is now considered as unknown parameter We can also consider a case when the set of admissible controls is time-varying; that is, . In this case, a general theory of parametric optimization is also applicable for finding the nominal optimal controls.
Let be an optimal process in problem. Introduce the function :where a parametric optimization problem is defined as
The KKT conditions for the problem state thatwhere
Consider the auxiliary parametric optimization problem:
Let satisfy the KKT conditions for problem with This system can be written in the following compact notation:where In order to apply Newton’s method to system, we have to solve a linear system with as matrix. The same matrix is used to compute :
Therefore, using the Newton method as corrector, we have
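To illustrate how a Newton corrector follows the minimizer of a parametric problem as the parameter varies, the Python sketch below treats a scalar toy case. The stationarity function `u + u**3 - t` is a hypothetical stand-in chosen for illustration; it is not the Hamiltonian or the KKT system of this paper.

```python
def newton_corrector(grad, hess, u, t, tol=1e-10, max_iter=50):
    """Solve grad(u, t) = 0 for u at a fixed parameter t by Newton's method."""
    for _ in range(max_iter):
        g = grad(u, t)
        if abs(g) < tol:
            break
        u -= g / hess(u, t)  # Newton step on the stationarity condition
    return u


def follow_path(grad, hess, u0, t_grid):
    """Track the minimizer over a grid of parameter values.

    The solution at the previous grid point is used as the starting
    guess (predictor) and refined by the Newton corrector.
    """
    path = []
    u = u0
    for t in t_grid:
        u = newton_corrector(grad, hess, u, t)
        path.append(u)
    return path


# Hypothetical toy problem: minimize u**2 / 2 + u**4 / 4 - t * u over u.
toy_grad = lambda u, t: u + u ** 3 - t
toy_hess = lambda u, t: 1.0 + 3.0 * u ** 2
```

Calling `follow_path(toy_grad, toy_hess, 0.0, [0.0, 0.5, 1.0, 1.5, 2.0])` returns one minimizer per grid point; warm-starting from the previous point keeps each corrector call to a few iterations.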
4. Proposed Algorithm: FB-OCQ
The first-order indirect approach motivates us to design the FB-OCQ algorithm. In short, this algorithm uses the gradient and performs a forward-backward search for the optimal control solution. Let us choose an initial control trajectory: , ; ; , , , , TOL, MaxIter, . From Algorithm 1 (FB-OCQ), the algorithm terminates when the norm of the gradient of the Hamiltonian with respect to , , falls below the tolerance during the run. The parameter must change at the inner iterations whenever we do not have a descent direction; in that case we divide by parameter until we get a reduction in the cost function. Figure 2 explains the steps of our algorithm in detail: the forward sweep (control variable and cost function value) and the backward sweep (adjoint variable).
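The forward-backward structure described above can be sketched in Python on a toy problem. The dynamics `x' = -x + u`, the cost integrand `x**2 + w * u**2`, and every parameter value below are hypothetical stand-ins chosen so that the example is self-contained; they are not the OCQ model or the settings of Algorithm 1.

```python
def forward_backward_sweep(x0=1.0, T=1.0, n=100, w=0.5,
                           step=0.5, tol=1e-6, max_iter=500):
    """Gradient-based forward-backward sweep on a toy problem (assumed model).

    Minimizes the integral of x**2 + w*u**2 subject to x' = -x + u, x(0) = x0.
    """
    dt = T / n

    def simulate(u):
        # forward sweep: integrate the state and accumulate the cost (Euler)
        x = [x0] * (n + 1)
        cost = 0.0
        for k in range(n):
            cost += (x[k] ** 2 + w * u[k] ** 2) * dt
            x[k + 1] = x[k] + dt * (-x[k] + u[k])
        return x, cost

    u = [0.0] * (n + 1)  # initial control trajectory
    x, cost = simulate(u)
    for _ in range(max_iter):
        # backward sweep: adjoint p' = p - 2x with p(T) = 0
        p = [0.0] * (n + 1)
        for k in range(n, 0, -1):
            p[k - 1] = p[k] - dt * (p[k] - 2.0 * x[k])
        # gradient of the Hamiltonian with respect to the control
        grad = [2.0 * w * u[k] + p[k] for k in range(n + 1)]
        if max(abs(g) for g in grad) < tol:
            break  # stationarity reached within tolerance
        alpha = step
        while alpha > 1e-12:
            u_new = [u[k] - alpha * grad[k] for k in range(n + 1)]
            x_new, cost_new = simulate(u_new)
            if cost_new < cost:  # accept only if the cost is reduced
                u, x, cost = u_new, x_new, cost_new
                break
            alpha *= 0.5  # otherwise shrink the step and retry
        else:
            break  # no step size reduced the cost: stop
    return u, x, cost
```

Calling `forward_backward_sweep(max_iter=0)` returns the cost of the initial zero control, which gives a baseline for checking that the sweep actually reduces the cost.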

5. Performance Evaluation
5.1. Numerical Results
In this section, we provide some iteration results for both direct and indirect approaches as follows.
Indirect Method: FB-OCQ. The ODE was solved on an equidistant discretization with discretization points. The optimal control and optimal state are depicted in Figure 3. Looking at the final control, we observe some points where the set of active constraints changes due to the singularity; that is, the optimal control is of the bang-bang type with the possibility of a singular arc. For queue management, such a singular arc represents sudden changes of the input rate from (packets/sec) to (packets/sec), which in turn change the buffer load (or queue length). In the history, in Figure 3, the parameter changed at the inner iterations because we do not have a descent direction, so we must divide by until we get a reduction in the cost function . The central processing unit (CPU) time for this algorithm is s and the norm of the gradient of the Hamiltonian function with respect to the control goes to approximately .

Direct Collocation Method: JModelica and GAlib Solvers. With and the number of interpolation points being , we tested the discretized problem (9)-(10) by JModelica and GAlib solvers (see [20, 21]) as follows.
(i) JModelica Solver. The cost function is reduced to and the CPU processing time is s. The optimal control and optimal state trajectories are obtained during JModelica running and illustrated in Figure 4. An advantage of this approach is that we do not need to derive the adjoint equations.

(ii) GAlib Solver. For this problem, we use the number of generations , and the population size is . The optimal control and optimal state obtained during the GAlib run are depicted in Figure 5. The cost function is reduced to during the run time of the program. The CPU processing time is s. Although the genetic algorithm has the advantage of needing no derivative or Hessian information, the control functions it produces are not usable in practice and its convergence is very slow in comparison to Algorithm 1 (FB-OCQ) and the JModelica solver.

Table 1 summarizes and compares the three methods that we implemented numerically. We conclude that Algorithm 1 is much faster than the other solvers and gives the best cost function value. With the genetic library GAlib, the cost function does not even reach a local solution, although the method aims at finding the global solution. The limitation of the indirect method (FB-OCQ) is that one must derive the adjoint equations, which is not easy in some applications.
5.2. Simulation Results
In this section, the performance of the proposed algorithm (FB-OCQ) is evaluated by comparing it with some popular AQMs including RED, REM, and PI. The credibility of the results is confirmed using the ns-2 simulator [24]. We investigate a network topology with sources, an intermediate router, and one destination (Figure 6). All sources simultaneously send packets to the destination through the router. Hence, a large queue builds up at the router, which is the bottleneck point. The maximum buffer size of the router is packets. The RED, REM, and PI algorithms are all configured to obtain the desired queue length at packets. The simulation lasts for seconds and is repeated using the built-in random generator in ns-2 to obtain more credible results.

Figure 7 shows the dropping probability of the FB-OCQ and RED algorithms. With the proposed square-root drop function, FB-OCQ drops queued packets more aggressively than RED, although the dropping frequency is nearly the same. Maximum dropping ratio is for FB-OCQ and for RED. When the algorithm drops more packets, queue stability increases, but system throughput has to be sacrificed. This trade-off is visible in the following results.

Figure 8 presents the average queue length measured at the bottleneck link from router to destination for the different algorithms. Due to aggressive dropping, FB-OCQ maintains the shortest queue at packets which is also the desired value. Only REM reaches the same value, but it takes longer, ≈50 seconds. PI can achieve the same performance only if its parameters are well configured and dynamically adapted to different network scenarios. In Figure 8, we can see that PI does not perform well, even though we set the desired queue length variable at packets and use the default parameters in ns-2 for the PI controller.

Finally, we investigate the throughput performance of the proposed FB-OCQ algorithm. Figure 9 confirms that there is a trade-off between throughput and queue length/dropping probability. PI and REM achieve nearly the same throughput at (Mbps) while FB-OCQ and RED obtain the better throughput performance at (Mbps) value. In Figure 8, although REM can achieve the desired queue length packets in this scenario, REM still maintains a large queue during simulation time from (sec) up to (sec). This is why the throughput of REM is lower than that of RED and FB-OCQ. Our proposed FB-OCQ drops packets aggressively, so its throughput value (Mbps) is slightly less than RED (Mbps).

5.3. Experiment Results
We examine the effectiveness of our optimal control queue method in the Linux kernel (Ubuntu 16.04). We conduct the experiment from a desktop computer through the Internet gateway to an outside server (Figure 10). To manage working queuing disciplines (qdiscs), we use the scheduler qdisc in the Linux kernel. The chosen server is a dedicated bufferbloat server, which is able to withstand very high congestion due to many simultaneous data flows. We use Flent: FLExible Network Tester [25] and the Realtime Response Under Load (RRUL) test [26] to evaluate our proposal. The RRUL test puts a network under worst-case conditions, reliably saturates the network link, and thus recreates the bufferbloat phenomenon for queue algorithm testing. The test duration is seconds.

We compare latency under load with the queue handled by different queue management schemes (DropTail, RED, and FB-OCQ) in turn. DropTail is by far the simplest approach to router queue management. The router accepts and forwards all arriving packets as long as buffer space is available. If a packet arrives while the queue is full, the incoming packet is dropped. The sender then detects the packet loss event and shrinks its sending window. While it is the most widely used method due to its simplicity and relatively high efficiency, DropTail has weaknesses such as poor fairness among TCP connections, and throughput and bottleneck link efficiency degrade severely as congestion gets worse.
RED [5, 6] was presented with the objective of minimizing packet loss and queuing delay. Moreover, it compensates for the weakness of DropTail by avoiding global synchronization of TCP sources, which improves fairness. To achieve these goals, RED uses two thresholds, a minimum and a maximum, and an exponentially weighted moving average (EWMA) formula to estimate the average queue length [27]. When the average queue length exceeds a predefined threshold, the link is considered congested and a drop action is taken. A temporary increase in the queue length indicates transient congestion, while an increase in the average queue length reflects long-lived congestion. Based on such information, a RED router sends randomized feedback signals to the senders so that they decrease their congestion windows. RED has good fairness among connections because of this randomized feedback mechanism [28].
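The distinction between transient and long-lived congestion comes from the EWMA filter, which the following Python sketch illustrates. The function name and the default weight are assumptions made for illustration; they are not parameters reported in this paper.

```python
def ewma_queue_estimate(samples, weight=0.002, initial=0.0):
    """Exponentially weighted moving average of instantaneous queue lengths.

    weight is an assumed smoothing factor: a small value makes the estimate
    react slowly, so short bursts barely move it.
    """
    avg = initial
    history = []
    for q in samples:
        avg = (1.0 - weight) * avg + weight * q
        history.append(avg)
    return history
```

For instance, with an illustrative weight of 0.1, a burst of five samples at 100 packets starting from an empty average raises the estimate only to about 41 packets, whereas a sustained load would eventually drive the average toward 100.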
Figure 11 presents ping latency results under the RRUL test suite. Ping is a networking utility that sends Internet Control Message Protocol (ICMP) echo request packets to the target server and waits for an ICMP echo reply. The program measures the round-trip time from transmission to reception and reports errors and packet loss. Our proposed algorithm FB-OCQ achieves the lowest packet latency compared with the other two algorithms inside the Linux kernel. Specifically, the packet latency when using FB-OCQ is about (ms), while about (ms) if using DropTail (pfifo_fast in Ubuntu) and ms if using RED algorithm.

6. Conclusions
We proposed a queue management algorithm named forward-backward optimal control queue (FB-OCQ) to solve the OCQ problem. Derived from indirect approaches in dynamic optimization, this algorithm runs faster than direct methods while achieving the same performance in numerical analysis. In ns-2 network simulation, the proposed algorithm drops packets more aggressively than the traditional RED algorithm, with a higher drop ratio at a similar dropping frequency (Section 5.2). As a result, the average queue length is reduced much more, while an acceptable throughput is still maintained. In future work, we plan to investigate the memory efficiency of FB-OCQ in wireless sensor networks.
Competing Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MEST) (no. NRF-2016R1A2B1013733).