Abstract

Cloud computing is a computing paradigm that delivers computing resources as services over the Internet. Under this paradigm, cloud users can rent computing resources from cloud providers to run their own services. The goal of cloud users is to minimize the resource rental cost while meeting the service requirements. In reality, cloud providers often offer multiple pricing models for virtual machine (VM) instances, including on-demand and reserved pricing models. Moreover, the workload of cloud users varies with time and is not known a priori. Therefore, it is challenging for cloud users to determine the optimal cloud resource provisioning. In this paper, we propose a two-phase cloud resource provisioning algorithm. In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem and solve it with the sample average approximation method and the dual decomposition method. In the second phase, we propose a hybrid ARIMA-Kalman model to predict the workload and determine the number of on-demand instances based on the predicted workload. The effectiveness of the proposed two-phase algorithm is evaluated using a real-world workload trace and Amazon EC2’s pricing models. The simulation results show that the proposed algorithm significantly reduces the operational cost while guaranteeing the service level agreement (SLA).

1. Introduction

Cloud computing [1] is a computing paradigm that delivers computing resources as services over the Internet. These services are provided at three different levels: Infrastructure as a Service (IaaS) [2], Platform as a Service (PaaS) [3], and Software as a Service (SaaS) [4]. In this paper, we focus on IaaS. IaaS providers such as Amazon EC2 [5] and Microsoft Azure [6] provide their computing resources to cloud users in the form of VMs. Cloud users can rent VMs from cloud providers on a pay-per-use basis.

Cloud providers usually have different billing cycles and offer different pricing models. Take Amazon EC2 as an example. Amazon EC2 has two billing cycles: per-hour billing and per-second billing. In this paper, we adopt per-hour billing. Amazon EC2 offers three pricing models: (1) On-demand pricing model. On-demand instances let users pay for compute capacity by the hour with no long-term commitments. (2) Reserved pricing model. Users reserve an instance for a 1-year or 3-year term under one of three payment options (all upfront, partial upfront, or no upfront) and are then charged a discounted hourly rate for the instance during the reservation period. (3) Spot pricing model. Spot instances allow users to bid on unused EC2 instances and run those instances for as long as their bid exceeds the spot price. Spot instances are charged the spot price, which is set by Amazon EC2 and adjusted gradually based on the supply of and demand for spot instances. Such diverse pricing models make it challenging for cloud users to determine the optimal cloud resource provisioning.

There have been many studies on cloud resource provisioning, which aim to minimize the resource provisioning cost while satisfying the service requirements. However, most existing studies [7–11] either do not consider the pricing models or consider only the on-demand pricing model. Some recent studies [12–16] consider both on-demand and reserved pricing models to reduce the resource provisioning cost. These studies typically use reserved instances to meet the minimum service requirements and use on-demand instances to meet sudden workload demand.

In this paper, we study the cloud resource provisioning problem. To reduce the resource rental cost, we use both on-demand and reserved instances and propose a two-phase cloud resource provisioning algorithm. In the resource reservation phase, we determine the optimal number of reserved instances to minimize the resource rental cost. In the on-demand resource provisioning phase, on-demand instances are purchased based on the predicted workload to guarantee the SLA. The main contributions of this paper are summarized as follows:
(i) We use both on-demand and reserved instances for cloud resource provisioning and propose a two-phase cloud resource provisioning algorithm to reduce the resource rental cost.
(ii) In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem, and solve it by the sample average approximation method and the dual decomposition method.
(iii) In the second phase, we propose a hybrid ARIMA-Kalman model for workload prediction and determine the number of on-demand instances based on the predicted workload.
(iv) We conduct extensive experiments to evaluate the effectiveness of the proposed two-phase algorithm using a real-world workload trace and Amazon EC2’s pricing models. The experimental results show that the proposed algorithm can significantly reduce the operational cost while guaranteeing the SLA.

The rest of this paper is organized as follows. Related works are reviewed in Section 2. The problem formulation is given in Section 3. The two-phase cloud resource provisioning algorithm is presented in Section 4 and Section 5. Experimental results are presented in Section 6. Finally, we conclude this paper in Section 7.

2. Related Work

In cloud computing, cloud users can reduce the cost and guarantee the QoS requirements through adaptive resource provisioning. Adaptive resource provisioning has been widely studied [7–11]. In [7], autoscaling techniques were classified into five categories: static threshold-based rules, reinforcement learning, queuing theory, control theory, and time series analysis. Calheiros et al. [8] proposed a workload prediction model using the ARIMA model and evaluated its impact on the QoS of cloud applications. Islam et al. [9] developed prediction-based resource measurement and provisioning strategies using neural networks and linear regression to satisfy upcoming resource demands. To train the neural network, Shah et al. [17] presented a quick Gbest-guided artificial bee colony learning algorithm. Chen et al. [10] proposed an iterative QoS prediction model and a PSO-based runtime decision algorithm to derive a self-adaptive approach for resource allocation in cloud-based software services. Liu et al. [11] presented SPRNT, a reinforcement learning-based aggressive virtualized resource management system for IaaS clouds.

The above works mainly focus on adaptive resource provisioning. However, cloud providers usually offer multiple pricing models: on-demand, reserved, and spot. Cloud users can significantly reduce the cost by exploiting these pricing models. Chaisiri et al. [12] proposed an optimal cloud resource provisioning algorithm by formulating a stochastic programming model in which demand and price uncertainty is considered. In [13], a two-phase resource provisioning algorithm was presented. In the first phase, the optimal amount of long-term reserved resources was computed by a mathematical formula. In the second phase, the authors used the Kalman filter to predict resource demand and adaptively changed the subscribed on-demand resources. Niu et al. [14] proposed a semi-elastic cluster computing model for organizations to reserve and dynamically resize a virtual cloud-based cluster. In [15], a dynamic instance provisioning strategy based on the large deviation principle was proposed to minimize the number of active instances subject to a QoS requirement in terms of the overload probability. Mireslami et al. [16] proposed a two-phase cloud resource allocation algorithm. In the first phase, reserved resources were allocated to meet the minimum QoS requirements. In the second phase, a stochastic optimization approach was proposed to allocate on-demand resources under demand uncertainty.

In this paper, the cloud resource provisioning problem is formulated as a two-stage stochastic programming problem. It can be transformed into a deterministic integer program and solved by exact methods such as branch and bound and cutting plane methods, or by heuristic methods such as genetic algorithms, particle swarm optimization, and hybrid algorithms [18–20]. Grey [18] presented a hybrid PSO-GA algorithm for solving various constrained optimization problems, in which PSO explores the solution space while GA updates the solutions.

3. Problem Formulation

In this section, we present the model assumptions, including the VM configurations and the pricing models. Based on these assumptions, we present the formulation of the cloud resource provisioning problem. The notations used in this paper are listed in Table 1.

3.1. Cloud Computing Environment

Cloud providers offer multiple types of VMs to cloud users. Let $\mathcal{I} = \{1, 2, \ldots, I\}$ denote the set of VM types, where $I$ is the total number of VM types. Each VM type has its own resource configuration and processing capacity. Let $c_i$ denote the processing capacity of a VM instance of type $i$, which is the maximum number of concurrent users or the maximum service request rate that can be handled by a VM instance of type $i$ without violating the QoS requirements.

We adopt per-hour billing and consider two pricing models: on-demand instances and reserved instances (1-year term, partial upfront). Let $p_i^{o}$ denote the hourly usage fee of an on-demand instance of type $i$. Let $p_i^{u}$ and $p_i^{r}$ denote the one-time upfront payment and the hourly usage fee of a reserved instance of type $i$, respectively. Let $T$ be the number of hours in a reservation period. Then, the effective hourly price of a reserved instance of type $i$ can be computed as $\bar{p}_i^{r} = p_i^{u}/T + p_i^{r}$, which is charged for every hour during the reservation period. It is usually assumed that $\bar{p}_i^{r} < p_i^{o}$.
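To make the pricing trade-off concrete, the following Python sketch computes the effective hourly price of a reserved instance and the utilization level at which reserving becomes cheaper than renting on demand; the prices in the example are illustrative placeholders rather than Amazon EC2's actual rates.

```python
# Minimal sketch: effective hourly price of a reserved instance vs. the
# on-demand price, for one VM type. Prices below are illustrative only.

HOURS_PER_YEAR = 365 * 24  # T: hours in a 1-year reservation period

def effective_hourly_price(upfront, hourly_reserved, period_hours=HOURS_PER_YEAR):
    """Effective hourly price: upfront fee spread over T hours plus the hourly usage fee."""
    return upfront / period_hours + hourly_reserved

def breakeven_utilization(on_demand_hourly, upfront, hourly_reserved,
                          period_hours=HOURS_PER_YEAR):
    """Fraction of hours an instance must be busy for reserving to pay off."""
    return effective_hourly_price(upfront, hourly_reserved, period_hours) / on_demand_hourly

if __name__ == "__main__":
    # Hypothetical prices for a single VM type (USD).
    p_od, p_upfront, p_res = 0.10, 300.0, 0.03
    print("effective reserved $/h:", round(effective_hourly_price(p_upfront, p_res), 4))
    print("break-even utilization:", round(breakeven_utilization(p_od, p_upfront, p_res), 3))
```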

3.2. Cloud Resource Provisioning Problem

We consider the cloud resource provisioning problem over a reservation period. Let $t \in \{1, 2, \ldots, T\}$ be the hour index of the reservation period. Let $\lambda_t$ be the workload at time $t$. Let $\mathbf{x} = (x_1, x_2, \ldots, x_I)$ be the reservation decision, where $x_i$ is the number of reserved instances of type $i$. Then, the reserved processing capacity is $\sum_{i \in \mathcal{I}} c_i x_i$, and the total cost of reserved instances for the reservation period is $\sum_{i \in \mathcal{I}} (p_i^{u} + p_i^{r} T)\, x_i$. For each time $t$, if the workload does not exceed the reserved processing capacity, there will be no need to purchase on-demand instances; otherwise, on-demand instances will be purchased, and the usage cost of on-demand instances can be written as

$$C^{o}(\mathbf{x}, \lambda_t) = \min_{\mathbf{y}_t \in \mathbb{Z}_+^{I}} \sum_{i \in \mathcal{I}} p_i^{o}\, y_{i,t} \quad \text{s.t.} \quad \sum_{i \in \mathcal{I}} c_i \left( x_i + y_{i,t} \right) \ge \lambda_t, \tag{1}$$

where $y_{i,t}$ is the number of on-demand instances of type $i$ at time $t$.
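To illustrate how the second-stage cost in (1) can be evaluated for a given reservation, the sketch below computes a cheapest mix of on-demand instances covering the workload that exceeds the reserved capacity. It solves the small covering problem by dynamic programming over integer demand units; the instance data are hypothetical, and in practice an off-the-shelf ILP solver could be used instead.

```python
import math

def on_demand_cost(residual_demand, capacities, prices):
    """Minimum hourly cost of on-demand instances whose total capacity covers
    `residual_demand` (requests/s). This is an unbounded covering knapsack,
    solved by dynamic programming over integer demand units."""
    if residual_demand <= 0:
        return 0.0, [0] * len(capacities)
    D = math.ceil(residual_demand)
    # best[d] = (cost, instance counts) to cover at least d units of demand
    best = [(0.0, [0] * len(capacities))] + [(math.inf, None)] * D
    for d in range(1, D + 1):
        for i, (c, p) in enumerate(zip(capacities, prices)):
            prev_cost, prev_counts = best[max(0, d - c)]
            if prev_cost + p < best[d][0]:
                counts = list(prev_counts)
                counts[i] += 1
                best[d] = (prev_cost + p, counts)
    return best[D]

if __name__ == "__main__":
    # Hypothetical VM types: integer capacities (requests/s) and on-demand $/hour.
    caps, prices = [10, 20, 40, 80], [0.06, 0.12, 0.24, 0.48]
    reserved_capacity, workload = 275, 310
    cost, counts = on_demand_cost(workload - reserved_capacity, caps, prices)
    print("on-demand instances per type:", counts, "hourly cost:", round(cost, 2))
```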

The resource reservation problem can be formulated as

$$\min_{\mathbf{x} \in \mathbb{Z}_+^{I}} \ \sum_{i \in \mathcal{I}} \left( p_i^{u} + p_i^{r} T \right) x_i + \sum_{t=1}^{T} C^{o}(\mathbf{x}, \lambda_t), \tag{2}$$

where the objective is to minimize the total cost for the reservation period, including the upfront fee and the usage cost of reserved instances, and the usage cost of on-demand instances. This problem depends on the workload over the reservation period, which is not known a priori. We can estimate the probability distribution of the workload based on historical data. Then, the resource reservation problem can be rewritten as

$$\min_{\mathbf{x} \in \mathbb{Z}_+^{I}} \ \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_i + \mathbb{E}_{\lambda}\!\left[ C^{o}(\mathbf{x}, \lambda) \right]. \tag{3}$$

This problem is a two-stage stochastic programming problem, where the objective function is the average cost per hour, and the possible realizations of the workload are called scenarios. The first-stage problem corresponds to the resource reservation problem, where the first-stage decision is the reservation decision. The second-stage problem corresponds to the on-demand resource provisioning problem, where the second-stage decision depends on the realization of the workload.

4. Resource Reservation

In this section, we use the sample average approximation method and the dual decomposition method to solve the resource reservation problem.

4.1. Sample Average Approximation (SAA)

If the number of scenarios is very large, it is difficult to solve (3) directly. The sample average approximation method can be used to reduce the number of scenarios [21]. Since the workload is a one-dimensional random variable, a uniform discretization grid is used to generate a set of scenarios $\{\lambda_1, \lambda_2, \ldots, \lambda_N\}$ with associated probabilities $\{\pi_1, \pi_2, \ldots, \pi_N\}$ taken from the estimated workload distribution, where $N$ is the sample size. Then, problem (3) can be approximated as

$$\min_{\mathbf{x} \in \mathbb{Z}_+^{I}} \ \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_i + \sum_{s=1}^{N} \pi_s\, C^{o}(\mathbf{x}, \lambda_s). \tag{4}$$
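The sketch below shows one way the SAA objective in (4) might be assembled: scenarios are placed on a uniform grid spanning the observed workload range and weighted by an empirical histogram of the trace, and the second-stage cost reuses the on_demand_cost helper from the sketch after (1). The grid construction and weighting are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def uniform_grid_scenarios(workload_trace, n_scenarios):
    """Discretize the empirical workload distribution on a uniform grid.
    Returns the grid points (scenario workloads) and their probabilities."""
    lo, hi = float(np.min(workload_trace)), float(np.max(workload_trace))
    edges = np.linspace(lo, hi, n_scenarios + 1)
    counts, _ = np.histogram(workload_trace, bins=edges)
    probs = counts / counts.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, probs

def saa_objective(x, capacities, eff_reserved_prices, od_prices, scenarios, probs):
    """Average hourly cost of reservation x under the SAA objective (4)."""
    reserved_cap = sum(c * xi for c, xi in zip(capacities, x))
    fixed = sum(p * xi for p, xi in zip(eff_reserved_prices, x))
    expected_od = sum(
        pr * on_demand_cost(lam - reserved_cap, capacities, od_prices)[0]
        for lam, pr in zip(scenarios, probs)
    )
    return fixed + expected_od
```

A candidate reservation can then be scored with saa_objective, and the integer program of Section 4 (or a simple search over small instance counts) selects the minimizer.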

Problem (4) is the SAA of problem (3). Problem (4) is also a two-stage stochastic programming problem, which can be transformed into the following deterministic equivalent formulation:

$$\begin{aligned} \min_{\mathbf{x},\, \{\mathbf{y}_s\}} \quad & \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_i + \sum_{s=1}^{N} \pi_s \sum_{i \in \mathcal{I}} p_i^{o}\, y_{i,s} \\ \text{s.t.} \quad & \sum_{i \in \mathcal{I}} c_i \left( x_i + y_{i,s} \right) \ge \lambda_s, \quad s = 1, \ldots, N, \\ & x_i \in \mathbb{Z}_+,\ y_{i,s} \in \mathbb{Z}_+, \quad i \in \mathcal{I},\ s = 1, \ldots, N. \end{aligned} \tag{5}$$

Problem (5) is an integer linear program, which can be solved using a standard branch and bound algorithm.
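For reference, here is a hedged sketch of how the deterministic equivalent (5) could be written with the open-source PuLP modeler and handed to its bundled branch and bound solver (CBC); all instance data below are placeholders.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, PULP_CBC_CMD

# Placeholder data: 2 VM types, 3 scenarios (illustrative numbers only).
caps = [10, 20]                      # c_i, requests/s
eff_res = [0.04, 0.075]              # effective hourly reserved price
od = [0.06, 0.12]                    # on-demand hourly price
scenarios = [150, 275, 400]          # workload scenarios lambda_s
probs = [0.3, 0.5, 0.2]              # scenario probabilities pi_s

I, S = range(len(caps)), range(len(scenarios))
x = [LpVariable(f"x_{i}", lowBound=0, cat="Integer") for i in I]
y = [[LpVariable(f"y_{i}_{s}", lowBound=0, cat="Integer") for s in S] for i in I]

prob = LpProblem("deterministic_equivalent", LpMinimize)
# Objective: effective reserved cost plus expected on-demand cost per hour.
prob += lpSum(eff_res[i] * x[i] for i in I) + \
        lpSum(probs[s] * od[i] * y[i][s] for i in I for s in S)
# Capacity constraint for every scenario.
for s in S:
    prob += lpSum(caps[i] * (x[i] + y[i][s]) for i in I) >= scenarios[s]

prob.solve(PULP_CBC_CMD(msg=False))
print("reserved instances:", [int(v.value()) for v in x])
```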

4.2. Dual Decomposition-Based Branch and Bound (DDBnB)

The standard branch and bound algorithm uses the linear programming relaxation for bounding. In this paper, we use the Lagrangian relaxation obtained by scenario decomposition to improve the bounds [22].

The idea of scenario decomposition is to introduce a copy $\mathbf{x}_s$ of the first-stage decision for each scenario $s$. Then, problem (5) can be reformulated as

$$\begin{aligned} \min_{\{\mathbf{x}_s\},\, \{\mathbf{y}_s\}} \quad & \sum_{s=1}^{N} \pi_s \left( \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_{i,s} + \sum_{i \in \mathcal{I}} p_i^{o}\, y_{i,s} \right) \\ \text{s.t.} \quad & \sum_{i \in \mathcal{I}} c_i \left( x_{i,s} + y_{i,s} \right) \ge \lambda_s, \quad s = 1, \ldots, N, \\ & \mathbf{x}_1 = \mathbf{x}_2 = \cdots = \mathbf{x}_N, \\ & x_{i,s} \in \mathbb{Z}_+,\ y_{i,s} \in \mathbb{Z}_+, \end{aligned} \tag{6}$$

where the constraints $\mathbf{x}_1 = \mathbf{x}_2 = \cdots = \mathbf{x}_N$ are called the nonanticipativity constraints. The nonanticipativity constraints have several equivalent expressions. Here, we represent the nonanticipativity constraints by $\sum_{s=1}^{N} H_s \mathbf{x}_s = \mathbf{0}$, where $H_s$ is a suitable matrix. By dualizing the nonanticipativity constraints, the Lagrange dual function of problem (6) is defined as

$$D(\boldsymbol{\mu}) = \min_{\{\mathbf{x}_s,\, \mathbf{y}_s\}} \sum_{s=1}^{N} \left[ \pi_s \left( \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_{i,s} + \sum_{i \in \mathcal{I}} p_i^{o}\, y_{i,s} \right) + \boldsymbol{\mu}^{\top} H_s \mathbf{x}_s \right], \tag{7}$$

where $\boldsymbol{\mu}$ is the Lagrange multiplier vector associated with the nonanticipativity constraints and the minimization is subject to the capacity and integrality constraints of each scenario. Problem (7) can be decomposed into multiple subproblems according to the scenarios:

$$D_s(\boldsymbol{\mu}) = \min_{\mathbf{x}_s,\, \mathbf{y}_s} \left\{ \pi_s \left( \sum_{i \in \mathcal{I}} \bar{p}_i^{r} x_{i,s} + \sum_{i \in \mathcal{I}} p_i^{o}\, y_{i,s} \right) + \boldsymbol{\mu}^{\top} H_s \mathbf{x}_s \ : \ \sum_{i \in \mathcal{I}} c_i \left( x_{i,s} + y_{i,s} \right) \ge \lambda_s,\ x_{i,s}, y_{i,s} \in \mathbb{Z}_+ \right\}. \tag{8}$$

Problem (8) is called the scenario subproblem, which is a small integer linear program. The dual problem of problem (6) can be formulated as

$$\max_{\boldsymbol{\mu}} \ D(\boldsymbol{\mu}) = \max_{\boldsymbol{\mu}} \ \sum_{s=1}^{N} D_s(\boldsymbol{\mu}). \tag{9}$$

Dual problem (9) can be solved by the subgradient method. From the definition of the subgradient, the subgradient of $D(\boldsymbol{\mu})$ at $\boldsymbol{\mu}$ is $\sum_{s=1}^{N} H_s \mathbf{x}_s(\boldsymbol{\mu})$, where $\mathbf{x}_s(\boldsymbol{\mu})$ is the first-stage component of the optimal solution of (8) for a given $\boldsymbol{\mu}$. The iterative formula of the subgradient method is as follows:

$$\boldsymbol{\mu}^{(k+1)} = \boldsymbol{\mu}^{(k)} + \alpha_k \sum_{s=1}^{N} H_s \mathbf{x}_s\!\left(\boldsymbol{\mu}^{(k)}\right), \tag{10}$$

where $k$ is the iteration index and $\alpha_k$ is a positive step size.
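A compact sketch of the subgradient ascent (10) is given below. It writes the nonanticipativity constraints as $\mathbf{x}_s - \mathbf{x}_{s+1} = 0$ for $s = 1, \ldots, N-1$ (one common choice of $H$) and assumes a helper solve_scenario_subproblem that returns the first-stage part of the optimal solution of (8) for the given Lagrangian term; both the helper and the step-size rule are assumptions for illustration.

```python
import numpy as np

def subgradient_dual_ascent(solve_scenario_subproblem, n_scenarios, n_types,
                            n_iters=100, step0=1.0):
    """Subgradient ascent on dual (9), with nonanticipativity written as
    x_s - x_{s+1} = 0 for s = 1..N-1 (one common choice of H).
    `solve_scenario_subproblem(s, lagrange_coeff)` is an assumed helper that
    returns the first-stage part x_s of the optimal solution of subproblem (8)
    when the term lagrange_coeff . x_s is added to its objective."""
    mu = np.zeros((n_scenarios - 1, n_types))      # one multiplier per coupling constraint
    for k in range(1, n_iters + 1):
        # Coefficient multiplying x_s in subproblem s: mu_s - mu_{s-1} (zero at the ends).
        padded = np.vstack([np.zeros(n_types), mu, np.zeros(n_types)])
        xs = np.array([solve_scenario_subproblem(s, padded[s + 1] - padded[s])
                       for s in range(n_scenarios)])
        subgrad = xs[:-1] - xs[1:]                 # residuals of x_s - x_{s+1} = 0
        mu = mu + (step0 / k) * subgrad            # diminishing step size alpha_k = step0 / k
        if np.all(subgrad == 0):                   # scenario solutions already identical
            break
    return mu, xs
```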

Dual problem (9) provides a lower bound for original problem (6). In general, the scenario solutions will not satisfy the nonanticipativity constraints unless the duality gap is zero. In this paper, we present a branch and bound algorithm that uses the Lagrangian relaxation of the nonanticipativity constraints for bounding. To obtain a feasible first-stage solution, we compute the average of the scenario solutions and round it by some heuristic to obtain an integer solution. The feasible first-stage solution provides an upper bound for problem (6). The branch and bound algorithm is described as follows, where $\mathcal{P}$ denotes the set of current problems, $z_{\mathrm{LD}}(P)$ is the lower bound of a problem $P \in \mathcal{P}$ obtained from its Lagrangian dual, $\bar{z}$ is the best upper bound found so far, and $F(\mathbf{x})$ is the objective value of (5) with the first-stage decision fixed to $\mathbf{x}$ (a sketch of the averaging-and-rounding step is given after the steps):
Step 1. Initialization: set $\bar{z} = +\infty$ and let $\mathcal{P}$ consist of problem (6).
Step 2. Termination: if $\mathcal{P} = \emptyset$, then the solution $\hat{\mathbf{x}}$ that yielded $\bar{z}$ is optimal.
Step 3. Node selection: select and delete a problem $P$ from $\mathcal{P}$, and solve its Lagrangian dual.
Step 4. Bounding: if $z_{\mathrm{LD}}(P) \ge \bar{z}$, go to Step 2 (this step can be carried out as soon as the value of the Lagrangian dual rises above $\bar{z}$). Otherwise, two cases arise:
(i) The scenario solutions $\mathbf{x}_s$, $s = 1, \ldots, N$, are identical: let $\bar{z} = \min\{\bar{z}, F(\mathbf{x}_1)\}$ and delete from $\mathcal{P}$ all problems $P'$ with $z_{\mathrm{LD}}(P') \ge \bar{z}$. Go to Step 2.
(ii) The scenario solutions $\mathbf{x}_s$, $s = 1, \ldots, N$, differ: compute the average $\bar{\mathbf{x}}$ and round it by some heuristic to obtain $\hat{\mathbf{x}}$. If $F(\hat{\mathbf{x}}) < \bar{z}$, then let $\bar{z} = F(\hat{\mathbf{x}})$ and delete from $\mathcal{P}$ all problems $P'$ with $z_{\mathrm{LD}}(P') \ge \bar{z}$.
Step 5. Branching: select a component $\bar{x}_i$ of $\bar{\mathbf{x}}$ and add two new problems to $\mathcal{P}$ obtained from $P$ by adding the constraints $x_i \le \lfloor \bar{x}_i \rfloor$ and $x_i \ge \lfloor \bar{x}_i \rfloor + 1$, respectively. Go to Step 2.
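The following is a minimal sketch of the averaging-and-rounding step used in Step 4(ii) to obtain upper bounds. It assumes the scenario first-stage solutions and an objective evaluator are available; evaluate_first_stage is a hypothetical helper that fixes the reservation and solves the easy per-scenario second-stage problems of (5).

```python
import numpy as np

def round_average_solution(xs, probs):
    """Average the scenario first-stage solutions and round to integers.
    xs: array of shape (N, I) with the first-stage part of each scenario
    solution; probs: scenario probabilities. Rounding up is one simple
    heuristic that keeps the reservation on the conservative side."""
    avg = np.average(xs, axis=0, weights=probs)
    return np.ceil(avg - 1e-9).astype(int)   # small tolerance avoids rounding exact integers up

def upper_bound(x_hat, evaluate_first_stage):
    """Evaluate the candidate reservation on the full SAA problem (5);
    `evaluate_first_stage` (assumed helper) fixes x = x_hat and solves the
    second-stage problem of every scenario, returning the objective value."""
    return evaluate_first_stage(x_hat)
```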

5. On-Demand Resource Provisioning

On-demand resource provisioning problem (1) is an integer linear program, which can be solved using any standard integer linear programming solver. However, the workload is not known a priori. In this paper, we propose a hybrid ARIMA-Kalman model for workload prediction.

It has been shown in the literature that the workload exhibits strong autocorrelation. Then, the workload can be modeled by an ARIMA model [8, 23]:

$$\lambda_t = \phi_1 \lambda_{t-1} + \phi_2 \lambda_{t-2} + \cdots + \phi_p \lambda_{t-p} + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q}, \tag{11}$$

where $\phi_1, \ldots, \phi_p$ are the AR coefficients, $\theta_1, \ldots, \theta_q$ are the MA coefficients, and $\varepsilon_t$ is white noise. Let $r = \max(p, q+1)$, and model (11) can be rewritten as

$$\lambda_t = \sum_{j=1}^{r} \phi_j \lambda_{t-j} + \varepsilon_t + \sum_{j=1}^{r-1} \theta_j \varepsilon_{t-j}, \tag{12}$$

where $\phi_j = 0$ for $j > p$ and $\theta_j = 0$ for $j > q$. Then, the state-space representation of model (12) can be obtained as [24]

$$\lambda_t = \mathbf{H} \mathbf{z}_t + v_t, \tag{13}$$

$$\mathbf{z}_t = \mathbf{F} \mathbf{z}_{t-1} + \mathbf{w}_t, \tag{14}$$

where (13) and (14) are the measurement and state equations, $\lambda_t$ is the measurement variable, $\mathbf{z}_t$ is the $r$-dimensional state vector, $v_t$ is the measurement noise with variance $R$, and $\mathbf{w}_t$ is the state noise with covariance matrix $\mathbf{Q}$. The measurement matrix and the state transition matrix are given as

$$\mathbf{H} = \begin{bmatrix} 1 & \theta_1 & \cdots & \theta_{r-1} \end{bmatrix}, \qquad \mathbf{F} = \begin{bmatrix} \phi_1 & \phi_2 & \cdots & \phi_{r-1} & \phi_r \\ 1 & 0 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{15}$$
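The matrices in (15) can be assembled mechanically from fitted AR and MA coefficients. The sketch below follows the companion form written above, padding the coefficient vectors with zeros up to $r = \max(p, q+1)$; the example coefficients are placeholders, and other equivalent state-space representations exist.

```python
import numpy as np

def arma_state_space(phi, theta):
    """Build the measurement matrix H and transition matrix F of (15)
    for an ARMA model with AR coefficients `phi` and MA coefficients `theta`."""
    p, q = len(phi), len(theta)
    r = max(p, q + 1)
    phi_pad = np.zeros(r)
    phi_pad[:p] = phi                       # phi_j = 0 for j > p
    theta_pad = np.zeros(r - 1)
    theta_pad[:q] = theta                   # theta_j = 0 for j > q
    H = np.concatenate(([1.0], theta_pad)).reshape(1, r)   # H = [1, theta_1, ..., theta_{r-1}]
    F = np.zeros((r, r))
    F[0, :] = phi_pad                       # first row carries the AR coefficients
    F[1:, :-1] = np.eye(r - 1)              # shift structure of the companion matrix
    return H, F

# Example: an ARMA(2,1) with placeholder coefficients.
H, F = arma_state_space(phi=[0.6, 0.3], theta=[0.4])
```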

From state-space models (13) and (14), the Kalman prediction equations are obtained as follows [25]:

$$\hat{\mathbf{z}}_{t \mid t-1} = \mathbf{F} \hat{\mathbf{z}}_{t-1 \mid t-1}, \qquad \mathbf{P}_{t \mid t-1} = \mathbf{F} \mathbf{P}_{t-1 \mid t-1} \mathbf{F}^{\top} + \mathbf{Q}, \tag{16}$$

$$\begin{aligned} S_t &= \mathbf{H} \mathbf{P}_{t \mid t-1} \mathbf{H}^{\top} + R, \\ \mathbf{K}_t &= \mathbf{P}_{t \mid t-1} \mathbf{H}^{\top} S_t^{-1}, \\ \hat{\mathbf{z}}_{t \mid t} &= \hat{\mathbf{z}}_{t \mid t-1} + \mathbf{K}_t \left( \lambda_t - \mathbf{H} \hat{\mathbf{z}}_{t \mid t-1} \right), \\ \mathbf{P}_{t \mid t} &= \left( \mathbf{I} - \mathbf{K}_t \mathbf{H} \right) \mathbf{P}_{t \mid t-1}, \end{aligned} \tag{17}$$

where (16) and (17) are the time and measurement update equations, $\hat{\mathbf{z}}_{t \mid t-1}$ and $\hat{\mathbf{z}}_{t \mid t}$ are the estimates of $\mathbf{z}_t$ given the observations up to time $t-1$ and $t$, respectively, $\mathbf{P}_{t \mid t-1}$ is the error covariance matrix of $\hat{\mathbf{z}}_{t \mid t-1}$, $S_t$ is the error variance of the predicted measurement, and $\mathbf{K}_t$ is the Kalman gain.
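One filtering iteration corresponding to (16) and (17) can be coded directly from these matrices; the function below is a generic textbook Kalman step rather than code taken from the paper.

```python
import numpy as np

def kalman_step(z_est, P_est, obs, H, F, Q, R):
    """One time update (16) followed by one measurement update (17).
    z_est, P_est: filtered state estimate and covariance at time t-1;
    obs: the new workload observation lambda_t."""
    # Time update (prediction).
    z_pred = F @ z_est
    P_pred = F @ P_est @ F.T + Q
    # Measurement update (correction).
    S = H @ P_pred @ H.T + R                  # innovation (predicted measurement) variance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    z_new = z_pred + K @ (obs - H @ z_pred)
    P_new = (np.eye(len(z_est)) - K @ H) @ P_pred
    return z_new, P_new, (H @ z_pred).item()  # last value: one-step-ahead workload prediction
```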

Let $\Theta = \{\phi_1, \ldots, \phi_r, \theta_1, \ldots, \theta_{r-1}, R, \mathbf{Q}\}$ denote the set of parameters in the Kalman prediction equations, which can be estimated by the maximum likelihood method. In this paper, we use the EM algorithm [26] to obtain the maximum likelihood estimates of the parameters. If we could observe the states $\mathbf{z}_0, \mathbf{z}_1, \ldots, \mathbf{z}_n$ in addition to the observations $\lambda_1, \lambda_2, \ldots, \lambda_n$, then we would consider $\{\mathbf{z}_0, \ldots, \mathbf{z}_n, \lambda_1, \ldots, \lambda_n\}$ as the complete data. Under the Gaussian assumption, the log-likelihood of the complete data can be written as

$$\begin{aligned} \ln L(\Theta) = -\frac{1}{2} \Big[ & \ln |\boldsymbol{\Sigma}_0| + \left( \mathbf{z}_0 - \boldsymbol{\mu}_0 \right)^{\top} \boldsymbol{\Sigma}_0^{-1} \left( \mathbf{z}_0 - \boldsymbol{\mu}_0 \right) \\ & + n \ln |\mathbf{Q}| + \sum_{t=1}^{n} \left( \mathbf{z}_t - \mathbf{F}\mathbf{z}_{t-1} \right)^{\top} \mathbf{Q}^{-1} \left( \mathbf{z}_t - \mathbf{F}\mathbf{z}_{t-1} \right) \\ & + n \ln R + \frac{1}{R} \sum_{t=1}^{n} \left( \lambda_t - \mathbf{H}\mathbf{z}_t \right)^2 \Big] + \text{const}, \end{aligned} \tag{18}$$

where $\boldsymbol{\mu}_0$ and $\boldsymbol{\Sigma}_0$ are the mean vector and covariance matrix of the initial state $\mathbf{z}_0$.

From (18), if we had the complete data, it would be straightforward to obtain the maximum likelihood estimate of $\Theta$ using multivariate normal theory. However, we cannot observe the states. The EM algorithm is an iterative method for finding the maximum likelihood estimate of $\Theta$ based on the incomplete data by successively maximizing the conditional expectation of the complete-data log-likelihood. Each iteration of the EM algorithm consists of two steps, the expectation step (E-step) and the maximization step (M-step). In the E-step, the conditional expectation of the complete-data log-likelihood is computed given the parameter estimates $\Theta^{(k-1)}$ from the previous iteration:

$$Q\!\left(\Theta \mid \Theta^{(k-1)}\right) = \mathbb{E}\!\left[ \ln L(\Theta) \mid \lambda_1, \ldots, \lambda_n, \Theta^{(k-1)} \right]. \tag{19}$$

From (18), we can obtain

$$\begin{aligned} Q\!\left(\Theta \mid \Theta^{(k-1)}\right) = -\frac{1}{2} \Big[ & \ln |\boldsymbol{\Sigma}_0| + \operatorname{tr}\!\left\{ \boldsymbol{\Sigma}_0^{-1}\!\left[ \mathbf{P}_0^{n} + \left( \mathbf{z}_0^{n} - \boldsymbol{\mu}_0 \right)\left( \mathbf{z}_0^{n} - \boldsymbol{\mu}_0 \right)^{\top} \right] \right\} \\ & + n \ln |\mathbf{Q}| + \operatorname{tr}\!\left\{ \mathbf{Q}^{-1}\!\left[ \mathbf{S}_{11} - \mathbf{S}_{10}\mathbf{F}^{\top} - \mathbf{F}\mathbf{S}_{10}^{\top} + \mathbf{F}\mathbf{S}_{00}\mathbf{F}^{\top} \right] \right\} \\ & + n \ln R + \frac{1}{R} \sum_{t=1}^{n} \left[ \left( \lambda_t - \mathbf{H}\mathbf{z}_t^{n} \right)^2 + \mathbf{H}\mathbf{P}_t^{n}\mathbf{H}^{\top} \right] \Big] + \text{const}, \end{aligned} \tag{20}$$

where $\mathbf{z}_t^{n}$ and $\mathbf{P}_t^{n}$ are the smoothed state estimate and its error covariance obtained from the Kalman smoother,

$$\mathbf{S}_{11} = \sum_{t=1}^{n} \left( \mathbf{z}_t^{n} \mathbf{z}_t^{n\top} + \mathbf{P}_t^{n} \right), \qquad \mathbf{S}_{10} = \sum_{t=1}^{n} \left( \mathbf{z}_t^{n} \mathbf{z}_{t-1}^{n\top} + \mathbf{P}_{t,t-1}^{n} \right), \qquad \mathbf{S}_{00} = \sum_{t=1}^{n} \left( \mathbf{z}_{t-1}^{n} \mathbf{z}_{t-1}^{n\top} + \mathbf{P}_{t-1}^{n} \right), \tag{21}$$

and $\mathbf{P}_{t,t-1}^{n}$ is the error covariance of $\mathbf{z}_t$ and $\mathbf{z}_{t-1}$ given all $n$ observations. In the M-step, (20) is maximized with respect to the parameters, and then the updated parameter estimates are obtained as

$$\Theta^{(k)} = \arg\max_{\Theta}\, Q\!\left(\Theta \mid \Theta^{(k-1)}\right). \tag{22}$$

The flowchart of the EM algorithm is shown in Figure 1.
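In practice, the maximum likelihood estimates can also be obtained with an off-the-shelf state-space toolkit. The sketch below uses the SARIMAX class from statsmodels, which casts the ARMA model into state-space form and maximizes the Kalman-filter likelihood numerically; this is a substitute estimation route rather than the EM iteration described above, and the workload series in the example is synthetic.

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Hypothetical hourly workload trace (requests/s); in the paper this role is
# played by the NASA web server trace described in Section 6.
rng = np.random.default_rng(0)
workload = 250 + 30 * np.sin(np.arange(500) * 2 * np.pi / 24) + rng.normal(0, 10, 500)

# Fit an ARMA(2,1) in state-space form; the parameters are estimated by
# numerically maximizing the Kalman-filter likelihood.
model = SARIMAX(workload, order=(2, 0, 1))
result = model.fit(disp=False)

one_step_ahead = result.forecast(steps=1)[0]   # counterpart of prediction (23)
print("predicted next-hour workload:", round(float(one_step_ahead), 1))
```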

The one-step-ahead prediction of the workload based on the Kalman prediction is given by

$$\hat{\lambda}_{t+1 \mid t} = \mathbf{H} \hat{\mathbf{z}}_{t+1 \mid t} = \mathbf{H} \mathbf{F} \hat{\mathbf{z}}_{t \mid t}. \tag{23}$$

For each time $t$, even with a workload prediction method, the underprovisioning problem can occur due to underestimation, which causes SLA violations. To reduce the SLA violation rate, the prediction in (23) is enlarged by a safety margin controlled by a nonnegative tuning parameter, which yields the modified prediction (24).
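To show how the modified prediction and the provisioning decision fit together, the sketch below inflates the one-step-ahead forecast by a multiplicative safety margin and covers the excess over the reserved capacity with on-demand instances, reusing the on_demand_cost helper from Section 3. The multiplicative form of the margin is an assumption for illustration; the paper's formula (24) may differ.

```python
def provision_on_demand(pred_workload, reserved_capacity, capacities, od_prices,
                        safety_margin=0.1):
    """Inflate the predicted workload by a safety margin (an assumed
    multiplicative form of (24)) and cover the part exceeding the reserved
    capacity with on-demand instances, via on_demand_cost from Section 3."""
    padded_pred = (1.0 + safety_margin) * pred_workload
    residual = max(0.0, padded_pred - reserved_capacity)
    cost, counts = on_demand_cost(residual, capacities, od_prices)
    return counts, cost

# A larger safety_margin lowers the SLA violation rate at the price of a higher
# on-demand cost, matching the trade-off reported in Section 6.3.
```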

6. Evaluation

In this section, we conduct extensive experiments to evaluate the effectiveness of the proposed two-phase algorithm based on a real-world workload trace and Amazon EC2’s pricing models.

6.1. Experiment Setup

The workload trace used in the experiments is obtained from a 4-week access log file of the NASA web server [27], as shown in Figure 2. The probability distribution of the workload can be estimated from this trace. We consider four types of VM instances offered by Amazon EC2: small (m1.small), medium (m1.medium), large (m1.large), and extra large (m1.xlarge) [5]. Table 2 shows the configuration and the pricing models of each VM type. The parameters of the algorithms are set as follows. The sample size of the SAA problem is set to 10. In the subgradient method, we use a diminishing step size $\alpha_k$ and repeat the iterations until the stopping criterion is satisfied. In the EM algorithm, the initial values of the parameters are set according to [25].

6.2. Performance of Resource Reservation Algorithm

We first analyze the impact of resource reservation on the operational cost. Figure 3 shows the operational cost under different resource reservations. We can observe that the operational cost can be significantly reduced by resource reservation, and that there is a tradeoff between the on-demand cost and the reservation cost. The optimal resource reservation yields a reserved processing capacity of 275 requests/s, and the optimal operational cost is $3407.9. By combining on-demand and reserved instances, the operational cost can be reduced by 25.58% compared with the pure on-demand strategy.

We compare the accuracy of the uniform discretization grid with that of the Monte Carlo and quasi-Monte Carlo methods [21]. As can be seen from Figure 4(a), the uniform discretization grid is the most accurate of the three methods for the same sample size. We also study the impact of the sample size on the accuracy of the uniform discretization grid. As can be seen from Figure 4(b), the accuracy of the uniform discretization grid increases with the sample size and reaches 98.01%.

Figure 5 shows the convergence of the dual decomposition-based branch and bound algorithm. It can be seen that the optimal solution is obtained by the DDBnB algorithm after 9 iterations. Table 3 compares the performance of our resource reservation algorithm based on stochastic programming (RRSP) with two existing algorithms: the RIPAM algorithm, which considers only the medium instance type [15], and the DCRA algorithm [16]. The RRSP algorithm can reduce the operational cost by 24.92%. Our algorithm achieves 4.14% more cost saving than RIPAM and 20.84% more cost saving than DCRA.

6.3. Workload Prediction Based on Hybrid ARIMA-Kalman Model

In this subsection, we evaluate the performance of the hybrid ARIMA-Kalman model. The data of the first three weeks are used as the training data, and the data of the last week are used as the test data. Figure 6 shows the prediction results. We can observe that the predicted workload is very close to the actual workload. The prediction accuracy of the hybrid ARIMA-Kalman model is compared with that of the ARIMA model [8] and the neural network method [9] using three metrics: mean absolute percentage error (MAPE), root mean square error (RMSE), and mean absolute error (MAE). The ARIMA model has an autoregressive order of 2 and a moving average order of 1. The neural network method uses a backpropagation neural network with a learning rate of 0.7 and a single hidden layer; the numbers of neurons in the input, hidden, and output layers are 6, 4, and 1, respectively. As can be seen from Table 4, the hybrid ARIMA-Kalman model outperforms the other two methods.

Although the predicted workload is very close to the actual workload, the underprovisioning problem can occur due to underestimation of the workload. To reduce the SLA violation rate, the modified workload prediction formula (24) is used. Figure 7 shows the impact of the safety margin parameter on the SLA violation rate and the on-demand cost. It can be seen that, as the value of the parameter increases, the SLA violation rate decreases while the on-demand cost increases.

7. Conclusion

In this paper, we propose a two-phase cloud resource provisioning algorithm that reduces the resource rental cost of cloud users by using both on-demand and reserved instances. In the first phase, we formulate the resource reservation problem as a two-stage stochastic programming problem. We use the sample average approximation method to reduce the number of scenarios and solve the SAA problem by a dual decomposition-based branch and bound algorithm to obtain the optimal resource reservation. In the second phase, we propose a hybrid ARIMA-Kalman model for workload prediction and determine the number of on-demand instances based on the predicted workload. The effectiveness of the proposed two-phase algorithm is evaluated using a real-world workload trace and Amazon EC2’s pricing models. The simulation results show that the proposed algorithm achieves about 5%–20% more cost saving than existing algorithms while guaranteeing the SLA. In the future, we plan to investigate additional pricing models offered by cloud providers, such as the spot pricing model, and to use them to further reduce the resource rental cost of cloud users.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Science and Technology Program of Nantong (grant no. JC2018025).