Abstract
Electric vehicles (EVs) are fast becoming an integral part of our evolving society. There is a growing movement in advanced countries to replace gas-driven vehicles with EVs in order to cut down pollution from emissions. When fully integrated into society, electric vehicles will draw on the energy available on the grid; it is therefore important to understand consumption profiles for EVs. In this study, computational models are developed for predicting day-ahead energy consumption of electric vehicles in the city of Barcelona. Five different machine learning algorithms, namely support vector regression (SVR), Gaussian process regression (GPR), artificial neural networks (ANN), decision tree (DT), and ensemble learners, were used to train the forecasting models. The hyperparameters of each ML algorithm were tuned by a Bayesian optimization algorithm. To identify efficient features for modeling EV demand, two model structures, named Type-I and Type-II, were investigated. The Type-I model uses seven regressors representing the consumption of the previous seven days as input features, while the Type-II model considers only the EV consumption on the previous day and on the same day in the previous week. Based on the results of this study, we find that the performance of the Type-II models was as good as that of the Type-I models across all the algorithms considered, even though fewer input features were used. Overall, all the algorithms employed in this study achieved about 75-80% model accuracy on the chosen performance criteria. The models formulated in this study may prove useful for planning and unit commitment tasks in city energy management.
1. Introduction
Climate change concerns have stimulated diverse interests in alternative and sustainable sources of energy. The past decade has witnessed significant research and development geared towards reducing dependence on fossil energy sources and integrating more renewable energy sources into the world’s energy mix. Consequently, the generation of energy from renewable sources has led to innovation in everyday machines and devices that consume renewable energy. Since their conceptualization in the early nineteenth century, electric vehicles (EVs) have grown in prominence and adoption. It is estimated that by 2035, all newly purchased vehicles in the EU will be solely electric, owing to the ban imposed by the EU parliament on the sale of new petrol and diesel cars [1]. The share of new EV registrations in Europe increased from 3.5% to 11% between 2019 and 2020, while the share of new electric vans increased from 1.4% to 2.2% within the same period [2].
In the not-too-distant future, EVs will compete for energy on the grid. The faster EVs are integrated into our society, the higher the energy consumption demanded from grid operators at hourly to daily time resolutions. However, EV demand can be quite stochastic and therefore poses problems for forecasting. This has driven research interest in developing models capable of predicting EV consumption. At the same time, due to the current low level of adoption, there is limited open data available to support the development of efficient EV demand forecasting models.
The development of accurate models for EV demand could offer significant economic benefits by minimizing unit commitment costs through reliable forecasts [3–7]. In [8], the authors studied day-ahead charging demand of electric vehicles using a deep neural network. They show that the performance measures of the neural network forecaster, namely RMSE and MAE, improved by 28.8% and 19.22%, respectively, due to the inclusion of weather and calendar features in the forecasting problem formulation. An autoregressive integrated moving average (ARIMA) model was proposed in [9] for forecasting the charging demand and conventional electrical load of EV charging stations. The proposed forecaster considers daily driving patterns and distances as features to generate the expected charging load profiles. According to the authors, the performance of the ARIMA model was improved by tuning the orders of the integrated and autoregressive terms so that the mean squared error criterion is minimized. A Bayesian inference method with convolution was proposed in [10] for forecasting EV charging demand 24 hours ahead; the proposed method was compared with linear regression and found to have lower MAPE and RMSE values. Different classical and machine learning-based algorithms were considered in [11] for forecasting EV demand in Korea using datasets from 2018 to 2019. The authors considered classical statistical methods such as trigonometric, Box-Cox, autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models, as well as machine learning methods including artificial neural networks and long short-term memory (LSTM) networks. The study specifically evaluated the influence of exogenous variables in macro- and microscale geographical areas.
A decision tree-based model was proposed in [12] for estimating charging demand for different classes of EVs in South Korea. The proposed model incorporates traffic distribution data and weather effects as features in the forecasting model. The study built different cluster models, using probability density functions, for electric cars and buses based on traffic patterns. Four prediction methods, namely Time Weighted Dot Product Nearest Neighbor, Modified Pattern Sequence Forecasting, support vector regression, and random forests, were applied to charging record and station record datasets from UCLA in [13]. The authors' objective was to determine which approach, charging records (consumer perspective) or station records (charging outlets/stations), is better for forecasting EV charging load; they conclude that both approaches yielded comparable prediction errors. Support vector regression was utilized in [14] to build an EV forecaster with the following features: historical charging data, number of EVs, weather information, week properties, and holiday properties. A short-term EV demand forecasting model was investigated in [15] using a convolutional neural network (CNN) optimized by a niche immunity lion algorithm, which was used to optimize the weights and thresholds of the CNN.
Contrary to the aforementioned studies, this study is interested in developing multiple computational models for EV demand forecasting in order to suggest the model with the least prediction error on limited historical data. Furthermore, it was of paramount interest to test whether increasing the number of regressors (features) in time series forecasting for EV demand yields better predictive capability. The main contributions of this study are summarized as follows:
(i) We develop computational models for predicting day-ahead EV consumption in the Metropolitan Area of Barcelona (Area Metropolitana de Barcelona, AMB), Spain
(ii) Five machine learning (ML) algorithms were investigated, namely artificial neural networks (ANN), support vector regression (SVR), Gaussian process regression (GPR), decision tree (DT), and ensemble learners (EL)
(iii) We investigate two model structures differing in the number of features considered. To forecast the demand $E_d$ on a given day $d$, the Type-I model considers the total EV energy demand over the past seven days, that is, $E_{d-1}, E_{d-2}, \ldots, E_{d-7}$, while the Type-II model considers only two features, the total EV energy on the previous day, $E_{d-1}$, and on the same day in the previous week, $E_{d-7}$
(iv) The parameters of each ML algorithm were optimized using Bayesian optimization to arrive at the set of hyperparameters that gives the best model results
(v) Performance analyses were carried out for each model cluster using the correlation coefficient $R$ and root mean squared error (RMSE)
(vi) It was determined that the Type-II models performed as well as the Type-I models, with 75-80% prediction accuracy across all five ML algorithms, which suggests that the previous day's consumption and the consumption on the same day in the previous week are important features in modeling the consumption profiles of EV users in Barcelona
2. Datasets
The dataset considered in this study was obtained from [16] and represents the charging demand of EV users in the city of Barcelona over the full year 2019. The dataset consists of records from 21 stations, whose average daily energy consumption is summarized in Table 1. The first 11 stations in Table 1 are slow charging stations, each with two Schuko 3 kW charging points (CPs). The remaining 10 stations are fast charging stations, each with one Mennekes 43 kW CP, one Combo CCS 55 kW CP, and one CHAdeMO 55 kW CP. Figure 1 summarizes the average demand for each day type (Monday through Sunday) and shows that the bulk of the EV energy consumption occurs on weekdays rather than weekends.

3. Methodology
Two model structures were formulated to investigate the consumption patterns of EV users in the city of Barcelona. The Type-I model uses seven predictors representing the consumption of the past week, that is, $E_{d-1}, E_{d-2}, \ldots, E_{d-7}$, while the Type-II model considers just two features, $E_{d-1}$ and $E_{d-7}$, denoting the consumption of the previous day and the consumption on the same day in the previous week. Both model types were trained with five machine learning algorithms: decision tree, ensemble learners, support vector regression, Gaussian process regression, and artificial neural networks. For each algorithm, we carry out hyperparameter optimization using Bayesian optimization to select the best hyperparameters. The features of both model types were extracted in the same manner from the original data and split into training and testing sets using a 70:30 split ratio, as sketched below. The following subsections discuss each of the machine learning algorithms considered in the study.
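To make the feature construction concrete, the following Python sketch builds Type-I and Type-II feature matrices from a series of daily energy totals and applies a 70:30 split. This is an illustration only: the paper does not publish its code, the function and variable names are our own, the random series stands in for the 2019 AMB data, and a chronological split is assumed (the paper does not state whether its split was random).

```python
import numpy as np

def build_features(daily_energy, model_type="I"):
    # Type-I: lags 1..7 (E_{d-1}..E_{d-7}); Type-II: lags 1 and 7 only
    lags = range(1, 8) if model_type == "I" else (1, 7)
    X, y = [], []
    for d in range(7, len(daily_energy)):  # day 7 is the first with a full week of history
        X.append([daily_energy[d - k] for k in lags])
        y.append(daily_energy[d])
    return np.array(X), np.array(y)

# Placeholder series for one year of daily EV energy totals
rng = np.random.default_rng(0)
energy = rng.uniform(200.0, 900.0, 365)

# 70:30 train/test split, as described in the text
X, y = build_features(energy, model_type="II")
n_train = int(0.7 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]
```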
3.1. Decision Tree
Decision trees are graph-based data mining algorithms for regression and classification problems. Earlier decision tree algorithms such as Iterative Dichotomiser 3 [17] and C4.5 [18, 19] were proposed for classification problems, prior to the introduction of CART [20], which is also capable of solving regression problems. In a decision tree algorithm, each interior node represents an input variable, and each terminal node corresponds to a target variable. Regression (decision) trees are grown to minimize errors by dividing the input space at each iteration of the algorithm. The estimated output at each terminal node $t$ is computed by

$$\hat{y}_t = \frac{1}{N_t} \sum_{x_i \in t} y_i,$$

where $t$ represents the leaf node and $N_t$ is the total number of samples at that node. An error criterion is used to guide the splitting decision. For instance, CART [20] considers the least squares deviation

$$i(t) = \frac{1}{N_t} \sum_{x_i \in t} \left( y_i - \hat{y}_t \right)^2,$$

where $i(t)$ denotes an impurity measure. The splitting criterion is then obtained as follows:

$$\Delta i(s, t) = i(t) - p_L\, i(t_L) - p_R\, i(t_R).$$
$t_L$ and $t_R$ represent the child nodes to the left and right of the parent node $t$, respectively, and $p_L$ and $p_R$ are the proportions of samples allocated to the left and right child nodes. If the split rule is generated from a numeric or ordinal variable, two child nodes result, and the parent node is decomposed into the two subsets $\{x_j \le s\}$ and $\{x_j > s\}$, where $s$ and $x_j$ denote the split point and the selected attribute, respectively. Important hyperparameters considered in this study that affect the performance of the decision tree algorithm include the maximum number of splits, the minimum leaf size, and the number of variables to sample. The maximum number of splits limits the total number of splits possible from the root node. The number of variables to sample determines the number of regressors considered at each split. The minimum leaf size prevents a split whenever it would produce a child node containing fewer observations than the minimum leaf size.
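A minimal sketch of how such a tree could be tuned follows, assuming scikit-learn and scikit-optimize as one possible Bayesian optimization toolchain (the paper's own implementation is not published; the hyperparameter names above are mapped to their closest scikit-learn analogues, and the search ranges are illustrative):

```python
from sklearn.tree import DecisionTreeRegressor
from skopt import BayesSearchCV  # scikit-optimize; one possible Bayesian tuner

# Closest scikit-learn analogues of the hyperparameters named above:
#   min_samples_leaf ~ minimum leaf size
#   max_leaf_nodes   ~ caps the number of splits grown from the root
#   max_features     ~ number of variables to sample at each split
search = BayesSearchCV(
    DecisionTreeRegressor(random_state=0),
    {
        "min_samples_leaf": (1, 50),
        "max_leaf_nodes": (2, 64),
        "max_features": (1, 7),  # up to all seven Type-I regressors
    },
    n_iter=30,
    cv=5,
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X_train, y_train)
print(search.best_params_)
```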
3.2. Ensemble Learners
Voting ensembles, or ensemble learning, refer to the combination of multiple algorithms to solve a given problem. This approach has been shown to improve performance, reduce instability, and handle complex datasets. Different ensemble-based techniques have been reported in the literature, such as voting-based ensembles [21], ensembles of online sequential extreme learning machines (EOS-ELM) [22], and weighted voting in ELM [23]. A voting-based ELM considers multiple ELMs trained on the same dataset, each having the same number of hidden nodes and the same activation function in each hidden node; a majority vote then determines the final output of the ensemble network. In this study, we consider an ensemble learning algorithm consisting of boosted or bagged decision trees. The optimized hyperparameters of the ensemble learning method include the method (LSBoost or Bag), the number of learning cycles, the learn rate, the minimum leaf size, and the number of variables to sample. The minimum leaf size and the number of variables to sample are particular to the decision tree algorithm and were discussed in the preceding section. The method hyperparameter specifies the type of ensemble to use: boosting constructs shallow decision trees and converges faster than the Bag method, which constructs deep trees. The learn rate hyperparameter controls the speed of convergence of the LSBoost algorithm.
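The two ensemble methods can be sketched as follows, assuming scikit-learn; `BaggingRegressor` plays the role of the Bag method and least-squares gradient boosting is the closest analogue of LSBoost (the settings shown are illustrative, not the tuned values from this study):

```python
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

# 'Bag' analogue: deep trees grown on bootstrap resamples and averaged
# (the default base learner of BaggingRegressor is a decision tree)
bagged = BaggingRegressor(n_estimators=100, random_state=0)
bagged.fit(X_train, y_train)

# 'LSBoost' analogue: shallow trees fitted sequentially on a least-squares loss
boosted = GradientBoostingRegressor(
    loss="squared_error",  # least-squares loss, as in LSBoost
    n_estimators=100,      # ~ number of learning cycles
    learning_rate=0.1,     # ~ learn rate
    max_depth=3,           # shallow trees, as described above
    random_state=0,
)
boosted.fit(X_train, y_train)
```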
3.3. Support Vector Regression
The support vector machine (SVM), originally proposed by [24], is based on the structural risk minimization principle. Given a dataset in a binary classification task, the classical SVM algorithm attempts to find a hyperplane that minimizes the classification error. Support vector regression (SVR) algorithms extend the SVM algorithm to regression problems. The classical SVR algorithm is based on an $\varepsilon$-insensitive loss function and attempts to compute a hyperplane such that the predicted observations deviate from the actual observations by at most $\varepsilon$ over a given feature space. Different variants of the SVR algorithm have been proposed, such as linear, kernel, $\nu$-, and twin-SVR algorithms. A simple mathematical basis of the SVR algorithm is summarized as follows. Consider a regression problem with $n$ observations arranged in tuples $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i$ and $y_i$ denote the features (predictors or regressors) and the outputs, respectively. Suppose we seek to find a function defined by

$$f(x) = \langle w, x \rangle + b$$
such that the error between the observed outputs and the predicted outputs is minimized over the training and validation datasets. The problem is thus cast as the optimization problem

$$\min_{w,\, b} \ \frac{1}{2}\lVert w \rVert^2 \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon, \\ \langle w, x_i \rangle + b - y_i \le \varepsilon. \end{cases}$$
Alternatively, we may also consider the $\varepsilon$-insensitive loss function

$$|\xi|_{\varepsilon} = \begin{cases} 0, & |\xi| \le \varepsilon, \\ |\xi| - \varepsilon, & \text{otherwise}. \end{cases}$$
For practical purposes, the presented formulation of the SVR algorithm is too naïve, as it does not account for model inaccuracies. To account for model errors, slack variables $\xi_i$ and $\xi_i^*$ are introduced, and the minimization problem is reconstructed as follows:

$$\min_{w,\, b,\, \xi,\, \xi^*} \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^* \right) \quad \text{subject to} \quad \begin{cases} y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \\ \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0. \end{cases} \tag{8}$$
In (8), a positive regularization parameter $C$ has been included to balance between optimizing the flatness of the function and minimizing the prediction errors. The kernel SVR is arguably the most popular SVR algorithm in the literature due to its ability to handle highly nonlinear problems by transforming the input space into a high-dimensional kernel space via a kernel function. For a given kernel function $k(x_i, x_j)$, the kernel SVR problem is cast in its dual form as

$$\max_{\alpha,\, \alpha^*} \ -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*),$$

subject to $\sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$.
The performance of the support vector regression algorithm is influenced by the choice of hyperparameters. Tunable SVR hyperparameters include the box constraint, kernel scale, epsilon, kernel function, and polynomial order. The box constraint hyperparameter (the parameter $C$ above) controls the penalty applied to observations that fall outside the epsilon margin. The kernel function hyperparameter determines the type of kernel used to interpret the features; common kernel functions include linear, Gaussian, radial basis, polynomial, and sigmoidal functions. The margin of tolerance is set by the epsilon hyperparameter: the smaller the value of $\varepsilon$, the smaller the error tolerance. The polynomial order hyperparameter determines the order of the polynomial kernel function. The kernel scale hyperparameter is a scaling value applied to the features before computation of the Gram matrix.
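A minimal sketch of a Gaussian-kernel epsilon-SVR, assuming scikit-learn (the hyperparameter values shown are illustrative, not the tuned values reported later in Table 3; scaling the features before the Gram matrix is formed plays the role of the kernel scale):

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Gaussian (RBF) kernel epsilon-SVR: C is the box constraint, epsilon the
# error tolerance, and gamma relates inversely to the kernel scale
svr = make_pipeline(
    StandardScaler(),  # scale features before the Gram matrix is computed
    SVR(kernel="rbf", C=100.0, epsilon=10.0, gamma="scale"),  # illustrative values
)
svr.fit(X_train, y_train)
y_pred = svr.predict(X_test)
```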
3.4. Artificial Neural Networks
Artificial neural networks (ANN) are bioinspired information processing units that mimic the processing capabilities of the human brain to model complex and often highly nonlinear relationships between input and output data. The ANN was originally developed by [25] in an attempt to derive functional models of biosystems using simple logical operators. The study of artificial neural networks has since evolved, with several model structures and algorithms proposed for solving different black-box modeling problems. The simplest neural network consists of a single neuron, which forms its basic functional unit. The mathematical expression of a simple neuron is given by

$$y = \varphi\left( \sum_{i=1}^{n} w_i x_i + b \right),$$

where $w_i$ and $b$ are known as the weights and bias, $\varphi(\cdot)$ is the activation function, $y$ is the output of the neuron, and $n$ is the number of inputs. The process of arriving at black-box models for complex and/or nonlinear processes is referred to as learning. In the learning phase, the neural network adopts a learning algorithm to determine appropriate weights. A loss function, usually a mean squared error criterion between the actual and predicted outputs, guides the convergence of the learning algorithm towards the values of the weights and biases that best fit the modeling problem. Different forms of learning, such as supervised, unsupervised, and reinforcement learning, have been proposed for neural network modeling problems, and learning algorithms such as gradient descent, Levenberg-Marquardt, Newton, quasi-Newton, and conjugate gradient methods have been developed for training neural network models. ANNs can be classified by network structure into single-layer feed-forward networks, multilayer feed-forward networks, single-layer recurrent networks, multilayer recurrent networks, and single nodes with self-feedback, and they can handle both classification and regression problems. In this study, we employ a feed-forward neural network trained with the limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) quasi-Newton algorithm to minimize the mean squared error criterion. The network consists of five layers: an input layer, a fully connected layer, a ReLU activation layer, another fully connected layer, and an output layer. The main hyperparameters optimized for the ANN algorithm include the activation function, the number of hidden layers, and the number of neurons in each hidden layer.
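A sketch of such a network, assuming scikit-learn's `MLPRegressor` as a stand-in for the paper's implementation (the hidden layer size shown matches the Type-I result reported later in Table 5, but is otherwise illustrative):

```python
from sklearn.neural_network import MLPRegressor

# Feed-forward network trained with the L-BFGS quasi-Newton solver on a
# squared-error loss, mirroring the layer stack described in the text
ann = MLPRegressor(
    hidden_layer_sizes=(297,),  # one hidden layer with 297 neurons (Type-I, Table 5)
    activation="relu",
    solver="lbfgs",
    max_iter=2000,
    random_state=0,
)
ann.fit(X_train, y_train)
```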
3.5. Gaussian Process Regression
The approach for modeling complex and highly nonlinear processes via Gaussian process regression (GPR) is based on statistical Gaussian processes defined by a mean and a covariance function. Consider a process to be modeled with input-output pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where $x_i$ and $y_i$ represent the features (inputs) and outputs, respectively, and $n$ is the number of observations. GPR modeling begins by describing a predictor for the process output as follows:

$$y_i = f(x_i) + \epsilon_i,$$

where $\epsilon_i \sim \mathcal{N}(0, \sigma_n^2)$ is a noise term with zero mean and covariance $\sigma_n^2$. The function $f$ is itself considered a random variable described by a statistical distribution, in this case a Gaussian process:

$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big),$$

where $m(x)$ and $k(x, x')$ are the mean and kernel (covariance) functions defined as follows:

$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big].$$
The kernel function describes the dependence of the function values at different points $x$ and $x'$ and is critical to the performance of the GPR algorithm in a modeling task. A popular choice is the squared exponential kernel

$$k(x, x') = \sigma_f^2 \exp\left( -\frac{\lVert x - x' \rVert^2}{2\ell^2} \right),$$

where $\ell$ and $\sigma_f^2$ are the length scale and signal variance hyperparameters, respectively, used to increase or decrease the prior correlation between points. The major hyperparameters optimized for the GPR algorithm include the basis function, sigma, the kernel function, and the kernel scale. The GPR kernel function options include ardexponential, ardmatern32, ardmatern52, ardrationalquadratic, ardsquaredexponential, exponential, matern32, matern52, rationalquadratic, and squaredexponential. The search space of the basis function includes none, constant, linear, and quadratic.
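A minimal GPR sketch, assuming scikit-learn (the kernel choice here, a Matern 3/2 covariance with a learned noise term, corresponds to the 'matern32' option above; the 'ard*' variants would use one length scale per input feature):

```python
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern, WhiteKernel

# Matern 3/2 covariance scaled by a signal variance, plus a white-noise term;
# all kernel hyperparameters are refined by maximum likelihood during fit
kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=1.5) + WhiteKernel(1.0)
gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0)
gpr.fit(X_train, y_train)
mean, std = gpr.predict(X_test, return_std=True)  # predictive mean and uncertainty
```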
4. Results and Discussion
This section presents the results obtained from both the Type-I and Type-II demand forecasting models. The performance of each computational model is evaluated using the root mean squared error (RMSE) and the correlation coefficient $R$, defined by (16) and (17), respectively:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \tag{16}$$

$$R = \frac{\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2} \sqrt{\sum_{i=1}^{n} (\hat{y}_i - \bar{\hat{y}})^2}}, \tag{17}$$

where $y_i$ and $\hat{y}_i$ are the observed and predicted demands and $\bar{y}$ and $\bar{\hat{y}}$ are their respective means. Table 2 summarizes the performance values for all the models considered.
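For reference, the two criteria can be computed directly; the following sketch implements (16) and (17) as written above (function names are our own):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, eq. (16)."""
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def corr_coef(y_true, y_pred):
    """Pearson correlation between observed and predicted series, eq. (17)."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])
```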
4.1. Support Vector Regression
The optimized hyperparameters of the SVR models are reported in Table 3. The Type-I SVR model optimization converged to a linear kernel function, while for the Type-II model the optimizer converged to a Gaussian kernel function. The Type-I SVR model converged to a minimum objective of 10.32, and the Type-II SVR model to a minimum objective of 10.36. Figures 2 and 3 show the training and testing regression line plots for the Type-I and Type-II models. On the training dataset, the Type-I SVR model yielded $R$ and RMSE values of 0.80 and 172.44, respectively, while the Type-II model yielded $R$ and RMSE values of 0.80 and 174.11. On the testing set, the Type-I model gave $R$ and RMSE values of 0.79 and 189.01, respectively, and the Type-II model gave $R$ and RMSE values of 0.80 and 184.42, respectively.


4.2. Gaussian Process Regression
Table 4 presents the optimized GPR hyperparameters. The Type-I model converged to a linear basis function and an ardexponential kernel function, while the Type-II model converged to a linear basis function and an exponential kernel function. The minimum objective function values for the Type-I and Type-II models are both 10.35. In Figures 4 and 5, the regression plots of the Type-I and Type-II models are presented. In both model formulations, $R$ and RMSE values of 1 and 0.05 were recorded on the training dataset. On the testing set, the Type-I model gave $R$ and RMSE values of 0.79 and 189.78, respectively, while the Type-II model gave $R$ and RMSE values of 0.78 and 190.89.


4.3. Artificial Neural Networks
In Table 5, we summarize the results of the hyperparameter optimization of the ANN algorithm. For the Type-I model, the optimization converged to one hidden layer with 297 neurons; for the Type-II model, it converged to three hidden layers with 2, 15, and 2 neurons, respectively. In Figures 6 and 7, the regression plots of the Type-I and Type-II models are presented. The minimum objective function values recorded for the Type-I and Type-II models are 10.29 and 10.34, respectively. On the training dataset, the Type-I model gave $R$ and RMSE values of 0.80 and 171.36, respectively; on the testing dataset, it yielded $R$ and RMSE values of 0.79 and 187.72. The Type-II model yielded $R$ and RMSE values of 0.80 and 173.67 on the training set and 0.80 and 185.12 on the testing set.


4.4. Decision Tree
The hyperparameters of the decision tree algorithm are summarized in Table 6. The optimization of the DT-I model converged to a minimum leaf size (MLS) of four and a maximum number of splits (MNS) of seven, whereas the DT-II model has a higher minimum leaf size of 27 and a maximum number of splits of 5. The optimizations of the Type-I and Type-II models converged to minimum objective values of 10.41 and 10.4, respectively. In Figures 8 and 9, the regression plots of the Type-I and Type-II models are presented. In both models, all the features (variables) were sampled. For the Type-I model, $R$ and RMSE values of 0.83 and 161.96 were recorded on the training set, while $R$ and RMSE values of 0.66 and 234.55 were recorded on the testing set. The Type-II model yielded $R$ and RMSE values of 0.81 and 170.15 on the training set and 0.75 and 202.87 on the testing set.


4.5. Ensemble Learners
Table 7 summarizes the hyperparameters of the ensemble learning algorithms for the Type-I and Type-II models. The optimization converged to minimum objective function values of 10.34 and 10.36, respectively, after 100 iterations. The Type-I model converged to the Bag method, while the Type-II model converged to the LSBoost method. In Figures 10 and 11, the regression plots of the Type-I and Type-II models are presented. Unlike the decision tree case, the Type-I ensemble learner model uses a subset of the features: the best minimum objective function value was obtained using just five of the seven features. The number of learning cycles (NLC) in the Type-II model was higher than in the Type-I model. In terms of performance, the Type-I ensemble learner gave $R$ and RMSE values of 0.89 and 135.11 on the training set and 0.80 and 183.89 on the testing set, respectively. The Type-II model recorded $R$ and RMSE values of 0.83 and 160.86 on the training set and 0.76 and 198.53 on the testing set.


5. Conclusions
In this study, we have developed different computational models and carried out a comparative analysis between them using the $R$ and RMSE performance criteria. The developments in this study are summarized as follows:
(i) Five machine learning-based algorithms were employed to develop day-ahead electricity demand models for electric vehicles in Barcelona
(ii) Two main model classes were formulated: the Type-I model considers the demand from the previous seven days, $E_{d-1}, E_{d-2}, \ldots, E_{d-7}$, as features, while the Type-II model considers only two features, namely the consumption of the previous day, $E_{d-1}$, and the consumption of the same day in the previous week, $E_{d-7}$
(iii) Each machine learning algorithm was used to train both proposed models. We compared the performances of the Type-I and Type-II models for all algorithms and find that although the Type-II model has fewer features, it gives almost the same results as the Type-I model, implying that the Type-II models have lower complexity and shorter computational times
In the case of the Type-I models, the performance of the machine learning algorithms on the testing set, ranked from best to worst by RMSE, is as follows: EL > ANN > SVR > GPR > DT.
For the Type-II models, the corresponding ranking of the machine learning algorithms by testing RMSE, from best to worst, is as follows: SVR > ANN > GPR > EL > DT.
Data Availability
Data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Interdisciplinary Research Center for Renewable Energy and Power Systems at King Fahd University of Petroleum and Minerals (KFUPM) under Project INRE2221.