Abstract

Sugar price forecasting has attracted extensive attention from policymakers due to its significant impact on people’s daily lives and markets. In this paper, we present a novel hybrid deep learning model that combines a time series decomposition technique, empirical mode decomposition (EMD), with a hyperparameter optimization algorithm, the Tree of Parzen Estimators (TPE), for sugar price forecasting. The effectiveness of the proposed model is evaluated in a case study on the price of London Sugar Futures. Two experiments are conducted to verify the benefits of EMD and TPE. Moreover, the specific effects of EMD and TPE are analyzed with the Diebold–Mariano (DM) test and the improvement percentage. Finally, empirical results demonstrate that the proposed hybrid model outperforms the other models considered.

1. Introduction

Sugar is an important food commodity around the world, and fluctuations in food prices have a large effect on people’s daily lives because they feed into the overall inflation dynamics of many countries [1, 2]. Therefore, it is essential to forecast the sugar price accurately.

Sugar price forecasting has attracted considerable research interest for several decades, and existing approaches can be divided into statistical methods and machine learning methods. Statistical methods have the advantages of low complexity and fast computation [3]. In 1975, Meyer and Kim [4] applied the autoregressive integrated moving average (ARIMA) method to sugar price forecasting. However, ARIMA requires the time series to be stationary, or stationary after differencing, which might limit the application of this method. In 2009, Xu et al. [5] used a neural network with multiple fully connected layers for sugar price forecasting on a Chinese dataset. In 2011, Ribeiro and Oliveira [6] introduced a hybrid model built upon artificial neural networks (ANNs) and the Kalman filter. In 2019, Silva et al. [7] investigated ANNs, extreme learning machines (ELMs), and echo state networks (ESNs) for sugar price forecasting. However, one limitation of these three methods is that they do not optimize the hyperparameters of the neural networks. Hyperparameter optimization is a commonly used strategy in machine learning [8], especially in time series forecasting [3], to improve model performance. Its absence from earlier work is largely a matter of timing: the field of machine learning has expanded rapidly in recent years [9], and techniques that are now very common, such as SGD [10] and Adam [11], came into wide use around or after 2014, whereas most of the sugar price forecasting literature was published before 2014. On the other hand, the nonstationarity and nonlinearity of sugar prices [12] make it harder to predict the future sugar price accurately. To handle nonstationary features, the inputs to machine learning models need to be properly preprocessed [13]. Therefore, multiresolution analysis techniques are widely used in many forecasting problems [14, 15].

Conventionally, the discrete wavelet transformation (DWT) was used [16, 17]. Hajiabotorabi et al. [18] improved the recurrent neural network (RNN) with a multiresolution analysis based on a B-spline wavelet produced by an efficient DWT. Yong and Awang [19] used the DWT to improve forecast accuracy. However, the DWT generally requires a lengthy trial-and-error process [20]. As an alternative, the empirical mode decomposition (EMD) [21] multiresolution technique has been introduced to time series analysis, and it is self-adaptive. EMD extracts the salient features via a temporally local decomposition and isolates them into subseries that represent the physical structure of the time series [22]. EMD-based machine learning models have been adopted in time series forecasting. Ali and Prasad [20] predicted significant wave height with an ELM and an improved complete ensemble EMD. Bedi and Toshniwal [23] adopted EMD for electricity demand forecasting. This broadly adopted technique can boost forecasting performance.

To this end, to investigate the power of hyperparameter optimization and multiresolution analysis in sugar price forecasting, we propose a hybrid deep learning model. The model uses the Tree of Parzen Estimators (TPE) [24] to optimize long short-term memory (LSTM) networks [25]. A time series decomposition technique, empirical mode decomposition (EMD), is used to decompose the sugar price and extract its salient features. The effectiveness of the proposed approach is tested on the daily price of London Sugar Futures. To compare fairly with the mainstream methods for sugar price forecasting, we build deep neural networks (DNNs) with multiple fully connected layers, which correspond to the machine learning models in [5–7], and an ARIMA model for comparison with [4]. The results are compared against other machine learning algorithms, such as support vector regression (SVR) [15, 26–29] and the DNN, and against the traditional time series model ARIMA.

The rest of this paper is organized as follows: Section 2 describes the theoretical background, such as the LSTM, EMD, and TPE. Section 3 describes the proposed hybrid model in detail. Section 4 provides details of experiments and evaluations. Section 5 shows the discussion of experimental results, and Section 6 concludes the paper and points out possible future work.

2. Theoretical Background

2.1. Long Short-Term Memory (LSTM)

The LSTM neural network is heavily used as a basic building block in modern deep learning-based time series forecasting models [30]. It is an improved version of the recurrent neural network (RNN) that mainly addresses the vanishing gradient problem through its internal memory unit and gate mechanism, which lets the network retain information over longer spans and makes it more reliable. It was proposed by Hochreiter and Schmidhuber [25] in 1997 and overcomes the inability of the RNN to learn long-term dependencies in time series data. It has been widely used in sentiment analysis [31], speech recognition [32], early crop classification [33], and other fields, with satisfactory results. The key equations of the LSTM model are as follows:

$$
\begin{aligned}
f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f),\\
i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i),\\
\tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C),\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,\\
o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o),\\
h_t &= o_t \odot \tanh(C_t),
\end{aligned}
$$

where $f_t$, $\tilde{C}_t$, $o_t$, and $i_t$ are the output values of the forget gate, update gate, output gate, and input gate, respectively. Moreover, $\odot$ and $W$ denote the element-wise product and the network parameters; $b$ are the bias vectors; $\sigma$ is the sigmoid activation function; and $C_t$ is the memory cell. The previous LSTM output $h_{t-1}$ and the input data $x_t$ are the inputs of the four gates.
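
The gate mechanism described above can be illustrated with a minimal sketch of a single LSTM time step in plain Python. It assumes the standard LSTM formulation; the function names (`lstm_step`, `gate`, `affine`) and the `params` layout are hypothetical and are not taken from the implementation used in this paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Return W @ v + b for a matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def gate(name, z, params, activation):
    """Apply one gate: activation(W_name @ z + b_name), element by element."""
    W, b = params[name]
    return [activation(a) for a in affine(W, z, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'f', 'i', 'o', 'c' to (W, b) pairs."""
    z = h_prev + x_t                           # concatenation [h_{t-1}, x_t]
    f = gate("f", z, params, sigmoid)          # forget gate
    i = gate("i", z, params, sigmoid)          # input gate
    o = gate("o", z, params, sigmoid)          # output gate
    c_tilde = gate("c", z, params, math.tanh)  # candidate memory (update gate)
    # New memory cell: keep part of the old cell, add part of the candidate.
    c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
    # New output: output gate applied to the squashed memory cell.
    h = [ot * math.tanh(ci) for ot, ci in zip(o, c)]
    return h, c


# Toy example with hidden size 1 and input size 1 (all weights zero).
zero = ([[0.0, 0.0]], [0.0])
params = {name: zero for name in ("f", "i", "o", "c")}
h, c = lstm_step([1.0], [0.0], [2.0], params)
```

With all weights set to zero, every sigmoid gate outputs 0.5 and the candidate memory is 0, so the new memory cell is simply half of the previous one, which makes the role of the forget gate easy to see.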

2.2. Empirical Mode Decomposition (EMD)

Empirical mode decomposition (EMD) [21] is a time series decomposition technique that has proved effective in time series forecasting [20]. Given the nonlinearity and complexity of sugar price sequences, accurately capturing their characteristics is a difficult task. Thus, EMD is adopted to decompose the original sugar price sequence. The EMD procedure is as follows.
(1) Identify all local minima and local maxima in the sugar price sequence $x(t)$, $t = 1, 2, 3, \ldots, T$.
(2) Connect all local maxima and all local minima to form the upper envelope $e_{\max}(t)$ and the lower envelope $e_{\min}(t)$.
(3) Compute the average $m(t) = (e_{\max}(t) + e_{\min}(t))/2$.
(4) Extract the intrinsic mode function $c(t) = x(t) - m(t)$.
(5) Iterate on the residual $r(t) = x(t) - c(t)$.
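
The procedure above can be sketched as a single sifting pass in plain Python. This is a simplified illustration, not the implementation used in this paper: the envelopes are built with piecewise-linear interpolation instead of the cubic splines commonly used in EMD, the envelope endpoints are pinned to the series, and the stopping criteria for repeated sifting are omitted. All function names are hypothetical.

```python
def local_extrema(x):
    """Return index lists (maxima, minima) of strict interior local extrema."""
    maxima = [t for t in range(1, len(x) - 1) if x[t - 1] < x[t] > x[t + 1]]
    minima = [t for t in range(1, len(x) - 1) if x[t - 1] > x[t] < x[t + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear envelope through x at the knots and at both endpoints."""
    idx = sorted(set([0] + knots + [len(x) - 1]))
    env = []
    for a, b in zip(idx, idx[1:]):
        for t in range(a, b):
            w = (t - a) / (b - a)
            env.append((1 - w) * x[a] + w * x[b])
    env.append(x[idx[-1]])
    return env


def sift_once(x):
    """One sifting pass: return (detail, mean) with detail[t] + mean[t] == x[t]."""
    maxima, minima = local_extrema(x)           # step (1)
    upper = envelope(x, maxima)                 # step (2), upper envelope
    lower = envelope(x, minima)                 # step (2), lower envelope
    mean = [(u + l) / 2 for u, l in zip(upper, lower)]    # step (3)
    detail = [xi - mi for xi, mi in zip(x, mean)]         # step (4)
    return detail, mean                         # step (5) would iterate on mean


prices = [5.0, 7.0, 4.0, 8.0, 3.0, 6.0]  # made-up values, not real sugar prices
detail, mean = sift_once(prices)
```

In a full EMD, the sifting pass is repeated on the detail until it qualifies as an intrinsic mode function, and the whole process is then repeated on the residual until no further oscillation can be extracted.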

2.3. Tree of Parzen Estimators (TPE)

As stated by Bergstra et al. [24], TPE is a sequential model-based global optimization algorithm. It uses a probabilistic model of the loss function to make informed guesses, over a specified number of iterations, about which hyperparameters are best. When optimizing multiple hyperparameters, this algorithm has shown better performance than grid search and random search, especially for deep learning models, which usually have more hyperparameters than traditional machine learning models [34].

TPE applies Bayes’ rule to the probabilistic model $p(y \mid x)$, where $x$ denotes the hyperparameters and $y$ the objective function value, and $p(x \mid y)$ is broken down into $l(x)$ and $g(x)$, such that

$$
p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}, \qquad
p(x \mid y) =
\begin{cases}
l(x), & y < y^{*},\\
g(x), & y \ge y^{*},
\end{cases}
$$

where $l(x)$ is the distribution of the hyperparameters whose objective function value is less than the threshold $y^{*}$ and $g(x)$ is the distribution of the hyperparameters whose objective function value is larger than the threshold.

The expected improvement (EI) metric is used to identify which hyperparameters to choose based on the probabilistic model. Given a set of hyperparameters $x$ and a threshold value $y^{*}$ for the objective function, the EI is given by

$$
EI_{y^{*}}(x) = \int_{-\infty}^{y^{*}} (y^{*} - y)\, p(y \mid x)\, dy.
$$

When EI is positive, the hyperparameter set $x$ is expected to improve on the threshold $y^{*}$. Therefore, the working principle of TPE is to draw sample hyperparameters from $l(x)$, evaluate them according to $l(x)/g(x)$, and then return the set that gives the best EI value.
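
The working principle above can be sketched for a single continuous hyperparameter. This toy version is an illustration under simplifying assumptions, not the TPE implementation used in this paper: both densities are Gaussian kernel (Parzen) estimates with a fixed bandwidth, the threshold is set so that a fraction `gamma` of past trials counts as good, and at least two past trials are required. The names `tpe_suggest`, `parzen_pdf`, `gamma`, and `bandwidth` are hypothetical.

```python
import math
import random


def parzen_pdf(x, centers, bandwidth):
    """Gaussian kernel density estimate at x, averaged over the centers."""
    norm = len(centers) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_suggest(trials, gamma=0.25, n_candidates=50, bandwidth=0.5, rng=random):
    """Suggest the next x from past (x, loss) trials by maximizing l(x) / g(x)."""
    ranked = sorted(trials, key=lambda t: t[1])
    n_good = max(1, int(gamma * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]  # trials with loss below the threshold
    bad = [x for x, _ in ranked[n_good:]]   # trials with loss at or above it
    # Draw candidates from l(x): pick a good trial and add Gaussian noise.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]

    def ratio(x):
        # Small constant guards against division by zero far from bad trials.
        return parzen_pdf(x, good, bandwidth) / (parzen_pdf(x, bad, bandwidth) + 1e-12)

    return max(candidates, key=ratio)


def objective(x):
    """Toy loss with its minimum at x = 2 (stands in for a validation error)."""
    return (x - 2.0) ** 2


rng = random.Random(0)
trials = [(x, objective(x)) for x in (rng.uniform(-5.0, 5.0) for _ in range(10))]
for _ in range(30):
    x_next = tpe_suggest(trials, rng=rng)
    trials.append((x_next, objective(x_next)))
best_x, best_loss = min(trials, key=lambda t: t[1])
```

Each iteration concentrates new candidates around the hyperparameter values that have produced low losses so far, which is what distinguishes TPE from grid search and random search.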

3. Materials and the Proposed Hybrid Model

3.1. The Proposed LSTM Model

In this paper, for a fair comparison with the mainstream sugar price forecasting models, we propose a two-layer LSTM model, which is illustrated in Figure 1. The DNN model used in the three sugar price forecasting studies discussed above is compared with the proposed LSTM model under the same network structure. The effects of TPE and EMD are validated experimentally by comparing the LSTM and DNN models.

3.2. Hyperparameter Optimization

For the deep learning models, the hyperparameters are model parameters that are defined in advance before training [35]. There are several hyperparameters used in this paper:

Number of Neurons. This is the number of neurons in each hidden layer. As shown in Figure 1, deep learning models usually contain multiple hidden layers, each with several neurons.

Activation Function. This is the activation function of the hidden layer. An activation function runs on the neurons of a neural network and maps the input of a neuron to its output [36]. Each neuron contains an input, output, weight, and processing unit, and its output signal is obtained after processing by the activation function. The tanh, ReLU, and LeakyReLU activation functions [37] are used in this paper.

Dropout Rate. Dropout is a very useful and successful technique that can effectively control the overfitting problem [38]. Generally speaking, dropout will randomly delete some neurons with the probability of dropout rate to train different neural network architectures on different batches. The dropout rate is a real number between 0 and 1.

Optimizer. In deep learning, the loss measures the performance of the model, and the network is trained to reduce it; essentially, a lower loss means that the model performs better. The process of minimizing (or maximizing) a mathematical expression is called optimization, and the optimizer is the algorithm used to minimize the loss. The RMSprop [39], Adam [11], and SGD [40] optimizers are used in this paper.

Batch Size. The deep learning model updates its parameters once for each minibatch of the training data, and the batch size is the number of samples in a minibatch [41]. For the same number of epochs, a larger batch size requires fewer batches, so training time can be reduced. Within a certain range, increasing the batch size also helps the stability of convergence; beyond that range, however, the performance of the model decreases as the batch size increases [10]. Therefore, batch size is an important hyperparameter.

Learning Rate. Choosing a suitable learning rate is important because it determines whether the neural network can converge to the global minimum. A learning rate that is too high may cause the network to overshoot the minimum repeatedly, so that it never reaches it. A smaller learning rate helps the network converge, but training takes much longer because each step makes only a small adjustment to the weights. A smaller learning rate is also more likely to trap the network in a local minimum, because small steps make it hard to escape. Therefore, the learning rate must be set with care.
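
The hyperparameters listed above define a search space, which can be written down as a small configuration. The sketch below is illustrative only: the activation functions and optimizers are those named in this section, but every numeric range is a hypothetical placeholder, and the uniform random sampler is just a baseline way to draw one configuration (it is what TPE is meant to improve on).

```python
import math
import random

SEARCH_SPACE = {
    "neurons": [32, 64, 100, 128, 256],            # hypothetical choices
    "activation": ["tanh", "relu", "leaky_relu"],  # named in this section
    "dropout_rate": (0.0, 0.8),                    # hypothetical range
    "optimizer": ["rmsprop", "adam", "sgd"],       # named in this section
    "batch_size": [16, 32, 64, 128],               # hypothetical choices
    "learning_rate": (1e-4, 1e-1),                 # hypothetical range
}


def sample_config(space, rng):
    """Draw one random configuration from the search space."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, list):
            # Categorical or discrete hyperparameter: pick one option.
            config[name] = rng.choice(domain)
        elif name == "learning_rate":
            # Sample the exponent uniformly so small rates are well covered.
            low, high = domain
            config[name] = 10 ** rng.uniform(math.log10(low), math.log10(high))
        else:
            # Continuous hyperparameter: sample uniformly within the range.
            config[name] = rng.uniform(*domain)
    return config


config = sample_config(SEARCH_SPACE, random.Random(0))
```

Sampling the learning rate on a logarithmic scale is a common design choice because plausible values span several orders of magnitude.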

3.3. Proposed Hybrid Forecasting Model

Figure 2 illustrates the whole process of the proposed hybrid deep learning LSTM model, which is based on the EMD technique and the TPE algorithm. The modeling includes three steps:
Step 1. The sugar price sequence is decomposed by EMD into a set of intrinsic mode functions (IMFs).
Step 2. The lagged values of the sugar price sequence and the IMFs are used as inputs to the deep learning model. The TPE algorithm is applied to optimize the hyperparameters of the LSTM model; the LSTM model with the best hyperparameter combination is then selected, and its forecasting performance is evaluated on the test set.
Step 3. The trained model can be deployed for real-world sugar price forecasting for policymakers.
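
The construction of lagged inputs in Step 2 can be sketched as follows. The number of lags and the layout of the feature rows are not specified in this section, so the choices below (price lags first, then the lags of each IMF) and the function name `make_lagged_samples` are assumptions made for illustration.

```python
def make_lagged_samples(price, imfs, n_lags):
    """Build (features, target) pairs for one-day-ahead forecasting.

    Each feature row holds the last n_lags values of the price followed by
    the last n_lags values of every IMF; the target is the next price.
    """
    series = [price] + list(imfs)
    features, targets = [], []
    for t in range(n_lags, len(price)):
        row = []
        for s in series:
            row.extend(s[t - n_lags:t])
        features.append(row)
        targets.append(price[t])
    return features, targets


# Made-up numbers, only to show the shapes involved.
price = [10, 11, 12, 13, 14]
imfs = [[1, -1, 1, -1, 1]]
X, y = make_lagged_samples(price, imfs, n_lags=2)
```

Here the first sample uses the prices and IMF values of days 1 and 2 to predict the price of day 3, so no information from the target day leaks into the inputs.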

4. Experiments and Evaluations

In this section, sugar price sequences from the market are used to evaluate the performance of the proposed hybrid deep learning model in two distinct experiments. All experiments are run on the Ubuntu 18.04 operating system with an Intel Xeon E5-2650 v4 CPU, a GeForce GTX 1080 GPU, and 32 GB of RAM. Furthermore, to reduce the influence of random factors, each method is run 20 times, and the average value is used as the final result.

4.1. Data Set Description

The dataset used in this paper is the price of London Sugar Futures from April 2010 to May 2020. The data were fetched from investing.com. Table 1 shows the statistical analysis of this dataset.

To implement the different experiments, we divide the data into three sets: training set, validation set, and testing set. The training set is used to train different models. The validation set is used to select the optimal hyperparameters, and the testing set is used to compare the models. The details of those three sets are described in Table 2.
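
A chronological split of this kind can be sketched as below. The actual proportions are given in Table 2 and are not reproduced here; the 70/15/15 split and the function name `chronological_split` are hypothetical. The key point is that the series is cut in time order, without shuffling, so that validation and test data always come after the training data.

```python
def chronological_split(series, train_pct=70, val_pct=15):
    """Split a time-ordered series into train, validation, and test parts."""
    n = len(series)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]  # whatever remains is the test set
    return train, val, test


train, val, test = chronological_split(list(range(20)))
```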

4.2. Performance Evaluation

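To assess the proposed hybrid deep learning model, the mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean square error (RMSE) are employed for one-day-ahead sugar price forecasting. The formulas of MAE, MAPE, and RMSE are as follows:

$$
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|, \qquad
\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right|, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2},
$$

where $N$ is the number of predicted samples, $y_i$ is the real value of the sugar price, and $\hat{y}_i$ is the predicted value.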
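
The three error measures can be computed as in the following sketch, which assumes their standard definitions with MAPE expressed in percent. The sample values are made up and are not taken from the dataset.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (y_true must be nonzero)."""
    return 100.0 * sum(abs((y - p) / y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 190.0, 400.0]
errors = (mae(y_true, y_pred), mape(y_true, y_pred), rmse(y_true, y_pred))
```

MAE and RMSE are in price units, with RMSE penalizing large errors more heavily, whereas MAPE is scale-free and therefore comparable across price levels.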

4.3. Experiment I

In this section, the dataset from London Sugar Futures is used to verify the benefits of the time series decomposition technique EMD and the hyperparameter optimization algorithm TPE, respectively. Three models are compared: the model that combines EMD and TPE with LSTM (TPE-EMD-LSTM), the model that combines EMD with LSTM (EMD-LSTM), and the single LSTM model. For a fair comparison, the models use the same structure, which is shown in Figure 3.

For the LSTM model, we set 100 neurons in each LSTM layer with the ReLU activation function, and the dropout rate of the dropout layer is 0.5. The EMD-LSTM model has the same hyperparameters as the LSTM model. After optimization, the obtained optimal hyperparameters of TPE-EMD-LSTM are summarized in Table 3. The RMSE, MAE, and MAPE are used to test the performance of TPE-EMD-LSTM and the compared models, and the results are shown in Table 4. The TPE-EMD-LSTM model attained an MAE of 2.415, a MAPE of 0.682, and an RMSE of 2.969. These values are the smallest among the three models, indicating that the TPE-EMD-LSTM model surpasses the LSTM and EMD-LSTM models. Figure 4 shows the IMFs after decomposition. The EMD-LSTM model also performs better than the LSTM model, achieving smaller MAE, MAPE, and RMSE, which further confirms the ability of EMD to handle the nonstationary features of sugar price series.

4.4. Experiment II

To further explore the effectiveness of TPE-EMD-LSTM, experiment II compares the hybrid deep learning model with two categories of forecasting models. In the first category, the proposed hybrid model is compared with basic single models: the traditional statistical model ARIMA, the well-known support vector regression (SVR) model, and the deep learning models DNN and LSTM. The DNN model has the same structure as the LSTM model, that is, 100 neurons in each DNN layer with the ReLU activation function and a dropout rate of 0.5 in the dropout layer. The second category, used to test the hybrid model more fully, consists of two other hybrid models, TPE-EMD-DNN and EMD-DNN. After optimization, the obtained optimal hyperparameters of TPE-EMD-DNN are summarized in Table 5.

The performance evaluation metrics for this experiment are listed in Table 6. The traditional time series model ARIMA achieves competitive results, with an MAE of 4.135, a MAPE of 1.189, and an RMSE of 5.747, outperforming the DNN and EMD-DNN models by a large margin. However, after optimization, the TPE-EMD-DNN model performs the best among the comparison models in this section, with an MAE of 2.648, a MAPE of 0.748, and an RMSE of 3.193, which demonstrates the effectiveness of the TPE algorithm. Moreover, the TPE-EMD-DNN model shows a significant improvement over EMD-DNN; this shows not only that TPE is effective but also that it is general, working for both LSTM networks and DNNs.

5. Discussion

To evaluate the performance of the proposed hybrid deep learning model more comprehensively and to identify ways of improving forecasting capability and accuracy, we follow [15, 28, 42, 43] and apply the Diebold–Mariano (DM) test [44] and the improvement percentage.

5.1. DM Test

The DM test compares two forecasting models and determines whether their forecasts are significantly different. The DM statistic is defined as follows:

$$
DM = \frac{\bar{d}}{\sqrt{2\pi \hat{f}_d(0)/T}},
$$

where $\hat{f}_d(0)$ is a consistent estimate of the spectral density of the loss differential at frequency zero, $\bar{d}$ is the mean of the loss differential between the two forecasts, and $T$ is the length of the forecasting time series. The results of the DM test between our proposed TPE-EMD-LSTM and the other models are shown in Table 7.
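
For one-day-ahead forecasts, the DM statistic can be computed as in the sketch below. It rests on two assumptions that this section does not state: the loss is the squared error, and, because the horizon is one step, the long-run variance of the loss differential is estimated by its sample variance alone (no autocovariance terms). The function name and the sample forecasts are hypothetical.

```python
import math


def dm_statistic(actual, forecast_a, forecast_b):
    """DM statistic with squared-error loss for one-step-ahead forecasts.

    The loss differential is loss(a) - loss(b), so a positive value means
    that forecast_b has the smaller average loss.
    """
    d = [(y - a) ** 2 - (y - b) ** 2
         for y, a, b in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_bar = sum(d) / n
    # Sample variance of the loss differential; must be nonzero.
    var_d = sum((di - d_bar) ** 2 for di in d) / n
    return d_bar / math.sqrt(var_d / n)


actual = [10.0, 12.0, 11.0, 13.0]
baseline = [12.0, 10.0, 13.0, 10.0]   # made-up forecasts of a weaker model
proposed = [10.5, 12.5, 11.0, 12.0]   # made-up forecasts of a stronger model
dm = dm_statistic(actual, baseline, proposed)
```

Under the null hypothesis of equal accuracy the statistic is approximately standard normal, so an absolute value above 2.58 rejects equal accuracy at the 1% level in a two-sided test.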

According to Table 7, the DM values range from 4.2 to 30.4, all far above 2.58, the critical value at the 1% significance level. That is to say, the forecasts of our proposed TPE-EMD-LSTM model differ significantly from those of the other models. Together with its lower errors, this means that the proposed TPE-EMD-LSTM model achieves significantly more accurate forecasts.

5.2. Improvement Percentage

As discussed in the DM test section, the TPE-EMD-LSTM model does significantly better than all other models. However, the details of the improvement in forecasting performance are not clear. Therefore, in this section, we apply three evaluation metrics ($IP_{MAE}$, $IP_{RMSE}$, $IP_{MAPE}$) to discuss the benefits of TPE and EMD more specifically. $IP_{MAE}$, $IP_{RMSE}$, and $IP_{MAPE}$ represent the improvement percentages of MAE, RMSE, and MAPE, respectively. They are defined as follows:

$$
IP_{MAE} = \frac{MAE_2 - MAE_1}{MAE_2} \times 100\%, \qquad
IP_{RMSE} = \frac{RMSE_2 - RMSE_1}{RMSE_2} \times 100\%, \qquad
IP_{MAPE} = \frac{MAPE_2 - MAPE_1}{MAPE_2} \times 100\%,
$$

where $MAE_1$, $RMSE_1$, and $MAPE_1$ are the MAE, RMSE, and MAPE values of the compared model (the first model in a comparison "A vs. B") and $MAE_2$, $RMSE_2$, and $MAPE_2$ are the MAE, RMSE, and MAPE values of the model it is compared with (the second model). For example, for EMD-DNN vs. DNN, $IP_{MAE}$ shows the improvement percentage of EMD-DNN compared with DNN; if the value of $IP_{MAE}$ is positive, EMD-DNN surpasses DNN in terms of MAE, and vice versa.
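
As a worked example, the improvement percentage can be computed as the relative error reduction with respect to the baseline model. Placing the baseline error in the denominator is an assumption, chosen because it is consistent with the RMSE improvement of about 7 reported in the conclusions for TPE-EMD-LSTM vs. TPE-EMD-DNN. The error values below are the ones reported in Sections 4.3 and 4.4; the function name is hypothetical.

```python
def improvement_percentage(metric_model, metric_baseline):
    """Relative reduction (in percent) of an error metric vs. a baseline."""
    return (metric_baseline - metric_model) / metric_baseline * 100.0


# TPE-EMD-LSTM vs. TPE-EMD-DNN, using the errors reported in the text.
ip_mae = improvement_percentage(2.415, 2.648)
ip_mape = improvement_percentage(0.682, 0.748)
ip_rmse = improvement_percentage(2.969, 3.193)
```

A positive value means that the first model has the lower error; a negative value means that the baseline is better.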

Two main comparisons are conducted. Firstly, EMD-DNN and EMD-LSTM are compared with DNN and LSTM, respectively. Secondly, TPE-EMD-DNN and TPE-EMD-LSTM are compared with EMD-DNN and EMD-LSTM, respectively. Moreover, we found that the traditional time series forecasting model ARIMA still performs strongly on sugar price forecasting. Therefore, ARIMA is also compared with LSTM and DNN to contrast the traditional time series model with deep learning-based models. Table 8 summarizes the improvement percentage results of the different models.

First of all, it should be noted that EMD is more effective for the LSTM network than for the DNN. EMD-LSTM vs. LSTM shows improvement percentages of 22.761, 24.722, and 37.401 on the three metrics. However, EMD-DNN vs. DNN shows improvement in only one of the three metrics, which indicates that the DNN is not as good as the LSTM network at capturing EMD features for time series prediction. TPE optimization brings a large improvement for both EMD-DNN and EMD-LSTM: it enhances forecasting precision to a great extent, as the improvement percentages relative to the contrasted models are prominent. The comparison of TPE-EMD-LSTM with TPE-EMD-DNN shows that, even though TPE optimization brings a large improvement to the DNN, the LSTM network is still better than the DNN. Finally, ARIMA still outperforms the plain DNN and LSTM models, which shows the effectiveness of the traditional time series model, as has also been found in many time series applications [30]. However, it should also be noted that, after hyperparameter optimization, the deep learning-based models show a significant improvement in forecasting accuracy and surpass the ARIMA model.

In the next step, ensemble forecasting will be investigated, as it has surged in recent years and shows state-of-the-art performance on many time series forecasting tasks [33]. Also, to verify its generalizability and robustness, the proposed model needs to be applied to other food commodities.

6. Conclusions

Sugar price forecasting plays a vital role in policy making for the sugar industry. To predict the sugar price accurately, a hybrid deep learning model that combines a time series decomposition technique with a hyperparameter optimization algorithm is proposed. This combination improves the forecasting performance of the proposed model over all other models compared. Extensive experiments have been conducted to demonstrate the effectiveness of TPE and EMD. Moreover, DM tests and improvement percentages are used to reveal their specific effects. The DM values from the Diebold–Mariano tests between our proposed TPE-EMD-LSTM model and the other models range from 4.2 to 30.4, all far above 2.58, the critical value at the 1% significance level, indicating highly significant differences between the compared models. The improvement percentages of the proposed TPE-EMD-LSTM model over three other models (EMD-DNN, EMD-LSTM, and TPE-EMD-DNN) in terms of $IP_{RMSE}$ are 78%, 42%, and 7%, respectively, indicating a substantial improvement. This enhanced forecasting performance can inform sugar factories’ recruitment plans and sugarcane farmers’ planting plans. In future work, we plan to develop ensemble forecasting, as it has surged in recent years and shows state-of-the-art performance on many time series forecasting tasks [33]. Also, to verify its generalizability and robustness, the proposed model needs to be applied to other food commodities.

Data Availability

The dataset used in this paper is the price of London Sugar Futures from April 2010 to May 2020. The data were fetched from investing.com.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

The project was supported by the National Natural Science Foundation of China (nos. 51465003 and 61763001) and the Innovation Project of Guangxi Graduate Education (nos. YCBZ2020012 and YCBZ2021019).