Abstract
Solar energy is widely adopted today and is produced by photovoltaic or concentrator solar power (CSP) systems. Photovoltaic technology is the most prevalent, thanks to its well-established technology and low costs. CSP technology, on the other hand, has received less attention and interest, as it requires larger investments and a considerable surface area. A relevant difficulty connected to CSP is decoupling the randomness of solar radiation from energy production. This paper proposes an artificial neural network (ANN) that forecasts the energy production of a solar parabolic dish installed at the Politecnico di Torino (Energy Center Lab). The investigation was performed using a backpropagation ANN trained with different learning algorithms: Levenberg-Marquardt, Bayesian regularization, resilient backpropagation, and scaled conjugate gradient. Seven atmospheric condition parameters were adopted (humidity, temperature, pressure, wind velocity and direction, solar radiation, and rain) to calculate the receiver temperature as the output. Bayesian regularization was found to be the optimal model for CSP energy production. The results of this investigation suggest that ANNs are a strong, reliable, and useful tool for predicting the temperature in a CSP receiver, which can be of great value for forecasting energy production. The outcome of this investigation can simplify energy production forecasting using readily available meteorological data.
1. Introduction
Today, energy production derives principally from fossil fuels (i.e., oil, natural gas, and coal) and nuclear power. Despite their wide use, they involve major environmental, economic, and social issues. Fossil fuels are responsible for the majority of greenhouse gas (GHG) emissions, such as CO2 emissions, driving the increase in the global mean temperature, i.e., the global warming phenomenon [1]. The development of renewable energy is therefore crucial to addressing these issues. As the physicist Cesare Marchetti points out, “all historical energy transitions occur with the parallel improvement and diffusion of technological innovations”. Furthermore, Marchetti notes that “the introduction of new primary energies requires 10 to 20 years of observation before understanding the long-term market behaviour” [2]. This energy transition from nonrenewable energy forms to renewable energy technologies is a priority of the European Union (EU) [1]. EU policies are oriented toward renewable energy valorization, promoting environmental sustainability through large investments, including their diffusion and research into new and efficient technologies. For this purpose, the European Green Deal [3] is aimed at making the continent climate-neutral by 2050.
Among renewable energies, solar energy is divided into photovoltaic and concentrator solar power (CSP). Photovoltaic solar power is the most prevalent, thanks to a well-established technology and a drastic reduction of costs enabled by mass production. CSP, however, requires greater funding and research and needs a considerable amount of surface area for its deployment. Nonetheless, the current progress of research into Thermal Energy Storage (TES) systems is a promising development for the improvement of CSP energy production. These storage systems allow the decoupling of solar randomness and intermittency from generation, facilitating continuous energy production. In CSP plants, energy is generated indirectly by concentrating solar radiation. These plants are built using a variety of components including heliostats, receivers, TES, and a source of energy generation, typically turbines [4]. The power generation process consists of concentrating sunlight onto a receiver that carries a heat transfer medium. This medium is heated to a high temperature and later passes into a steam turbine. The most common types of CSP are (i) solar parabolic dish (SPD), (ii) parabolic trough collectors (PTC), (iii) solar power tower (SPT), and (iv) linear Fresnel reflectors (LFR).
The adoption of TES (Figure 1) in the CSP plant plays a key role in addressing the duck curve [5, 6]. The final configuration is determined by the best trade-off between cost and energy production; at present, the main plant costs stem from the heliostats and their manufacturing complexity. The challenge of keeping costs low while maximizing efficiency is tackled through studies in the chemical and engineering fields. The engineering side can be addressed by forecasting the thermal behaviour and energy production of the receiver.

Artificial Neural Networks [7], inspired by the structure of the human brain, have been successfully applied in many different fields such as medicine, industry, stock markets, biology, and electronic systems. During the last few decades, the increased use of sensors and tools able to gather information (data) from the environment has demonstrated the great advantages and importance of ANNs. ANNs are data-driven, self-adaptive methods that do not require any prior assumptions. The first critical point for the correct configuration of an ANN is the determination of the number of neurons. Input and output parameters are given during problem formulation; in contrast, choosing the hidden neurons is more complex. A single hidden layer is sufficient to deal with nonlinear functions, while the use of multiple layers can provide greater precision. There is no unambiguous methodology, especially since each problem has its own set of attributes and correlations. In the literature, it is common to find the same problems solved with different models because of the high variability and the hidden nonlinear relationships between parameters. The second critical point is the activation (transfer) function, which determines the relationship between two adjacent layers and is intended to introduce the degree of nonlinearity [8]. To ensure stability, these functions are bounded, monotonically increasing, and differentiable. The third critical point is the training algorithm. ANNs can also be framed as an unconstrained nonlinear minimization problem whose objective function is, generally, the mean square error. For this purpose, the weights and biases are iteratively modified to minimize the global error. However, for nonconvex problems it is difficult to find the global minimum of the objective function. The fourth critical point is the normalization of the data, whose range depends largely on the activation function. Finally, the last critical point is performance measurement, the most important parameter of which is prediction accuracy. Therefore, this study makes a major contribution to research on CSP by demonstrating how ANNs can forecast receiver temperature using readily available meteorological data.
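As a concrete illustration of these five critical points, the following minimal MATLAB sketch touches each of them with the Deep Learning Toolbox employed later in this work. It uses placeholder data and default settings; it is not the authors' exact script.

```matlab
% Minimal sketch of the five critical points (placeholder data; assumed setup).
X = rand(7, 1000);                    % 7 meteorological inputs (synthetic)
T = rand(1, 1000);                    % 1 output: receiver temperature (synthetic)

net = fitnet(10, 'trainlm');          % (1) hidden neurons; (3) training algorithm
net.layers{1}.transferFcn = 'tansig'; % (2) bounded, monotonic, differentiable
[Xn, psx] = mapminmax(X, -1, 1);      % (4) normalization matched to tansig
[Tn, pst] = mapminmax(T, -1, 1);
[net, tr] = train(net, Xn, Tn);       % weights/biases adjusted to minimize MSE
Yn = net(Xn);
mseTrain = perform(net, Tn, Yn);      % (5) performance measurement
```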
2. Material and Methods
2.1. Solar Parabolic Dish
The solar parabolic dish, in Figure 2, is composed of an aluminium paraboloid completely coated with a polymeric film characterized by a high reflection efficiency (El.Ma. Srl, Riva del Garda, TN, Italy). The plant comprises an automatic solar tracking system with two independent axes to control the azimuth and the elevation. The dish changes its orientation in real time by computing the time, date, latitude, and longitude, with a maximum theoretical tracking error of 0.015°. It is possible to steer two different dishes by controlling one axis while the other is kept fixed. All the features are reported in Table 1.

2.2. Temperature Monitoring
A B-type thermocouple (Tersid Srl, Milan, Italy) is placed at the focal point on the receiver. Three additional thermocouples (Tersid Srl, Milan, Italy) are inserted around and inside the receiver. The receiver is an alumina (Al2O3) tube (Almath Crucibles Ltd., Newmarket, UK). Data were sampled every minute on different days from December 2019 to March 2020.
2.3. Weather Station
The station consists of instruments for the measurement of humidity (hygrometer) (%), temperature (thermometer) (°C), atmospheric pressure (barometer) (mbar), wind speed (anemometer) (m/s), wind direction (°), global solar radiation (pyranometer) (W/m2), and rain (rain gauge) (mm) (STR-21G, EKO Instruments Ltd., Den Haag, NL). For reasons of accuracy, temperature and humidity measurements are kept away from direct solar radiation. A personal weather station includes a digital console that exports the collected data, acquired every 15 minutes, to Excel. The recorded input variables were relative humidity (%), temperature (°C), atmospheric pressure (mbar), wind velocity (m/s), wind direction (°), and global radiation (W/m2). The equipment produced 22166 samples for each parameter during 2019-2020.
2.4. ANNs Configuration
Due to the large number of global variables and hidden relationships, the literature cannot provide any a priori hypothesis as to which model, architecture, or algorithm is best suited for the study. The investigation was conducted through the backpropagation artificial neural network (BPANN) by applying four different learning algorithms: Levenberg-Marquardt (LM), Bayesian regularization (BR), resilient backpropagation (RPROP), and scaled conjugate gradient (SCG). The net was fed with seven atmospheric parameters (humidity, temperature, pressure, wind velocity, wind direction, global radiation, and rain) detected by a weather station and produced as output the receiver temperature measured by a B-type thermocouple (Tersid Srl, Milan, Italy), Figure 3. The thermocouple recorded the receiver temperature in °C.

To determine the best model, different architectures with one and two hidden layers were analysed. In the first case, the number of hidden neurons ranged from one to twenty; see Figure 4(a). In the second case, for every hidden neuron in the first layer, the second hidden layer held three times as many; see Figure 4(b) and the loop sketch below.
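A compact sketch of this architecture sweep, under the assumption that it was scripted as a simple loop (the authors' actual code is not reproduced here):

```matlab
% Architecture sweep: 7 inputs, 1 output, n = 1..20 hidden neurons,
% and a second configuration with 3*n neurons in the second hidden layer.
for n = 1:20
    net1 = fitnet(n, 'trainlm');          % single hidden layer: 7-n-1
    net2 = fitnet([n, 3*n], 'trainlm');   % two hidden layers: 7-n-3n-1
    % ... train and score each candidate as described in Section 2.4 ...
end
```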

The ANN was built using MATLAB 2020b with the Deep Learning Toolbox. The evaluation criteria used to rank the prediction accuracy and goodness of the models were root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), mean bias error (MBE), correlation coefficient (R), R-squared (R2), processing time, and epochs.
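These criteria follow their standard definitions; the snippet below is an illustrative MATLAB implementation, where t and y are placeholder names for the observed and predicted temperature vectors:

```matlab
% Standard error metrics (illustrative variable names).
e    = t - y;
RMSE = sqrt(mean(e.^2));
MAE  = mean(abs(e));
MAPE = mean(abs(e ./ t)) * 100;      % assumes t has no zero entries
MBE  = mean(y - t);                  % positive values indicate overestimation
C    = corrcoef(t, y);  R = C(1,2);  % correlation coefficient
R2   = 1 - sum(e.^2) / sum((t - mean(t)).^2);
```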
2.4.1. Transfer Function and Normalization Interval
For this work, the Tan-Sigmoid activation function was used. It was chosen because it is mathematically equivalent to tanh(n) but runs faster [10]. The function is described in Figure 5.

The data in this work were normalized to the range [-1, 1], matching the output range of the Tan-Sigmoid function.
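A minimal sketch of this normalization step, assuming the Toolbox function mapminmax (whose default output range is already [-1, 1]):

```matlab
% Map each input row to [-1, 1] and keep the settings to invert the mapping.
[Xn, ps] = mapminmax(X);                 % normalize to [-1, 1]
% ... train the network on Xn ...
Xback = mapminmax('reverse', Xn, ps);    % recover original units after prediction
```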
2.4.2. Levenberg-Marquardt
LM is an evolution of the Gauss-Newton algorithm. To make sure that the approximated Hessian matrix \(H \approx J^{T}J\) is invertible, a factor \(\mu I\) is added: \(\Delta w = -\left(J^{T}J + \mu I\right)^{-1} J^{T} e\), where \(J\) is the Jacobian matrix of the network errors \(e\), \(I\) is the identity matrix, and \(\mu\) is the combination coefficient.
By exploiting the direction of the gradient and recalculating the approximate performance index, if a smaller value is obtained, then \(\mu\) is divided by a factor \(\vartheta\); conversely, if the value is not reduced, then \(\mu\) is multiplied by \(\vartheta\). When the combination coefficient \(\mu\) is very small (nearly zero), LM approaches the Gauss-Newton algorithm. When \(\mu\) is very large, \(1/\mu\) can be interpreted as the learning coefficient in the steepest descent method, \(\Delta w \approx -\frac{1}{\mu} J^{T} e\) [11, 12].
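In the Deep Learning Toolbox used here, this \(\mu\) update is exposed through the trainlm parameters; the values below are the Toolbox defaults, shown for illustration rather than as the authors' tuned settings:

```matlab
% trainlm parameters implementing the mu update (Toolbox defaults).
net = fitnet(19, 'trainlm');
net.trainParam.mu     = 0.001;   % initial combination coefficient mu
net.trainParam.mu_dec = 0.1;     % mu is multiplied by this after a successful step
net.trainParam.mu_inc = 10;      % mu is multiplied by this when the error grows
net.trainParam.mu_max = 1e10;    % training stops if mu exceeds this bound
```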
To develop this work, the data were divided into three sets: training, validation, and testing. For the one-layer configuration, the activation functions used were the hyperbolic tangent from the inputs to the hidden layer and the pure linear function from the hidden layer to the output. In the configuration with two hidden layers, the sequence was hyperbolic tangent, hyperbolic tangent, pure linear.
2.4.3. Bayesian Regularization
ANNs are a powerful tool for modelling nonlinear functions, but they can suffer from overfitting or overtraining; once a model loses its predictability, it runs into validation and optimization problems. The main advantages of the Bayesian regularization algorithm are its robustness and the fact that a separate validation process is unnecessary [13, 14]. Furthermore, its models are difficult to overtrain, thanks to an objective criterion that stops training. Overfitting is avoided by computing and training on the effective number of parameters, which is lower than the number of weights. Essentially, Bayesian regularization incorporates Occam’s razor, as it penalizes complex models. The more complex the system, the faster the number of parameters converges to a constant. The algorithm is based on Bayes’ theorem, called the “inverse probability law,” and the Gauss-Newton approximation to the Hessian matrix leads back to the Levenberg-Marquardt algorithm. The most probable regularization value is \(\alpha^{MP} = \gamma / \left(2 E_{W}(w^{MP})\right)\), where \(\gamma\) is the effective number of parameters, \(E_{W}\) is “the error of the weights,” and \(w^{MP}\) is the minimum point of the posterior density.
To develop this work, since BR does not require a validation set, the data were divided into training and testing sets only. For the one-layer configuration, the activation sequence was hyperbolic tangent followed by pure linear, while for the two-layer configuration it was hyperbolic tangent, hyperbolic tangent, pure linear.
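A sketch of this BR setup, with illustrative split ratios standing in for the exact counts used by the authors:

```matlab
% Bayesian regularization needs no validation set, so all non-test data
% go to training. Ratios below are illustrative, not the authors' values.
net = fitnet([19, 57], 'trainbr');
net.divideFcn = 'divideblock';          % illustrative choice of split function
net.divideParam.trainRatio = 0.85;      % hypothetical training share
net.divideParam.valRatio   = 0.00;      % no validation subset
net.divideParam.testRatio  = 0.15;      % hypothetical testing share
[net, tr] = train(net, Xn, Tn);         % Xn, Tn: normalized data as in 2.4.1
```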
2.4.4. Resilient Propagation
In the RPROP algorithm (Figure 6), updates depend only on the signs of the partial derivatives, without considering their magnitude. Weights are changed in the direction opposite to the derivative until a local minimum is found. The dependence on the sign, and not on the magnitude, of the derivative allows all parts of the network to grow and learn at an even rate [15].

To develop this work, the data were divided into 70% for training, 15% for testing, and 15% for validation. For the one-layer configuration, the activation sequence was sigmoid followed by pure linear, while for the two-layer configuration it was sigmoid, sigmoid, pure linear.
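The sign-based step adaptation described above is controlled in the Toolbox by the trainrp parameters; the defaults are shown here for illustration:

```matlab
% trainrp parameters governing the sign-based step adaptation (defaults).
net = fitnet(16, 'trainrp');
net.trainParam.delta0   = 0.07;  % initial weight-change step
net.trainParam.delt_inc = 1.2;   % step grows while the derivative keeps its sign
net.trainParam.delt_dec = 0.5;   % step shrinks when the derivative changes sign
net.trainParam.deltamax = 50;    % upper bound on the step size
```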
2.4.5. Scaled Conjugate Gradient
The idea of SCG is to combine the Levenberg-Marquardt algorithm, the classical conjugate gradient approach, and the second-order estimation used by Hestenes, introducing a nonsymmetric approximation by adding a scaling term: \(s_{k} = \frac{E'(w_{k} + \sigma_{k} p_{k}) - E'(w_{k})}{\sigma_{k}} + \lambda_{k} p_{k}\) [16, 17].
The objective is to find a set of weight vectors for which \(H\) becomes positive definite, with the scaling factor kept very close to zero. To develop this work, the data were divided into 70% for training, 15% for testing, and 15% for validation. For the one-layer configuration, the activation sequence was hyperbolic tangent followed by pure linear, while for the two-layer configuration it was hyperbolic tangent, hyperbolic tangent, pure linear.
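A sketch of the corresponding trainscg setup; sigma and lambda below map onto the step size and scaling factor of the formulation above (Toolbox defaults shown):

```matlab
% trainscg parameters for the second-order estimate (defaults shown).
net = fitnet(8, 'trainscg');
net.trainParam.sigma  = 5.0e-5;  % perturbation for the second-derivative estimate
net.trainParam.lambda = 5.0e-7;  % initial scaling factor keeping H positive definite
net.divideParam.trainRatio = 0.70;
net.divideParam.valRatio   = 0.15;
net.divideParam.testRatio  = 0.15;
```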
2.4.6. Taylor Diagram
All analyses were supported by Taylor diagrams, which graphically highlight the evolution trends of the architectures. This diagram identifies prediction models by considering the RMSE and quantifies their degree of similarity to a reference model (Figure 7). The diagram is composed of two orthogonal axes expressing the standard deviation, with the correlation coefficient expressed by the azimuthal position [18].

In this work, both the number of inputs and outputs of the Taylor diagram routine were modified to ensure greater loop variability. In particular, the unmodified “taylordiag” code is limited by the finite number of alphabet characters, although its author incorporated the ability to differentiate between uppercase and lowercase characters into the code. To solve this problem, a fourth input was added that stores the index of the “for loop,” and thus the number of neurons in each loop, as a character label. The coupling of the “for loops” with “taylordiag” makes it easier to identify the respective pattern and ensures greater reliability. To increase usefulness, three new output colours were implemented, which become part of the new input subset of the Taylor diagram.
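For reference, the three statistics a Taylor diagram encodes can be computed per architecture as follows. This is illustrative MATLAB; taylordiag refers to the File Exchange routine adapted in this work, and the commented call assumes its documented STD/RMS/COR argument order:

```matlab
% Per-architecture statistics for a Taylor diagram (t = observed, y = predicted).
sd_obs = std(t);                          % reference standard deviation
sd_mod = std(y);                          % model standard deviation
C      = corrcoef(t, y);  r = C(1,2);     % correlation coefficient
crmse  = sqrt(mean(((y - mean(y)) - (t - mean(t))).^2));  % centered RMSE
% taylordiag([sd_obs sd_mod], [0 crmse], [1 r]);  % one marker per model
```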
3. Results
3.1. Levenberg-Marquardt
The performance of the BPANN with one hidden layer, trained by Levenberg-Marquardt, is described in Figure 8. The Taylor diagram shows that increasing the number of neurons in the hidden layer benefits the prediction model. The best result was obtained with the 7-19-1-1 architecture; given its higher complexity, the 17th architecture could be preferred instead. Despite the good RMSE performance of the 16th, 18th, and 19th, they can be discarded due to their MBE. The performance of all models is shown in Table 2.

The addition of a second hidden layer further improves performance compared to the previous case. The results are shown in Figure 9. This new configuration shows a nearly constant improvement across all models: from models 1 to 10 the step is large and evident, while from models 11 to 20 scaling up becomes harder and the trend indicates convergence. Configurations 7-18-54-1-1, 7-19-57-1-1, and 7-20-60-1-1 are very promising in terms of forecasting capability (Figure 9). As previously reported, the 7-18-54-1-1 architecture is considered the best prediction model among the Levenberg-Marquardt models, given its lower complexity; see Table 3 (see Supplementary Materials, SM Table 1 and SM Table 2). Similar results were obtained by Yaïci et al., who studied the predictive performance of a solar energy system using an ANN with 20 hidden neurons [19]. Likewise, Mohd-Safar et al. identified LM as the best learning algorithm for weather forecasting in tropical climates [20]. The goodness of these models is reinforced by the study of Farkas and Géczy-Víg, who modelled flat-plate solar collectors with an ANN; in that case, the LM algorithm obtained the most accurate and valid results [21].

3.2. Bayesian Regularization
Bayesian regularization with one hidden layer behaves like the LM analysis: the increase in neurons favours the increase in performance, as depicted in Figure 10. From the Taylor diagram, 7-20-1-1 has the best performance, but due to its overestimation, 7-16-1-1 may be preferred. All error metrics are shown in Table 4.

Increasing the number of layers improved the prediction capability of this learning method; the cluster of architectures in the Taylor diagram stretches out and tends towards the observed model (Figure 11). The best-performing architecture among the BR 2 HL models is 7-19-57-1-1; see Table 5 (see Supplementary Materials, SM Table 3 and SM Table 4). The goodness of the BR algorithm was demonstrated by Khosravi et al. in their study on wind characteristics in Iran: among the ANN learning algorithms, BR gave the best results in terms of RMSE and R, followed by LM, RPROP, and SCG [22]. Yacef et al. studied the prediction of daily global solar irradiation by comparing BR and LM; the former approach led to an increase in accuracy, with a decrease in RMSE and MBE [23]. Similar results were obtained in the study by Alomari et al. on energy production in photovoltaic systems, where the BR algorithm with 27 hidden neurons was the best model, producing the lowest RMSE compared to LM [24].

3.3. Resilient Propagation
The performance of the BPANN with only one hidden layer, trained with resilient propagation, is shown in Figure 12 and Table 6. Compared to the previous models, this algorithm does not improve smoothly with the number of neurons, but the Taylor diagram shows a high model density. Configuration 7-16-1-1 performs best.

The behaviour of the BPANN with two hidden layers, trained by resilient propagation, is highlighted in Figure 13 and described in Table 7. All the architectures exhibit poorer characteristics than the previous two-hidden-layer architectures; see Supplementary Materials, SM Table 5 and SM Table 6.

3.4. Scaled Conjugate Gradient
The performance of the BPANN with a single hidden layer, trained with scaled conjugate gradient, is shown in Figure 14 and Table 8. The performance of SCG is far from the observed model, and the best architecture is 7-8-1-1.

The multilayer architectures are characterized by many misbehaviours: architectures 8, 11, 16, and 18 show the worst performance of all; see Figure 15. Furthermore, architectures 2 and 5 perform worse than their corresponding single-hidden-layer configurations. The best model in terms of performance is 7-19-57-1-1; see Table 9 (see Supplementary Materials, SM Table 7 and SM Table 8).

3.5. Time and Epochs Analyses
These results highlight the better overall performance of BR 2 HL compared to the other learning algorithms, with 7-19-57-1-1 being the best of its architectures. However, these analyses do not yet account for the time and epochs required to achieve the goal. Since the single-layer architectures perform poorly, their time and epoch analyses were not considered. Figure 16 shows the processing time, in seconds, of all architectures. In the case of BR 2 HL, the nearly exponential dependence between the number of neurons and the processing time is evident; this analysis ends at about 1500 seconds for the 7-20-60-1-1 architecture. In contrast, for LM 2 HL the processing time was very low for the architectures from 7-1-3-1-1 to 7-16-48-1-1, with a maximum of 84.544 seconds. Starting from the seventeenth architecture, the resources required to complete the analysis increased, and the time rose to 277.037 seconds. In terms of time, SCG is the best.

Figure 17 shows the epochs of the multilayered models. There are some concerns regarding BR and, similarly, RPROP: only in six cases out of twenty did the algorithm stop for a reason other than the epoch constraint, which is set to 1000 by default. Since these six cases fall within the first ten architectures, all subsequent architectures could be even more accurate if allowed more epochs, although increasing the number of epochs would also increase the processing time. Since the latest models already performed very well and the rate of improvement is inversely proportional to complexity, the study was not extended. SCG hit the epoch limit only once, while LM never stopped due to the epoch constraint for any architecture. The results showed that architectures with two hidden layers can outperform their respective single-layer architectures and that, among the algorithms, LM and BR are the most promising. The same conclusion was emphasised by Khosravi et al. [22]. Similar results were found by Mohd-Safar et al., whose studies showed that LM yields the lowest MAE and RMSE and the largest R; SCG had the fastest time but did not converge well, while BR took the longest time [20]. Mohanraj et al. analysed the performance of a solar heat pump with three different variants of the learning algorithm (LM, SCG, and conjugate gradient with Polak-Ribiere updates (CGP)). The results showed the speed and accuracy of LM with 10 hidden neurons, for which R is maximum and the RMS error is minimum [25].
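The time and epoch figures discussed here are the kind of bookkeeping that MATLAB's training record exposes directly; the sketch below assumes the values were read from the record returned by train:

```matlab
% Reading time and epoch usage from the training record (assumed bookkeeping).
[net, tr] = train(net, Xn, Tn);
elapsed = tr.time(end);       % cumulative training time in seconds
epochs  = tr.num_epochs;      % epochs actually executed
reason  = tr.stop;            % stopping criterion, e.g. 'Maximum epoch reached.'
```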

Despite the training speed of LM 7-18-1-1, the best model to consider is BR 7-19-57-1-1 due to its accuracy; see Table 10. Figure 18 shows the regression, and Figure 19 shows the prediction capability of the architecture.


4. Conclusion
This paper is aimed at providing an accurate artificial neural network model for power generation in terms of temperature prediction in a solar dish concentrator. Due to the large number of global variables and hidden relationships, the literature is unable to provide an a priori hypothesis on which model, architecture, or algorithm is best suited for the study. A backpropagation neural network was used with different learning methods: Levenberg-Marquardt, Bayesian regularization, resilient backpropagation, and scaled conjugate gradient. To obtain a more conclusive result, different forms of architecture were considered, from one to two hidden layers and one to twenty hidden neurons. The results showed that increasing these two numbers can improve the overall accuracy of all architectures. This behaviour is reflected in the Taylor diagrams, where single-layer architectures are very close to each other and far from the observed pattern; in contrast, bilayer architectures extend further and can get closer to the target. The proposed ANNs were trained with seven meteorological parameters: humidity, air temperature, pressure, wind speed and direction, global radiation, and precipitation; all these meteorological parameters were taken from the Energy Center in Turin, Italy. The analysis of the most promising architecture was conducted by observing established protocols and proper procedures in the training, validation, and testing phases and by normalizing the data. To ensure the transparency of the prediction framework, the RMSE, MAE, MBE, MAPE, R, and R2 errors, the algorithm stopping criteria, the time, and the epochs were taken into account. The two most promising architectures are, respectively, BR 2 HL, 7-19-57-1-1, and LM 2 HL, 7-18-54-1-1. Despite the large amount of time needed for training, BR 2 HL has been demonstrated to be the most accurate model; its testing-phase values of RMSE, MAE, MBE, MAPE, R, and R2 are reported in Table 10. The overall results suggest that artificial neural networks are a strong, reliable, and important tool for prediction, and they may help researchers forecast trends in their studies in many different fields. For the study of the solar dish concentrator located at the Energy Center, BR 2 HL with the 7-19-57-1-1 architecture is the best prediction model, which may be used to conduct a limited number of experiments under specific input conditions. The resulting equation to estimate the temperature at the focal point is \(\hat{T} = \mathrm{LW}_{3,2}\,\tanh\!\big(\mathrm{LW}_{2,1}\,\tanh(\mathrm{IW}\,x + b_{1}) + b_{2}\big) + b_{3}\), where \(\mathrm{IW}\), \(\mathrm{LW}_{2,1}\), \(\mathrm{LW}_{3,2}\), \(b_{1}\), \(b_{2}\), and \(b_{3}\) are the input weights, layer weights, and bias vectors of the trained network. Their values can be found in Tables 11–14, and a sketch of the evaluation is given below.
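To make the closed-form expression above operational, the following MATLAB sketch evaluates it; IW, LW1 (for \(\mathrm{LW}_{2,1}\)), LW2 (for \(\mathrm{LW}_{3,2}\)), and the biases stand for the matrices tabulated in Tables 11–14, while x_raw, ps_in, and ps_out are illustrative names for the raw input vector and the normalization settings:

```matlab
% Evaluating the 7-19-57-1 network from its tabulated weights (names illustrative).
x  = mapminmax('apply', x_raw, ps_in);   % normalize the 7 meteorological inputs
a1 = tansig(IW  * x  + b1);              % first hidden layer (19 neurons)
a2 = tansig(LW1 * a1 + b2);              % second hidden layer (57 neurons)
Tn = LW2 * a2 + b3;                      % linear output layer
T  = mapminmax('reverse', Tn, ps_out);   % back to degrees Celsius
```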
Acronyms
ANN: | Artificial neural network |
BPANN: | Back propagation neural network |
BR: | Bayesian regularization |
CSP: | Concentrator solar power |
EU: | European Union |
GHG: | Greenhouse gas |
HL: | Hidden layer |
LFR: | Linear Fresnel reflectors |
LM: | Levenberg-Marquardt |
MAE: | Mean absolute error |
MAPE: | Mean absolute percentage error |
MBE: | Mean bias error |
PTC: | Parabolic trough collectors |
PV: | Photovoltaic |
R: | Correlation coefficient |
R2: | R-squared |
RMSE: | Root mean square error |
RPROP: | Resilient propagation |
SCG: | Scaled conjugate gradient |
SE: | Solar energy |
SPD: | Solar parabolic dish |
SPT: | Solar power tower. |
E′: | 1st derivative of the global error function |
μ: | Combination coefficient |
p: | Conjugate system/search direction |
: | Dish diameter |
: | Dish projected surface area |
γ: | Effective number of parameters |
E_W: | Error of the weights |
ϑ: | Factor |
: | Focal length |
H: | Hessian matrix |
I: | Identity matrix |
: | Maximum solar disc angle |
w^MP: | Minimum point of the posterior density |
F: | Objective function |
: | Rim angle |
λ: | Scaling factor |
: | Solar irradiance |
: | Square function |
σ: | Step size |
: | Surface slope error |
w: | Weight. |
Data Availability
Data are available on request by contacting the corresponding author or the first author.
Disclosure
A thesis by the same authors of this work has previously been published: Ricci L., Prevision model for energy production in solar concentrator using Artificial Neural Network, DAUIN, Politecnico di Torino, 2020-21.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
The prototype test bench was partially financed by the regional INFRA-P call of the Piedmont Region.
Supplementary Materials
This article is enriched with supplementary material. The file contains a total of eight tables with the full performance metrics of each training algorithm used in this article. In detail, it reports RMSE, MAE, MBE, MAPE, R, and R2 for the training, validation, and testing subsets of all the different architectures (single and double HL, one to twenty hidden neurons). The tables are listed as follows: SM Table 1: Levenberg-Marquardt performance with one hidden layer, 1st to 20th architecture, training, validation, and testing subsets. SM Table 2: Levenberg-Marquardt performance with two hidden layers, 1st to 20th architecture, training, validation, and testing subsets. SM Table 3: Bayesian regularization performance with one hidden layer, 1st to 20th architecture, training, validation, and testing subsets. SM Table 4: Bayesian regularization performance with two hidden layers, 1st to 20th architecture, training, validation, and testing subsets. SM Table 5: Resilient propagation performance with one hidden layer, 1st to 20th architecture, training, validation, and testing subsets. SM Table 6: Resilient propagation performance with two hidden layers, 1st to 20th architecture, training, validation, and testing subsets. SM Table 7: Scaled conjugate gradient performance with one hidden layer, 1st to 20th architecture, training, validation, and testing subsets. SM Table 8: Scaled conjugate gradient performance with two hidden layers, 1st to 20th architecture, training, validation, and testing subsets. (Supplementary Materials)