Abstract
The sintering process is a crucial thermochemical process in the blast furnace iron-making system. Tumble strength (TS), a vital indicator of sinter quality, is difficult to monitor because it cannot be measured in a timely manner. Constructing a data-driven model is an alternative way to monitor TS. However, the time-varying dynamics of the sintering process make modelling challenging. In addition, the data available in practice are incomplete and insufficient for modelling, since the system contains unknown time delays and actual TS values are scarce. The digital twin (DT) technique is a powerful tool for simulating system dynamics through real-time interaction between physical processes and virtual agents in cyberspace. This paper introduces a DT-enabled equivalent of the sintering system and proposes online data-driven modelling for TS monitoring. The time delay in the system is estimated with a modified maximum information coefficient method and used to align the variable sequences. The data used for modelling are enriched with a multi-source information fusion technique. An adaptive update method is proposed to deal with the time-varying dynamics. An iterative forgetting factor-based algorithm is designed for the support vector regression method and guarantees a fast computational speed. Implementation and validation of the model on a DT-enabled sintering system show the efficiency of the proposed method. The accuracy of TS monitoring reaches 99.6% on 3 months of data.
1. Introduction
Sintering is the process of forming iron-enriched sintered ore from various types of mined iron ore, coke, flux, and other raw materials through heat and pressure. The quality of sintered ore is essential for guaranteeing the quality of iron, since sintered ore is the main feed for the blast furnace iron-making system [1, 2]. Tumble strength (TS), an important indicator of sintering quality, is vital for the smooth operation of the blast furnace [3, 4]. However, TS is difficult to measure online due to the lack of monitoring equipment [5, 6]. Monitoring TS with a model is also challenging, since sintering is a complex time-varying dynamic thermo-chemical process with unavoidable system delays. Data-driven modelling has received considerable attention in academia and industry due to its effectiveness in practical applications [7–10]. However, it is still difficult to select suitable input variables for modelling due to the complex dynamics of sintering. Recently, various neural network methods, such as Elman, back-propagation, grey, and extreme learning machine (ELM) networks, have been developed to predict the quality of sintered ore from input variables selected by correlation analysis [11–15]. However, this way of selecting input variables may neglect important variables because of system uncertainties, which reduces modelling accuracy. For instance, the approach described by Umadevi et al. [11] did not take into account factors such as the air pressure and the height of the sintering material within the sintering bed, although these variables are essential for modelling heat transfer during the sintering process [9]. To tackle this problem, a dynamical time-features-expanding method has been proposed that reconstructs the time sequence of the input variables [3]. Many learning-based methods have also been developed to increase model accuracy. Wang et al. [16] developed a method that integrates ELM with the AdaBoost algorithm. Such integrated methods increase model accuracy, but the computing time of the model is long. Er et al. [17] proposed a fuzzy neural network-based method that predicts the quality in two steps, i.e., offline learning and online prediction. The two-step procedure helps the model cover long-term prediction, which shows its potential for constructing an online model.
However, there are some issues to be considered for online TS modelling and updating according to system dynamics. First, the data are incomplete and insufficient in the sintering field. The time delay in the sintering process results in sequence mismatching in the data. The actual TS value is sampled every 8 hr which is insufficient for the data-driven model to learn the intrinsic relationship between variables. Second, the accuracy of the data-driven model will be degraded due to the time-varying process. It may even lead to online TS monitoring failure. To tackle these problems, we must answer the following two subquestions: (1) How to expand the data and extract suitable features in the data for data-driven modelling? (2) How to design an update mechanism for the data-driven model to adapt to the sintering process?
The digital twin (DT) technique was proposed to connect the physical process with virtual agents [18, 19]. DT can accurately reflect the state of the physical entities through the model built for the digital entities. DT can also update the model according to historical and current measurements of the physical entities. Therefore, it is beneficial to apply the DT technique to TS modelling and updating. Moreover, DT has strong potential to improve productivity in the complex process industry, since all kinds of models are applied to estimate and predict the system dynamics [20–22]. Recently, Zhou et al. [23] built a cloud platform-based application of the iron-making DT. Aheleroff et al. [24] proposed the DT as a Service (DTaaS) reference architecture and applied it to industrial cases. Previous research has focused on elucidating architecture details, creating services, and outlining applications for the DT system. However, a comprehensive exploration of constructing a TS model remains an open endeavour [25].
In this paper, we introduce the DT platform for the sintering process for TS monitoring. An equivalent of the sintering system operates on the DT platform. A data-driven DT-enabled TS model is built with an update mechanism according to the time-varying sintering process. We introduce a multi-source information fusion method to expand the data and eliminate the sequence mismatching problem by delay estimation. The main contributions of this study are: (1) an online dynamic modelling method for the DT-enabled sintering system is proposed. It provides a comprehensive solution for online TS monitoring. The application of the DT sintering system shows that the proposed method achieves accurate TS monitoring with a 1 min prediction interval. (2) A multi-source information fusion technique is proposed to overcome the limitation of incomplete and insufficient data for modelling in the practical processes. (3) An iterative update method is proposed for model training that covers the time-varying sintering process in the long term. The algorithm also guarantees computational speed as it avoids frequent model retraining.
The paper is organised as follows: the DT-enabled TS model scheme and problem formulation are given in Section 2. The main methods, including delay estimation, multi-source information fusion, and the update mechanism, are given in Section 3. In Section 4, we conduct the implementation and application of the DT-enabled sintering system and the data-driven model in practice. Section 5 concludes the paper.
2. System Architecture and Problem Formulation
The architecture of the DT-enabled sintering process is shown in Figure 1. The data generated from the physical sintering system is sent to the virtual agent. The virtual agent stores different kinds of data for DT model construction. The results of the models can be used for monitoring, simulation, and process optimisation.

2.1. DT-Enabled TS Model
We propose a DT-enabled data-driven model for online TS modelling, as shown in Figure 2. The model can be used to monitor the TS value online. The scheme contains a virtual agent and a physical process, which together form a closed loop.

In the physical process, sensors such as thermocouples, pressure sensors, and flow sensors sample the state variables of the sintering process. Moreover, we introduce an infra-red camera to capture the thickness of the red layer (the ore whose temperature is above a threshold) of the sintered ore at the tail of the sintering bed. The details of this data expansion are given in Section 3. The height of the mixed materials, the air volume below the sintering bed, the air pressure below the sintering bed, and the weight of the raw material are used as the input variables for data-driven modelling according to [9, 26, 27]. The variables are listed in Table 1. The input variables are sent to the virtual agent continuously for data-driven TS modelling and updating.
In the virtual agent, the data-driven model accounts for the non-uniformity of materials in the sintering bed by using multiple submodels, each representing a different area. Multiple submodels improve model accuracy but also increase computational complexity. Because the submodels do not depend on each other, the strong computing power and multi-threading available to the DT platform can be used to shorten computation time. The output of each submodel is the red layer thickness of the sintered ore in its area at the tail of the sintering bed. The red layer thickness is introduced as an intermediate output because the TS values are too sparse for model training. The relationship between the TS value and the red layer thickness is established through sinter pot tests, whose details are given in Section 3.1.2. The TS value is then calculated from the red layer thickness with the fitting formula. The TS values can be used as control feedback for the physical system.
The submodel is trained with delay analysis. Specifically, the delay between input variables is eliminated by aligning the data to the same time sequence based on the results of delay analysis. The iterative forgetting factor-based SVR (iFFSVR) algorithm is used for training and updating the model in real-time based on new input data.
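The independence of the submodels can be illustrated with a minimal sketch. Everything named here is hypothetical: `make_submodel` stands in for a trained per-area model, `thickness_to_ts` assumes a linear fitting formula with made-up coefficients, and averaging the per-area TS values is an assumed aggregation rather than the one used in the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def make_submodel(gain, offset):
    """Return a toy stand-in for one area's trained submodel (hypothetical)."""
    def predict(inputs):
        # inputs: (material height, air volume, air pressure, raw-material weight)
        return offset + gain * sum(inputs)
    return predict

def thickness_to_ts(thickness, a=0.1, b=70.0):
    """Placeholder for the fitted thickness-to-TS formula; a linear form is assumed."""
    return a * thickness + b

def monitor_ts(submodels, inputs_per_area):
    """Evaluate every area's submodel concurrently, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=len(submodels)) as pool:
        futures = [pool.submit(m, x) for m, x in zip(submodels, inputs_per_area)]
        thicknesses = [f.result() for f in futures]
    ts_per_area = [thickness_to_ts(h) for h in thicknesses]
    return thicknesses, mean(ts_per_area)  # mean over areas is an assumption

submodels = [make_submodel(0.5, 10.0), make_submodel(0.4, 12.0)]
inputs_per_area = [(1.0, 2.0, 3.0, 4.0), (2.0, 2.0, 2.0, 4.0)]
thicknesses, ts = monitor_ts(submodels, inputs_per_area)
```

Because no submodel reads another's output, the calls can be submitted in any order. For CPU-bound models in Python, a process pool would likely be the better choice than threads.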
2.2. Problem Formulation
A 3D coordinate system describes the sintering bed. In Figure 3, the three axes represent the width, length, and height of the sintering bed, respectively. The raw material is added at the head of the sintering machine, and the sintered ore is gradually produced as it is transported to the tail. The TS of the sintered ore at the tail of the sintering bed differs between areas because the mixed materials are non-uniform. A TS density is defined at each position and time, and the TS result is taken as the integral of the TS density over the volume of the ore block. The TS at a given time is formulated in Equation (1) with the system delay taken into account. The delay is the cooling time, i.e., the interval between the hot sintered ore leaving the tail of the sintering bed and the cooled sintered ore being sampled for testing. The integration region is determined by the position of the tail of the sintering bed, the length of the ore block, and the height of the ore block at each position and time.

Since Equation (1) shows that the sintering process is a 3D system with system delay, it must be determined where and when to sample the input variables for TS modelling. According to the local thermal non-equilibrium theory of Zhang et al. [9], Loo et al. [26], Ye et al. [27], Nath and Mitra [28], and Alazmi and Vafai [29], TS is related to a set of process variables that includes the composition of the sintering raw materials. In contrast to correlation analysis techniques, variable selection based on the sintering mechanism chooses appropriate input variables without being affected by the uncertainties of the system, and these variables are also available in practice. For application, we assume that the height of the sintered ore at the tail of the sintering bed is constant and that the TS density is invariant over the length of the ore block, which means that the TS density varies along a single axis of the bed. In practice, the sintering raw materials are added at the head of the sintering bed, and the added raw materials are denoted as a function of position along that axis to analyse the non-uniformity. Simultaneously, we sample the height of the mixed materials, the air volume, and the air pressure below the bed at the head of the sintering bed. By modelling the relationship between the input variables and the TS value, we can determine the TS value as soon as the materials are added. Note that it takes time to sinter the raw materials into sintered ore, so the TS density at the tail corresponds to the materials added one sintering time earlier.
Then, the TS can be given as follows:
For calculation, we discretise Equation (2) into a sum over the areas into which the sintering bed is divided along one axis, with the index running up to the total number of areas.
Since the TS density function is unknown, we express it as a nonlinear function of the input variables. In this way, data-driven submodels can be formulated to represent the TS in each area, which accounts for material non-uniformity in the sintering bed. Note that this nonlinear function is still unknown. To learn it from the insufficient and incomplete data available in practice, we need to enrich the data and eliminate the time delay in the data to obtain an accurate model. Moreover, an adaptive update mechanism needs to be designed so that the function reflects the time-varying sintering process.
3. Main Results
3.1. Delay Estimation and Multi-Source Information Fusion
In this subsection, we estimate the delay between two variables and use a multi-source information fusion technique to handle modelling with incomplete and insufficient data.
3.1.1. Delay Estimation
Since the process contains system delays, samples with the same time tag may not match correctly in the model sequence. Moreover, the delays are unknown. To construct a suitable variable set for accurate modelling, the delays between the samples need to be estimated and eliminated.
Suppose there are two time series of variables, each with its observation sample sequence. For a given time lag, a finite set of ordered pairs is formed from one sequence and the lagged version of the other. A grid partitions the values of the first variable into a number of bins and the values of the second variable into a number of bins. According to Reshef et al. [30], the mutual information between the two variables at a given delay is estimated as the maximum, over all grids with those bin counts, of the mutual information (MI) of the distribution induced by the paired points on the cells of the grid.
Then, to identify the time delay between the two variables, the sliding window method of Zhai et al. [31] is adopted, and the time delay is obtained as the lag within the window that maximises the maximum information coefficient (MIC).
That is, at the true time delay between the two variables, the MIC of the lagged pair is the largest among the values computed for all candidate lags. The sign of the estimated delay gives the direction of the delay, i.e., which of the two variables lags behind the other.
Remark 1. In practice, the sampling frequencies of the variables are sometimes inconsistent. For example, the sampling period of the height of the mixed materials is 1 min, whereas that of TS is 8 hr. Because the data densities differ, the time delay cannot be analysed directly by the proposed method, and the frequencies of the two variables must first be matched. To unify the frequency, the high-frequency data are re-sampled at the low frequency. The sample datasets of the two variables then have the same size, and the time delay can be analysed with the proposed method.
The variable sequences are then adjusted according to the time delay identified by Equation (5). The lagging sequence is displaced by the estimated delay, so that the adjusted sequences of the two variables are stacked on a common time index.
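The delay search and the subsequent alignment can be sketched as follows. This is a simplified stand-in, not the modified MIC method of the paper: it scores each candidate lag with a plain histogram estimate of mutual information on one fixed grid, whereas MIC additionally searches over grid resolutions and normalises the score. The function names, the bin count, and the synthetic data are assumptions for illustration.

```python
import math
import random

def mutual_information(xs, ys, bins=8):
    """Histogram estimate of mutual information (nats) on a fixed equal-width grid."""
    def to_bin(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    joint, px, py = {}, {}, {}
    for x, y in zip(xs, ys):
        i, j = to_bin(x, x_lo, x_hi), to_bin(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] = px.get(i, 0) + 1
        py[j] = py.get(j, 0) + 1
    return sum(c / n * math.log(c * n / (px[i] * py[j]))
               for (i, j), c in joint.items())

def align(xs, ys, lag):
    """Pair xs[t] with ys[t + lag]; a positive lag means ys lags behind xs."""
    if lag >= 0:
        return xs[:len(xs) - lag], ys[lag:]
    return xs[-lag:], ys[:len(ys) + lag]

def estimate_delay(xs, ys, max_lag):
    """Return the lag in [-max_lag, max_lag] with the largest mutual information."""
    best_lag, best_mi = 0, -1.0
    for lag in range(-max_lag, max_lag + 1):
        a, b = align(xs, ys, lag)
        mi = mutual_information(a, b)
        if mi > best_mi:
            best_lag, best_mi = lag, mi
    return best_lag

random.seed(0)
x = [random.random() for _ in range(600)]
true_lag = 7
# Synthetic data: y is a nonlinear function of x delayed by true_lag samples.
y = [0.0] * true_lag + [v ** 2 for v in x[:-true_lag]]
lag = estimate_delay(x, y, max_lag=15)
x_aligned, y_aligned = align(x, y, lag)
```

Mutual information is used rather than linear correlation because the dependence between process variables need not be linear. In the setting of Remark 1, both series would first be re-sampled to the lower sampling frequency before this search is run.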
3.1.2. Multi-Source Information Fusion
Since the actual value of TS cannot be obtained every minute for modelling, the multi-source information fusion technique is used to introduce an intermediate variable for modelling. Motivated by existing studies on the relationship between the burning temperature field and TS [26], we use the infra-red image data of the sintering bed tail section as the intermediate variable. We then use the sinter pot test results to connect the TS value with the infra-red image data.
The infra-red images show the combustion pattern of the sintered ore at the tail section of the sintering bed. The TS is related to the thickness of the red layer, i.e., the ore whose temperature is above a threshold, and the sinter pot test results connect the sinter quality with this thickness. Figure 4 shows the temperature distribution of the materials on the tail section of the sintering bed, obtained from an infra-red image with a resolution of 640 × 480. The thickness of the red layer can be obtained from the infra-red image.

The sinter pot tests are conducted to obtain a fitting formula that regresses the relationship between the red layer thickness and TS [27]. The sinter pot is a cylinder made of heat-resistant steel. The sinter pot used in this paper has a diameter of 300 mm and a height of 1,000 mm. Five thermocouples are inserted into the sinter pot at five different heights. The test procedure is the same as in practical production. In the test, the mixed materials are put into the sinter pot, and the igniter ignites the material, which then burns thoroughly from top to bottom. After the combustion, the sintered ore is formed by cooling, crushing, screening, and other operations. The temperature in the sinter pot is obtained directly from the five thermocouples, and the TS of the sintered ore is then obtained by manual detection. Figure 5(a) shows the temperature fluctuations in a sinter pot test, and Figure 5(b) shows the changing temperature field in the sinter pot and illustrates the red layer. Based on the sinter pot tests, a fitting formula is introduced into the data-driven model [27]. It relates the thickness of the red layer to the TS (%) through coefficients determined from the sinter pot tests. The fitting formula enables us to use the thickness of the red layer as an intermediate variable to train the model, instead of the insufficient TS values. Specifically, we use the input variables and the intermediate variable to learn the mapping from the inputs to the red layer thickness, instead of learning the mapping from the inputs to TS directly.

3.2. Iterative Update Mechanism for TS Model
To learn the submodel function, we assume a regression model that is linear in a feature space: the thickness of the red layer at each sampling instant in the DT system equals a weighted nonlinear mapping of the input variable set sampled at that instant, plus a bias. The time interval is 1 min. The nonlinear mapping takes the original state space into a higher-dimensional space. Then, support vector regression (SVR) is formulated according to Vapnik [32] in Equations (9) and (10).
The pre-defined threshold is a parameter of Equations (9) and (10). According to Vapnik [32], Equations (9) and (10) can be transformed into a quadratic programming problem, and for any input variable set the prediction is given by Equation (11) in terms of the Lagrangian dual multipliers and a Gaussian kernel function.
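For reference, the standard ε-insensitive SVR of Vapnik [32] is commonly written as below. The notation ($w$, $\varphi$, $b$, $C$, $\xi_k$, $\xi_k^{*}$, $\varepsilon$, $\alpha_k$, $\alpha_k^{*}$, $\sigma$, $N$) is the conventional one and is an assumption here; it may differ from the symbols used in Equations (9) to (11).

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad
  & \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{k=1}^{N}\left(\xi_{k}+\xi_{k}^{*}\right) \\
\text{s.t.}\quad
  & y_{k} - w^{\top}\varphi(x_{k}) - b \le \varepsilon + \xi_{k}, \\
  & w^{\top}\varphi(x_{k}) + b - y_{k} \le \varepsilon + \xi_{k}^{*}, \\
  & \xi_{k},\ \xi_{k}^{*} \ge 0, \qquad k = 1,\dots,N,
\end{aligned}
\qquad
f(x) = \sum_{k=1}^{N}\left(\alpha_{k}-\alpha_{k}^{*}\right)K(x_{k},x) + b,
\quad
K(x_{k},x) = \exp\!\left(-\frac{\lVert x_{k}-x\rVert^{2}}{2\sigma^{2}}\right).
```

Here $\varepsilon$ plays the role of the pre-defined threshold: deviations smaller than $\varepsilon$ incur no penalty, and only samples with non-zero $\alpha_{k}-\alpha_{k}^{*}$ (the support vectors) contribute to the prediction.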
The different coefficients of each submodel represent the non-uniformity in the sintering process. The sintering process is time-varying because the operating conditions change with time, and the output value changes accordingly, so a single initial training procedure cannot meet the requirement of long-term accurate prediction. To avoid frequent offline retraining, the iFFSVR algorithm is proposed, which iteratively adjusts the support vector coefficients with a forgetting factor so that the data-driven model stays accurate in a timely manner. Equation (11) is first trained with the initial data to give the initial output, and each quantity then carries a time subscript denoting its iterative version. Whenever the DT system sends a new actual value, the coefficients are updated from two quantities. The first is the error between the actual value and the model output, which measures the bias between the virtual agent and the physical process in the DT system. The second is a matrix constructed from the input variables at that time. To keep this matrix non-singular, it can be regularised by adding a diagonal matrix with small entries [33]. Because this matrix is built from the new samples used in the data-driven model, it influences the estimation results. Furthermore, to reduce the influence of old samples while increasing the weight of the variation carried by new data, a forgetting factor is added to this term.
The forgetting factor is itself updated according to the new-sample matrix and the model output error. The update involves one fixed constant, and the initial values lie between 0 and 1.
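The role of a forgetting factor in an iterative update can be illustrated with standard recursive least squares (RLS). This is a related textbook technique used here as a stand-in, not the iFFSVR update of the paper: the forgetting factor `lam` is held fixed instead of being adapted, and the model is linear in its two parameters. All names and the synthetic drifting process are assumptions.

```python
def rls_update(theta, P, phi, y, lam=0.95):
    """One recursive least-squares step with forgetting factor lam (0 < lam <= 1)."""
    d = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(d)) for i in range(d)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(d))
    gain = [v / denom for v in P_phi]
    error = y - sum(theta[i] * phi[i] for i in range(d))  # error before the update
    theta = [theta[i] + gain[i] * error for i in range(d)]
    # P <- (P - gain * phi^T P) / lam; P is symmetric, so phi^T P equals P_phi^T.
    P = [[(P[i][j] - gain[i] * P_phi[j]) / lam for j in range(d)] for i in range(d)]
    return theta, P, error

def run(samples, d, lam):
    """Process all samples in order, starting from zero parameters."""
    theta = [0.0] * d
    # Large initial diagonal matrix: weak prior, keeps P well defined from the start.
    P = [[1e3 if i == j else 0.0 for j in range(d)] for i in range(d)]
    for phi, y in samples:
        theta, P, _ = rls_update(theta, P, phi, y, lam)
    return theta

# Synthetic drifting process: y = 2*u + 1 for 200 steps, then y = 3*u - 1.
inputs = [((k % 10) / 10.0, 1.0) for k in range(400)]
samples = [((u, one), (2 * u + 1) if k < 200 else (3 * u - 1))
           for k, (u, one) in enumerate(inputs)]
theta_forget = run(samples, d=2, lam=0.9)      # old samples are discounted
theta_no_forget = run(samples, d=2, lam=1.0)   # all samples weighted equally
```

With `lam=0.9` the estimate follows the new operating condition, whereas with `lam=1.0` it settles between the two regimes. Each step costs a fixed amount of work that does not grow with the number of samples seen, which is the property an iterative update exploits to avoid retraining.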
When the operation changes drastically, the accuracy of the data-driven model decreases [31]. In case the iterative update no longer guarantees the model’s accuracy, a retraining strategy is needed. Retraining is triggered by the average error, which keeps the update procedure efficient. Specifically, the average error is defined as follows:
The threshold in Equations (9) and (10) also serves as the trigger: when the average error exceeds this threshold, the model is retrained with a new dataset. The new dataset is formed by substituting new samples for the samples with small Lagrangian multipliers. Algorithm 1 gives the pseudocode of the proposed adaptive update method.
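A minimal sketch of such a trigger is given below. The class name, the sliding-window length, the use of the mean absolute error, and the strict comparison are all assumptions, since the exact definition of the average error is not reproduced here.

```python
from collections import deque

class RetrainTrigger:
    """Track the mean absolute error over a sliding window and flag retraining."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.errors = deque(maxlen=window)  # oldest error drops out automatically

    def observe(self, actual, predicted):
        """Record one error; return True when the windowed mean exceeds the threshold."""
        self.errors.append(abs(actual - predicted))
        full = len(self.errors) == self.errors.maxlen
        return full and sum(self.errors) / len(self.errors) > self.threshold

trigger = RetrainTrigger(threshold=0.5, window=3)
pairs = [(10.0, 10.1), (10.0, 9.8), (10.0, 10.2), (10.0, 11.0), (10.0, 11.5)]
flags = [trigger.observe(actual, predicted) for actual, predicted in pairs]
```

Averaging over a window means that a single large error does not force a retrain; in this example the flag is raised only at the fifth sample, once the errors stay large.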
The proposed method does not need frequent retraining because of its iterative design. This guarantees the stability of the DT-enabled system when the model is updated with the physical process. The number of training samples is always the same in this study, and the size of the model matrix remains unchanged, which helps to keep the computational cost low.
4. Implementation and Verification
To realise the practical application of online TS monitoring, we design the DT-enabled sintering system and implement the proposed model in it.
4.1. DT-Enabled Platform for Sintering
The implementation of the DT-enabled system relies on numerous models, which form DT-enabled service components and provide services for the DT-enabled system through model integration and data interaction. The operation and management of the models depend on a unified platform called the DT-enabled platform. The relationship between the modules in the DT-enabled platform is illustrated in Figure 6. Since the models are constructed independently of each other and each provides at least one service for the DT-enabled system, the model construction method coincides with the idea of micro-services. Micro-services differ from monolithic applications, in which all application functions are developed and deployed together. Micro-services divide an application into several small service modules, each of which is deployed independently and runs in its own process [34]. The services communicate, coordinate, and cooperate through lightweight communication protocols [35]. Therefore, it is reasonable to build the DT-enabled platform for sintering on the micro-service architecture. To construct the DT-enabled platform, we use Harbor to store Docker images and Rancher to provide the Kubernetes functions and manage the images.

4.2. Application of TS Monitoring
To verify the proposed TS data-driven model, the DT-enabled platform is established with a TS monitoring service for a 360 m² sintering bed in Guangxi, China. The TS model is verified to evaluate its performance in TS monitoring. To illustrate the results and comparisons, four services are constructed for visualisation on the front-end website. Specifically, the online TS monitoring micro-service is the core module of the quality monitoring service. It uses the latest real-time data to monitor the TS and writes the monitoring value to the database. At the same time, the query service retrieves the monitoring values of roughly the most recent 2 hr from the database and displays them to users in time. Furthermore, to show the monitoring accuracy to users, the monitoring accuracy micro-service calculates the relative error between the monitoring value and the actual value whenever the actual value is updated. The monitoring values and actual values of the past week are displayed to the user for comparison through the monitoring result comparison micro-service.
4.2.1. Result in TS Monitoring
In this part, the data-driven model is established and used to monitor the long-term TS value. A total of 1,305,600 samples in 38,400 sample sets from approximately 27 days of data are used to train, update, and test the model. Note that the actual TS value is sampled three times a day, and we select 80 samples of the actual TS value to show the monitoring error of the data-driven model. Table 1 shows the variables used for the data-driven model. The input and intermediate data are used for training and testing the model. The initial data-driven model is trained on the first 300 sets of samples, and its parameters are chosen by grid search with fivefold cross-validation. Then, we set the parameter in Equation (8), the parameters of iFFSVR, the parameter of the update procedure, and the coefficients of the fitting formula. Finally, the output data are used to verify the performance of the proposed scheme. Figure 7 shows the long-term monitoring of the TS value based on the proposed data-driven model. Comparisons are conducted with the methods proposed by Umadevi et al. [11], Wang et al. [16], and Ye et al. [27] using our practical data. Figure 8 shows the relative error of the proposed method. Regarding monitoring accuracy, the relative errors of the monitoring results based on the proposed method, including the maximum relative error, satisfy the requirement of the practical application. The other methods do not perform well since they overlook the time delay in the process and its time-varying nature.


Four statistics, namely root-mean-square error (RMSE), mean relative error (MRE), normalised mean square error (NMSE), and Pearson correlation coefficient (R), are used to evaluate the accuracy of the model. They are computed from the actual values, the predicted values, the average of the actual values, and the average of the predicted values over a sample set of a given size.
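The four statistics can be computed as in the sketch below, which uses their common textbook definitions. The NMSE in particular has several conventions in the literature; normalising by the spread of the actual values is an assumption and may differ from the definition used for Table 2. The sample values are synthetic, not plant data.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mre(actual, predicted):
    """Mean relative error, as a fraction of the actual value."""
    n = len(actual)
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / n

def nmse(actual, predicted):
    """Squared error normalised by the spread of the actual values (assumed convention)."""
    mean_a = sum(actual) / len(actual)
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted))
            / sum((a - mean_a) ** 2 for a in actual))

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

actual = [76.0, 78.0, 80.0, 82.0]      # synthetic TS values (%)
predicted = [77.0, 78.0, 79.0, 82.0]
scores = {
    "RMSE": rmse(actual, predicted),
    "MRE": mre(actual, predicted),
    "NMSE": nmse(actual, predicted),
    "R": pearson_r(actual, predicted),
}
```

RMSE, MRE, and NMSE are error measures, so smaller is better, while R measures how closely the predictions track the actual values, so larger is better.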
Table 2 compares the statistical results of the proposed data-driven model with those of other methods in the literature [11, 16, 27]. The table also shows the number of updates and the longest update computing time of the proposed method. The RMSE, MRE, and NMSE of the monitoring based on the proposed method are smaller than those of the methods in the literature [11, 16, 27]. Note that the values for the other methods in the table are taken directly from the literature to show their best statistical results; the results in Figure 7 come from experiments on our dataset, where these methods perform worse than reported in their papers. The correlation coefficient (R) of the monitoring based on the proposed method is also larger than that of the other methods. This means that the proposed data-driven model is more accurate and effective. The model is updated only occasionally, with an average update interval of around 30 min. The longest computing time of an update is 23.58 s, which is shorter than the sampling interval (i.e., 1 min). This indicates that the proposed data-driven model can provide accurate and timely monitoring results that meet the requirements of actual industrial operation.
Moreover, we conduct comparisons of computational speeds to demonstrate the efficacy of the proposed method. Specifically, we utilise the approaches described by Umadevi et al. [11], Wang et al. [16], and Ye et al. [27] incorporating retraining procedures for updating the data-driven model. The statistical outcomes of the computing time for these updates are presented in Figure 9, with each method undergoing 500 update procedures. Our method exhibits the shortest average computing time for updates, primarily due to the iterative mechanism of calculating most updates. Notably, two outliers correspond to the retraining procedures’ computing times. Compared to a similar data-driven model (SVR) as illustrated by Ye et al. [27], the proposed iterative update mechanism significantly enhances computing speed by approximately a factor of 10.

4.2.2. Result in TS Monitoring Workload
The online monitoring micro-service is established based on the proposed data-driven model. The online monitoring micro-service consumes an immense workload on CPU, network, and memory among the constructed micro-services since it continuously monitors the TS results and automatically updates the model. Figure 10 shows the workload of monitoring micro-service in 99 days. The network I/O, memory, and CPU workloads of the online TS monitoring micro-service keep stable for a long time in the DT-enabled system, which shows that the proposed data-driven model is efficient. The maximum relative standard variance of the workload is 7.82%, which shows the stability of the proposed method. The stability of the operation is critical for industry operation and the DT-enabled system. The breakdown of a micro-service in the DT-enabled platform could lead to system-level failure since the micro-service is essential and provides basic data for other services.

Figure 11 shows the front-end website of the TS monitoring service on site. The north-side, south-side, and mean monitoring values are shown on the screen. The monitoring results, accuracy, and comparisons are shown on the right side of the front-end website, which is powered by micro-services.

5. Conclusion
This paper developed an online data-driven TS model for the DT-enabled sintering system. We introduced a system-delay-estimation-based variable modification and an infra-red image-based data enrichment procedure to deal with incomplete and insufficient data when constructing a data-driven TS model. The modelling includes a multi-submodel scheme, TS model mechanism analysis, and the iFFSVR algorithm with an update strategy to deal with the material non-uniformity and time-varying nature of the sintering process. A concrete systematic solution for implementing and applying TS value monitoring in the DT-enabled platform is given to illustrate the detailed development procedure. A 3-month operation on a sintering bed in Guangxi, China, shows the efficiency of the data-driven TS model.
Despite the advantages of the proposed method demonstrated above, it also requires some future research. The TS value is given based on the fitting formula. In general, the sintering material used for sintered ore production is stable for the long term in practice. The fitting formula can cover the long-term operation condition. When the material changes drastically, it requires an update mechanism for the formula.
Data Availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.