Abstract
Owing to the randomness of job release times, it is impossible to obtain all job information in real time during the operation of a manufacturing system. Generating a suitable scheduling strategy at the correct moment is the key to addressing this disturbance. In this study, the flow time of jobs in the manufacturing system was used as the criterion for evaluating the performance of scheduling strategies. A model was then constructed for selecting the optimal dispatching rule (DR) to actively change scheduling strategies during the production process. The constructed model for selecting the optimal DR comprises an initial model for selecting the optimal DR and an evaluation model. The initial model for selecting the optimal DR outputs a DR with better performance based on the attributes of the jobs to be scheduled in the manufacturing system, while the evaluation model evaluates the DR output by the initial model and determines whether the DR needs to be updated; the update process is realized based on simulation technology. Experimental verification showed that the constructed model can generate scheduling strategies with superior performance in real time and update the historical database. The results of this study provide a useful reference for solving, in real time, the disturbance problems encountered by manufacturing systems.
1. Introduction
Manufacturing companies face high requirements for production efficiency and response speed in the modern market environment, driven by the deep integration of a new generation of information technology with manufacturing technology. This integration has produced a new generation of intelligent manufacturing modes characterized by digitization, networking, and intelligence [1], which have injected new impetus into the development of the manufacturing industry. More specifically, the construction of digital, networked, and intelligent workshops has become a core theme of the era of intelligent manufacturing. With the popularization of CNC equipment at the workshop level, industrial systems such as enterprise resource planning systems (ERPs), manufacturing execution systems (MESs), and product data management systems (PDMs) are widely implemented, together with the universal application of network technology for storing data and connecting workshop production factors. Digital and networked workshop construction has thus been developed, laying a solid data foundation and a multifactor collaborative decision-making foundation for developing intelligent workshops. However, the exponential growth in the amount of data at the workshop level has led to low knowledge density in the data, and the lack of real-time learning and adaptability of the intelligent models adopted for decision-making and decision support has become a critical factor hindering the construction of intelligent workshops. In this study, the production scheduling module, an integral component of the manufacturing system, is considered as the research object. The aim of the study is to address the disturbance factor of the randomness of job release in the manufacturing system. Accordingly, a model for selecting optimal dispatching rules (DRs) that actively generates scheduling strategies in real time is constructed. The scheduling of new jobs is achieved by updating the scheduling strategy to minimize the average flow time of jobs in the manufacturing system. Moreover, during operation, the model monitors the job attribute data that lead to the degradation of the model's performance, accomplishes the targeted collection of scheduling data, and reduces data redundancy.
Scheduling is a decision-making process in which resources and jobs are allocated efficiently under predetermined scheduling goals to meet production constraints [2]. Over the past 60 years, the job shop scheduling problem (JSSP) has received increasing attention because of increasingly complex manufacturing systems and personalized market needs. Generating scheduling strategies with high real-time performance and adaptability has become an essential requirement of current manufacturing systems, which face short production cycles under the multivariety, small-batch production mode of job shops and frequent disturbances inside and outside the workshop. When a disturbance is easily predicted, the scheduling strategy can be updated in advance to avert the adverse effect of the disturbance on the normal operation of the manufacturing system. In contrast, when a disturbance is difficult to predict, the scheduling strategy cannot be updated in advance, or the updated scheduling strategy cannot adapt to the state of the manufacturing system under the influence of the disturbance factor, which can have adverse or even catastrophic effects on the normal operation of the manufacturing system. In general, because job information (job release time, job processing attributes, etc.) cannot be obtained or accurately predicted before jobs are released to the production system, the disturbance of job release randomness is unpredictable. Therefore, it is challenging to formulate a robust scheduling strategy at the beginning of scheduling, and an improper scheduling strategy may adversely affect the overall production efficiency of the manufacturing system.
Researchers have proposed numerous methods for overcoming the scheduling problems encountered. These primarily include precise methods, such as the simplex algorithm and the branch-and-bound algorithm; meta-heuristic methods [3, 4], such as evolutionary algorithms, particle swarm algorithms, and ant colony algorithms; and artificial intelligence (AI) methods, such as neural networks and machine learning (ML). Precise methods aim to obtain the optimal solution; they have high requirements for computing power, are difficult to adapt to highly complex production scheduling problems, and are therefore primarily suitable for solving low-complexity static scheduling problems. The general steps of the heuristic method for solving the scheduling problem involve establishing a mathematical model, designing an algorithm, programming the implementation, and verifying it through simulation. In the related literature, solving the scheduling problem using heuristic methods primarily takes two forms. In the first, a goal-constrained mathematical model is constructed to describe the scheduling problem, and a highly adaptive algorithm is then designed to obtain a high-quality feasible solution to the scheduling problem [5]. The second involves designing a heuristic algorithm library and then using simulation technology to select the optimal algorithm from the library, acquiring the optimal scheduling strategy through a global search for different manufacturing systems. Regarding the former, Zhao et al. proposed an ensemble discrete differential evolution algorithm to minimize makespan and solve the blocking flow shop scheduling problem in a distributed manufacturing environment with high performance [6]. Zhao et al. proposed a self-learning discrete Jaya algorithm to solve the multiobjective, energy-efficient distributed no-idle flow shop scheduling problem in heterogeneous factory systems [7]. Liu et al. designed a mixed-variable differential evolution algorithm to solve the electric vehicle charging scheduling problem while considering more practical situations [8]. Zhao et al. proposed a two-stage co-evolutionary algorithm with problem-specific knowledge, which optimizes the no-wait flow shop scheduling problem by considering energy consumption [9]. Zhou et al. proposed a self-adaptive differential evolution algorithm to minimize makespan and efficiently solve the batch machine scheduling problem with different batch job release times and sizes [10]. The above methods have shown excellent performance in solving different scheduling problems; the problems they solve consider more practical factors, and the methods have strong problem-solving ability and considerable engineering application value. However, job information such as processing and release times is assumed to be known before the scheduling strategy is obtained. Regarding the latter, Pergher et al. [11] proposed combining discrete event simulation with flexible interaction to weigh trade-off compensations: the performance of different dispatching rule combinations was evaluated through discrete event simulation based on total cost, production quantity, and delay goals.
Accordingly, the flexible interaction trade-off compensation method was adopted for different DRs to complement each other’s advantages to satisfy the decision makers’ settings. Moreover, Rodrigues et al. [12] used discrete event simulation technology to analyze the production capacity to avoid the impact of process changes on the realization of production goals to continually ensure the rationalization of production scheduling. Additionally, Alexandros et al. [13] adopted a hybrid simulation method to efficiently achieve multiobjective scheduling and averted conflicts between different schemes during rescheduling. The method could obtain the effect of all alternative scheduling strategies to solve the same scheduling problem quickly and select the optimal scheduling strategy. However, the operating process of the manufacturing system cannot be changed, and numerous scheduling strategies exist to solve the same scheduling problem in a manufacturing system containing significant uncertainties in the production process and various types of process flows. Consequently, the computing resources and workload occupied during the evaluation process of the scheduling strategy will be enormous.
More recently, numerous studies have used AI algorithms, such as decision trees, naïve Bayes, and neural networks, to overcome the highly complex and time-consuming simulation optimization of heuristic methods, directly abstracting regression or classification models from sufficient data samples and then generating scheduling strategies based on real-time production data. Zahmani and Atmani [14], aiming to minimize the makespan of a manufacturing system, used data mining technology to develop a decision tree classification model that could allocate a set of DRs to machines in real time. Ahn and Hur [15], aiming to minimize the total tardiness of batch scheduling problems, developed a method to generate training datasets from historical data to train neural networks and ultimately achieved the goal of optimal scheduling decision-making. Gokhan et al. [16] took the average delay time as the goal, mined the scheduling knowledge in the data using the decision tree algorithm, and assigned a DR to the machine in real time in each scheduling cycle. Importantly, the quality of the sample data and the predictive performance of the constructed model are key to this method. Owing to continuous changes in the production environment, the scheduling knowledge in the historical data gradually becomes insufficient; therefore, the predictive performance of the constructed model progressively declines.
Furthermore, when the aforementioned methods optimize the scheduling problem, the information of the jobs to be completed is known at the initial moment of the manufacturing system. Accordingly, to approach the actual production environment and meet production objectives, a scheduling strategy with robust performance is generated to schedule jobs under resource constraints in the manufacturing system, with one or more scheduling criteria as the objectives [17, 18]. Precise methods that can provide optimal solutions are ideal for simple static scheduling problems, whereas heuristic methods are primarily suitable for complex static and dynamic scheduling problems with predictable disturbance factors. Although AI algorithms are capable of generating scheduling strategies with good performance in real time from the overall perspective of the production process of the manufacturing system, the existing literature indicates a lack of studies reporting the ability to reduce or eliminate the unfavorable impact of the randomness of the job release moment on the operation of the manufacturing system. For instance, Pisut [2] assumed that the job arrival moments were the same when constructing the model. In addition, the model constructed using AI algorithms is data driven, and the real-time updating of historical data is an essential driving factor in ensuring that the constructed model continually exhibits high predictive performance. Zahmani, Gokhan, and Gilseung proposed that updating models constructed using AI methods in the manufacturing industry is a critical direction for future research [2, 15, 16]. This study considers the disturbance factor of the randomness of the job release moment when solving for the scheduling strategy. Accordingly, a model was developed for selecting the optimal DR, with the average flow time of jobs in the job shop manufacturing system as the scheduling criterion. In addition to updating the scheduling strategy in real time to schedule jobs randomly released to the manufacturing system, the historical database can also be updated in a targeted manner to reduce data redundancy, thereby laying a data foundation for model updating.
The rest of the article is organized as follows. A significant scheduling problem faced by the manufacturing system is analyzed in Section 2. In Section 3, a method is designed to solve the problem, and the construction of the model for selecting the optimal DR is introduced in detail. In Section 4, the superior performance of the constructed model for selecting the optimal DR is verified through a case study. The final section, Section 5, presents the study's conclusions and prospects for future research.
2. JSSP Analysis
Once a job enters the manufacturing system, it goes through the process of waiting to be processed and being processed; finally, the processed job leaves the system. The time spent waiting for processing is defined as non-value-added time, and the time spent being processed is defined as value-added time. Under the requirements of an ideal production process or lean production, the non-value-added time of jobs in the production system, such as inventory and waiting time, should be as small as possible relative to the total processing time [19]. Since the processing time of a job is fixed, reducing the non-value-added time of the job in the system is critical for improving the performance of the production system. Accordingly, the average flow time of jobs in the system is an essential criterion. Ðurasević et al. and Veronique et al. considered the flow time of jobs in the system as an essential criterion for measuring the performance of a DR and reviewed the applicability of multiple DRs in different manufacturing systems [20, 21]. Moreover, Lee and Wang [22] optimized the two-machine job shop scheduling problem to minimize the flow time of jobs, thereby reducing the total delay of jobs in the manufacturing system. The flow time $F_i$ of job $i$ in the manufacturing system is shown in equation (1), where $C_i$ represents the completion moment of job $i$ and $r_i$ denotes the moment when job $i$ is released to the system:

$$F_i = C_i - r_i. \qquad (1)$$

The average flow time $\bar{F}$ in the manufacturing system is shown in equation (2), where $n$ symbolizes the total number of jobs in the production system:

$$\bar{F} = \frac{1}{n}\sum_{i=1}^{n} F_i. \qquad (2)$$

The research goal of this study is to actively update the scheduling strategy when a new job release disturbance is encountered during the operation of the job shop manufacturing system, thereby minimizing the value of $\bar{F}$ from the perspective of the overall operating cycle of the manufacturing system.
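To make equations (1) and (2) concrete, the following minimal Python sketch computes the flow time of each job and the average flow time from assumed lists of release and completion moments; the variable names and values are illustrative, not taken from the study.

```python
# Minimal sketch of equations (1) and (2): flow time and average flow time.
# The job data below are illustrative values, not from the study.
release_moments = [0, 3, 5, 9]        # r_i: moment each job is released to the system
completion_moments = [7, 12, 14, 20]  # C_i: moment each job leaves the system

# Equation (1): F_i = C_i - r_i for every job i
flow_times = [c - r for r, c in zip(release_moments, completion_moments)]

# Equation (2): average flow time over the n jobs
average_flow_time = sum(flow_times) / len(flow_times)

print(flow_times)         # [7, 9, 9, 11]
print(average_flow_time)  # 9.0
```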
3. Proposed Approach
3.1. Approach Design Process
The method of optimizing the job shop scheduling problem proposed in this study involves mining knowledge from historical data to construct a model for selecting the optimal DR that can generate scheduling strategies in real time during the production process. The scheduling strategies are updated as the production process progresses, and determining the update time of the scheduling strategies is the premise for realizing the proposed method. The update time of the scheduling strategy is the switching time of the DR. Because the release times and information of jobs in a production period are not known before the jobs are released to the manufacturing system, a method of dividing sub-scheduling periods is adopted: the next sub-scheduling period schedules the jobs that arrived during the previous sub-scheduling period, and the start time of each sub-scheduling period is used as the switching moment of the DR to realize the updating of the scheduling strategy. The sub-scheduling periods are divided as follows: the time required to complete the jobs that are schedulable at the initial stage of production constitutes the initial sub-scheduling period; the time required to complete the jobs arriving during this period constitutes the next sub-scheduling period; and the sub-scheduling periods are divided iteratively in turn. In this article, optimizing the scheduling problem using the constructed model for selecting the optimal DR primarily includes two parts: model construction and model application. The implementation process is illustrated in Figure 1. The model construction process primarily includes attribute data collection, label data collection, data storage and processing, and the construction of the model for selecting the optimal DR. The model application process primarily includes the collection and processing of the attribute data of jobs newly released into the manufacturing system, the output of the initial DR by the model for selecting the optimal DR, the judgment of the evaluation model on whether to update the DR, the output of the final DR, and the generation of scheduling strategies. If the initial DR needs to be updated, the updated DR and the attribute data of the job are stored.
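One possible reading of the division procedure above is sketched below for a single machine: each sub-scheduling period ends when the jobs released before its start have been completed, and the jobs arriving during a period are scheduled in the next one. The function name, the job representation, and the simplified timing model are assumptions made for illustration only.

```python
# Hypothetical sketch of the iterative sub-scheduling period division.
# Assumption: a single machine, and each job occupies the machine for its
# total processing time (setting time + processing time) once it is started.

def divide_sub_periods(jobs):
    """jobs: list of (release_moment, total_processing_time) tuples."""
    jobs = sorted(jobs)                                   # sort by release moment
    periods = []                                          # (start, end) of each sub-period
    start = 0.0
    remaining = list(jobs)
    while remaining:
        pending = [j for j in remaining if j[0] <= start]
        remaining = [j for j in remaining if j[0] > start]
        if not pending:                                   # machine idle: jump to next release
            start = remaining[0][0]
            continue
        end = start
        for release, proc in pending:                     # period ends when these jobs finish
            end = max(end, release) + proc
        periods.append((start, end))
        start = end                                       # jobs arriving before 'end' go to the next period
    return periods

print(divide_sub_periods([(0, 4), (1, 3), (6, 2), (15, 5)]))
# -> [(0.0, 4), (4, 7), (7, 9), (15, 20)]
```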

3.1.1. Collect Attribute Data
Collecting the data generated during the operation of the manufacturing system is a prerequisite for realizing digital empowerment of job shops. MES refers to a data collection platform connected to enterprise resource planning (ERP), product data management (PDM), digital control system (DCS), and other systems [23]. As an essential data platform in manufacturing systems, MES contains a large amount of production-related data and offers an essential data basis for scheduling.
The scheduling module is one of the core modules of the MES. For the production scheduling problem to be optimized in this study, the data that need to be collected primarily include the job process flow, the setting time, and the processing time before the job is processed, as well as the machine status, which is mainly either idle or working.
3.1.2. Collect Label Data
Since the related machines and jobs have not yet been scheduled, the collected attribute data do not contain scheduling knowledge. Therefore, it is necessary to find a scheduling strategy, or a generator of scheduling strategies, with superior performance to serve as the label of the original attribute data and thereby obtain data with scheduling knowledge. DRs have the advantages of simplicity, interpretability, low computational load, and shorter execution time than heuristic and meta-heuristic methods [24]. Because of their ability to respond swiftly to dynamic events, they are commonly used to solve dynamic scheduling problems [25, 26]. A DR uses a priority function to prioritize the jobs waiting to be processed and then processes jobs with higher priority first. In this study, DRs were used as the scheduling strategy generator when collecting the label data, and a DR is applied in units of machines. The DR use process is depicted in Figure 2, where the average flow time of the jobs to be scheduled in the manufacturing system is used as the criterion. Based on simulation technology, the optimal DR is selected from a preset DR library, and a scheduling strategy is generated to determine the order in which the machine processes jobs. This DR serves as the label for the attribute data of the scheduled jobs.
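The label-collection step relies on simulating every DR in the library on the waiting jobs and keeping the one that yields the smallest average flow time. The sketch below illustrates this idea for a single machine; the rule names, the job tuples, and the timing model are simplified assumptions, not the authors' exact implementation.

```python
# Illustrative label collection: simulate each DR on the waiting jobs and
# keep the rule with the smallest average flow time (single-machine case).
# The rules and the job tuples (release, setting_time, processing_time) are assumptions.

DR_LIBRARY = {
    "SPT":  lambda job: job[2],            # shortest processing time first
    "SST":  lambda job: job[1],            # shortest setting time first
    "STPT": lambda job: job[1] + job[2],   # shortest total processing time first
}

def average_flow_time(jobs, priority, start=0.0):
    """Simulate processing the jobs in priority order and return the mean flow time."""
    clock, total_flow = start, 0.0
    for release, setting, processing in sorted(jobs, key=priority):
        clock = max(clock, release) + setting + processing   # finish moment of the job
        total_flow += clock - release                        # flow time = completion - release
    return total_flow / len(jobs)

def best_dispatching_rule(jobs, start=0.0):
    """Return the name of the DR that minimizes the average flow time (the label)."""
    return min(DR_LIBRARY, key=lambda name: average_flow_time(jobs, DR_LIBRARY[name], start))

waiting_jobs = [(0, 1, 8), (0, 2, 3), (0, 1, 5)]   # toy data
print(best_dispatching_rule(waiting_jobs))          # -> "SPT" for this toy set (tied with "STPT")
```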

3.1.3. Data Storage
The collected attribute data and labels are combined in a one-to-one correspondence to form a set of original instance sample data with scheduling knowledge, and these instance sample data are then stored.
3.1.4. Data Processing
Data processing is primarily divided into two aspects: the first involves changing the data structure to suit the input of machine learning algorithms, and the second improves the quality of the data through preprocessing. The structure conversion process of the original instance sample data for the former is illustrated in Figure 3. The structure of the data before processing is determined by the number of records of attribute data, the number of features of the attribute data, and the number of labels; a dimensionality-reduction method is then applied to this structured data so that it becomes data suitable for the input of machine learning algorithms. This method can use statistical measures, such as the mean, standard deviation, and coefficient of variation, or intelligent algorithms, such as PCA and SVD. The processed data are defined as the available instance sample data. For the latter, data preprocessing includes duplicate value elimination, missing value filling, and outlier deletion. Eliminating duplicate values helps avoid deviations in the model training direction and promotes the efficiency of model construction; filling missing values helps supplement the information in the data and remove uncertainty from the data; and deleting outliers improves the overall prediction effect of the model. In this study, whether the data are of high quality is judged according to the following five criteria proposed by Alexandropoulos et al.: validity, conciseness, accuracy, consistency, and completeness [27].
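As an illustration of the structure conversion described above, the snippet below flattens each record's attribute rows into a fixed-length feature vector using simple statistics (mean, standard deviation, coefficient of variation); PCA or SVD could be substituted. The array shapes and column choices are assumptions for illustration.

```python
# Illustrative structure conversion: summarize each record's attribute rows
# with statistics so that every sample becomes a fixed-length feature vector.
import numpy as np

def to_feature_vector(record):
    """record: 2-D array, one row per waiting job, one column per attribute."""
    record = np.asarray(record, dtype=float)
    mean = record.mean(axis=0)
    std = record.std(axis=0)
    cv = np.divide(std, mean, out=np.zeros_like(std), where=mean != 0)  # coefficient of variation
    return np.concatenate([mean, std, cv])

# Toy record: 3 waiting jobs described by (setting time, processing time).
record = [[1, 8], [2, 3], [1, 5]]
print(to_feature_vector(record))   # 6-dimensional feature vector
```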

3.1.5. Construct the Model for Selecting the Optimal DR
This study adopted machine learning classification algorithms to construct the model for selecting the optimal DR based on historical data. As shown in Figure 1, the model for selecting the optimal DR includes the initial model for selecting the optimal DR and the evaluation model. The initial model for selecting the optimal DR, a multiclass model, is responsible for generating the initial DR based on the real-time state of the manufacturing system and is used to generate scheduling strategies. The evaluation model, a binary classification model, is responsible for determining whether the performance of the initial DR is suitable for scheduling the current manufacturing system. A DR that is judged to have poor performance by the evaluation model is updated through simulation technology, which improves the prediction accuracy of the model for selecting the optimal DR.
3.1.6. Apply the Model to Select the Optimal DR
Once a new job is released into the manufacturing system, the relevant attribute data of the new job are processed when the machine becomes idle. The processing follows the data-processing procedure explained above, and the processed data are then input into the initial model for selecting the optimal DR, which outputs the initial DR and generates new attribute data. Finally, the new attribute data and the processed job attribute data are input into the evaluation model. If the evaluation model judges that the initial DR performs well, this DR is used to generate the scheduling strategy; otherwise, the initial DR is updated through simulation technology, the updated DR is used to generate the scheduling strategy, and the job attribute data and the updated DR are combined into a piece of instance sample data and stored in the database to realize the dynamic updating of the data.
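A condensed sketch of the application flow just described is shown below: the initial model proposes a DR, the evaluation model judges it, and a simulation-based search replaces it when the judgment is negative, with the corrected sample stored for later retraining. The function and model names (`initial_model`, `evaluation_model`, `simulate_best_dr`, `database`) are placeholders, not the authors' code.

```python
# Hypothetical application flow of the model for selecting the optimal DR.
# `initial_model`, `evaluation_model`, `simulate_best_dr`, and `database`
# are placeholders standing in for the components described in the text.

def select_dr(job_attributes, initial_model, evaluation_model,
              simulate_best_dr, database):
    # 1. Initial model proposes a DR and its prediction-confidence features.
    initial_dr = initial_model.predict([job_attributes])[0]
    confidence = initial_model.predict_proba([job_attributes])[0]

    # 2. Evaluation model judges whether the proposed DR is good (1) or poor (0).
    verdict = evaluation_model.predict([list(job_attributes) + list(confidence)])[0]
    if verdict == 1:
        return initial_dr

    # 3. Poor DR: fall back to a simulation-based search over the DR library,
    #    then store the corrected sample so the historical data stay current.
    updated_dr = simulate_best_dr(job_attributes)
    database.append((job_attributes, updated_dr))
    return updated_dr
```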
3.2. The Process of Constructing the Model for Selecting Optimal DR
The training and testing process of constructing the model for selecting the optimal DR based on machine learning algorithms includes three stages. The first stage is to construct the initial model for selecting the optimal DR, which is responsible for outputting a DR that can guide machine processing based on preprocessed input job data. The second stage is to construct the evaluation model, whose purpose is to evaluate the performance of the DR output by the model constructed in the first stage and to update poorly performing DRs based on the judgment result and simulation technology. In this process, the datasets that cause the performance of the model constructed in the first stage to deteriorate can be screened out, which is conducive to improving the performance of the model after it is updated. The third stage is to integrate simulation technology with the models constructed in the first two stages.
Ensemble learning has become a prevalent machine learning method with good prediction performance. Ensemble learning improves the prediction performance of a single model by training multiple models and combining their prediction results [28, 29]. It has the following advantages: it avoids over-fitting, has a short calculation time, does not easily fall into local optima or the curse of dimensionality, adapts better to the data space, and improves prediction performance on class-imbalance problems [30, 31]. This study adopts a classical ensemble learning classification algorithm, the random forest (RF) algorithm, for constructing the initial model for selecting the optimal DR; the decision tree algorithm is its base estimator. The support vector machine (SVM) algorithm is a powerful binary classifier. Its classification process involves finding a hyperplane as the decision boundary for the given data: all sample points on one side of the decision boundary belong to one class, and all sample points on the other side belong to another class. The SVM algorithm was adopted to construct the evaluation model.
The model is time sensitive, and the performance of the constructed model may degrade over time. Thus, enhancing the adaptability of the model is instrumental in ensuring that the model remains efficient. A main solution involves detecting prediction errors in time and adjusting or changing the model accordingly; updating the data is particularly critical to realizing this approach. Therefore, this study considers data updating when constructing the model for selecting the optimal DR, and the construction process includes the training and prediction processes.
The training of the model includes training the initial model for selecting the optimal DR and training the evaluation model, as shown in Figure 4. The specific training process is as follows:
(1) Process the original instance sample data to form an available instance sample dataset $D = \{(x_i, y_i)\}$, where $x_i$ denotes the sample attributes and $y_i$ denotes the sample label.
(2) Divide dataset $D$ into a training dataset and a test dataset.
(3) Input the training dataset into the RF algorithm to construct the initial model for selecting the optimal DR.
(4) Input the training attributes, defined as attribute dataset 1, into the trained initial model for selecting the optimal DR to obtain the predicted labels. During this prediction process, the prediction-confidence parameter is generated and defined as attribute dataset 2.
(5) Compare the labels of the training data with the predicted labels; the output is 1 when they are the same and 0 when they differ, and all output results are defined as the comparison data.
(6) Use attribute dataset 1, attribute dataset 2, and the predicted labels as inputs, with the comparison data from step (5) as labels, for the SVM algorithm to construct the evaluation model.
When testing the model's performance, selecting appropriate indicators is critical for assessing the model's capabilities, and different indicators should be used for different problems. For multiclass problems, the evaluation indicators primarily include accuracy and precision. For binary classification problems, the samples are often unbalanced; when the classification goal is to capture the minority class while guaranteeing that the majority class is classified correctly, the evaluation index is principally the AUC area. Since the initial model for selecting the optimal DR and the model for selecting the optimal DR are multiclass models, the accuracy rate is utilized as their test index. Moreover, since the goal of the constructed evaluation model is to capture as many poorly performing initial DRs as possible while misjudging as few well-performing initial DRs as possible, the AUC area is utilized as the test index of the evaluation model. Simulation is incorporated in the testing process to improve the performance of the model for selecting the optimal DR and to update the historical data. The model testing process is illustrated in Figure 5, and the specific testing process is as follows:
(1) Input the attribute data of the test data into the trained initial model for selecting the optimal DR, output the initial DR, and generate the corresponding confidence parameter data.
(2) Input the attribute data and the parameter data into the trained evaluation model and output the performance judgment of the initial DR: an output of 1 represents good initial DR performance, and an output of 0 represents poor initial DR performance.
(3) When the output of the evaluation model is 1, the initial DR is output. When the output of the evaluation model is 0, the initial DR is updated using simulation technology, and the updated DR is output.
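The six training steps above can be prototyped with scikit-learn roughly as follows. The dataset here is synthetic, and treating the maximum class probability of the RF as the confidence feature of "attribute dataset 2" is an assumption; the exact feature construction in the paper may differ.

```python
# Rough prototype of training steps (1)-(6) using scikit-learn on synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))          # available instance attributes (synthetic)
y = rng.integers(0, 9, size=500)       # labels: index of the optimal DR (synthetic)

# Steps (1)-(2): split the available instance sample data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Step (3): initial model for selecting the optimal DR (shallow trees keep the
# toy example's training predictions imperfect so both comparison classes appear).
initial_model = RandomForestClassifier(max_depth=3, class_weight="balanced", random_state=0)
initial_model.fit(X_train, y_train)

# Step (4): predicted labels and prediction-confidence parameter (assumed: max probability).
y_pred = initial_model.predict(X_train)
confidence = initial_model.predict_proba(X_train).max(axis=1)   # "attribute dataset 2"

# Step (5): 1 if the prediction matches the true label, else 0.
agreement = (y_pred == y_train).astype(int)

# Step (6): evaluation model trained on attributes + confidence + predicted label.
eval_features = np.column_stack([X_train, confidence, y_pred])
evaluation_model = SVC(class_weight="balanced")
evaluation_model.fit(eval_features, agreement)
```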


4. Case Study
Since the scheduling strategy generated in this study is aimed at the machine, the DR is assigned to the machine based on the job information in the manufacturing system, and the scheduling strategy is then updated. Therefore, when verifying the performance of the proposed method, the application object was set as a manufacturing system with only one machine. The DR is assigned while the previous scheduling strategy is being executed. The assigned DR is responsible for scheduling the jobs that can currently be scheduled, including newly released jobs, and it sets priorities for multiple jobs or multiple batches of jobs to be processed; only the average flow time of the jobs in the manufacturing system is selected as the scheduling criterion. Prior to verifying the performance of the proposed method, the following assumptions were made to exclude other disturbance factors:
(i) No failure occurs during machine operation;
(ii) Each job is processed only once on the machine, and there is no priority between jobs;
(iii) The machine can process only one job at a time;
(iv) The delivery date of the job is not considered.
Many studies have shown that no single DR can adapt well to all manufacturing systems under every scheduling criterion [16, 26]. This study selects single-parameter DRs and mixed-parameter DRs, based on job parameters, to form the DR library [25, 32]. The parameters of the selected single-parameter DRs include the setting time before job processing, the processing time of the job, and the total processing time of the job, which is equal to the sum of the setting time and the processing time; the formula for the total processing time is expressed in equation (3):

$$\text{Total processing time} = \text{Setting time} + \text{Processing time}. \qquad (3)$$

The single-parameter DRs in the DR library are presented in Table 1.
In order to expand the scope of the DR library and make better use of the processing parameters of the job, this article selects multiple mixed-parameter DRs. The parameters of the mixed-parameter DRs were obtained by randomly selecting two of the above parameters and mixing them by multiplication; a mixed-parameter DR thus combines the advantages of two single-parameter DRs [26]. The parameter of a mixed-parameter DR is expressed in equation (4), where Parameter 1 and Parameter 2 are two different parameters and $Z$ denotes the mixed-parameter value:

$$Z = \text{Parameter 1} \times \text{Parameter 2}. \qquad (4)$$

Subsequently, the parameter value corresponding to each job is sorted from small to large to determine the processing order of the jobs: the smaller the $Z$ value of a job, the higher its processing priority. The parameters of the mixed-parameter DRs are listed in Table 2.
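Concretely, equation (4) can be implemented as a priority function that multiplies two job parameters and sorts jobs by the resulting Z value. The sketch below assumes the setting time and the processing time as the two mixed parameters; the job identifiers and values are toy data.

```python
# Mixed-parameter DR of equation (4): Z = Parameter 1 x Parameter 2,
# here assumed to be the setting time and the processing time of each job.
jobs = {                      # job id -> (setting time, processing time); toy values
    "J1": (2, 9),
    "J2": (1, 6),
    "J3": (3, 4),
}

def mixed_priority(params):
    setting_time, processing_time = params
    return setting_time * processing_time          # the Z value of the job

# Smaller Z means higher priority, so sort ascending to get the processing order.
order = sorted(jobs, key=lambda job_id: mixed_priority(jobs[job_id]))
print(order)   # ['J2', 'J3', 'J1']  (Z = 6, 12, 18)
```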
This study refers to Taillard's method of generating a manufacturing system, in which the size of the system is represented by the number of jobs and the number of machines [32]; the size of the scheduling system generated in this study is 1000 jobs processed on a single machine. The processing time of each job obeys a geometric distribution, and the setting time of each job likewise obeys a geometric distribution; these two distribution parameters determine the differences between jobs, and the greater their values, the greater the difference between the jobs. By adjusting these two values, different production systems can be expressed. The number of jobs released to the manufacturing system each time also adheres to a geometric distribution, and the moment at which jobs are released to the manufacturing system obeys a geometric distribution whose parameter, given by equation (5), equals the moment at which the manufacturing system completes all jobs. When verifying the performance of the model for selecting the optimal DR, four categories of production systems are generated according to the ranges of the processing-time and setting-time distribution parameters: large processing-time parameter and large setting-time parameter; large processing-time parameter and small setting-time parameter; small processing-time parameter and large setting-time parameter; and small processing-time parameter and small setting-time parameter. The processing-time parameter takes the values 100 and 1000, and the setting-time parameter takes the values 10 and 100, respectively. One thousand jobs are included in each manufacturing system to ensure the amount of data needed to construct the model. Some examples of the generated data representing the manufacturing system are listed in Table 3.
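A possible way to generate such a synthetic single-machine system is sketched below with NumPy's geometric sampler. The parameter names (mean processing time, mean setting time, mean batch size, mean release gap) and the mapping from target means to geometric probabilities are assumptions made for illustration, not the exact construction used in the paper.

```python
# Hypothetical generator of a synthetic single-machine manufacturing system.
# NumPy's geometric(p) has mean 1/p, so p = 1/mean is used to target a mean value.
import numpy as np

def generate_system(n_jobs=1000, mean_processing=100, mean_setting=10,
                    mean_batch=5, mean_gap=20, seed=0):
    rng = np.random.default_rng(seed)
    processing = rng.geometric(1 / mean_processing, size=n_jobs)   # job processing times
    setting = rng.geometric(1 / mean_setting, size=n_jobs)         # job setting times

    # Jobs arrive in batches: batch sizes and inter-arrival gaps are geometric too.
    release, clock, i = np.zeros(n_jobs, dtype=int), 0, 0
    while i < n_jobs:
        batch = min(rng.geometric(1 / mean_batch), n_jobs - i)
        release[i:i + batch] = clock
        clock += rng.geometric(1 / mean_gap)
        i += batch
    return processing, setting, release

proc, setup, rel = generate_system()
print(proc[:5], setup[:5], rel[:5])
```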
For each job released to the manufacturing system, the best DR was selected from the DR library using Python tools with a global search method. The attribute data of the job are used as the attribute data of the original instance sample data, and the optimal DR is used as the label data of the original instance sample. The original instance sample data were processed according to the data-processing method described in the previous section. The PCA algorithm was used for the dimensionality reduction. Some of the available sample data are presented in Table 4.
Furthermore, 70% of the total available sample data were selected as the training data for the model for selecting the optimal DR in order to construct a better classification model, and the remaining 30% were used as its test data. Of the data used to train the model for selecting the optimal DR, 70% were selected to train the initial model for selecting the optimal DR, and the remaining 30% were used as the test data for the constructed initial model. The attribute data used to train the initial model for selecting the optimal DR are also used as attribute data for training the evaluation model. Owing to sample imbalance when training the model, the parameter class_weight = 'balanced' is introduced to avoid an adverse impact on model performance, and grid search with five-fold cross-validation is used. The accuracy of the initial model for selecting the optimal DR constructed for the different manufacturing systems is shown in Table 5. The average accuracy was 0.812.
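The hyperparameter search described above can be reproduced in spirit with scikit-learn's GridSearchCV using five folds and class_weight='balanced'; the parameter grid below is an assumed example, not the grid reported in the paper, and X_train/y_train stand for the training split described in the text.

```python
# Illustrative grid search for the initial model (assumed parameter grid).
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [None, 10, 20],
}
search = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=0),
    param_grid,
    cv=5,                    # five-fold cross-validation
    scoring="accuracy",      # the test index used for the multiclass model
)
# search.fit(X_train, y_train)   # X_train, y_train: the available instance samples
# print(search.best_params_, search.best_score_)
```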
The initial model for selecting the optimal DR constructed by the RF algorithm generates prediction parameters, and this study collects the confidence data as one of the attributes for constructing the evaluation model. According to the method proposed in this study for obtaining data samples for training and testing the evaluation model, the sample data for constructing the evaluation model were generated: part of the attribute data and label data of the training data of the evaluation model are obtained from the sample data used to train the initial model for selecting the optimal DR, and part of the attribute data and label data corresponding to the test data of the evaluation model are obtained from the sample data used to test the initial model for selecting the optimal DR. Because the SVM algorithm is sensitive to the dimensions of the data when constructing the model [33], this study uses a standardization method to render the attribute data dimensionless, as shown in equation (5), where A represents the attribute data value, B corresponds to the average value of the attribute data, C symbolizes the standard deviation of the attribute data, and D denotes the dimensionless attribute data value:

$$D = \frac{A - B}{C}. \qquad (5)$$

Some examples of dimensionless data are listed in Table 6. When training the evaluation model, the class_weight = 'balanced' parameter was introduced to avoid ignoring the minority class, and grid search with five-fold cross-validation was used to obtain the optimal parameters of the evaluation model. The ROC curves of the evaluation models constructed for the different manufacturing systems are shown in Figure 6. The average value of the AUC area in the figure is 0.908, and the minimum value is 0.897. From the AUC values, it can be concluded that the constructed evaluation model has good performance.
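Equation (5) is the standard z-score transformation; scikit-learn's StandardScaler applies exactly this (A − B)/C mapping column-wise, as in the brief sketch below with assumed toy data.

```python
# Equation (5) as code: D = (A - B) / C, applied column-wise by StandardScaler.
import numpy as np
from sklearn.preprocessing import StandardScaler

A = np.array([[2.0, 100.0],
              [4.0, 300.0],
              [6.0, 500.0]])          # toy attribute data

D_manual = (A - A.mean(axis=0)) / A.std(axis=0)   # equation (5) by hand
D_scaler = StandardScaler().fit_transform(A)      # same result via scikit-learn

print(np.allclose(D_manual, D_scaler))            # True
```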

Figure 6: ROC curves of the evaluation models constructed for the four categories of manufacturing systems, panels (a)–(d).
The initial model for selecting the optimal DR, the evaluation model, and the simulation process were then combined. Table 7 shows the accuracy of the model for selecting the optimal DR constructed for the different manufacturing systems; the improvement over the initial model for selecting the optimal DR is obvious. The improvement rates of the model for selecting the optimal DR constructed for the four categories of manufacturing systems are 14.76%, 14.62%, 12.04%, and 12.92%, respectively, which proves the effectiveness of adding the evaluation model and the simulation-based DR update process to the initial model for selecting the optimal DR.
In total, 20 manufacturing systems were generated according to the method described above to verify the performance of the constructed model for selecting the optimal DR in the actual production process. Since job information cannot be obtained at the initial stage of the manufacturing system, the optimal DR cannot be selected from the overall perspective of the manufacturing system. Therefore, the model for selecting the optimal DR constructed in this study is used to update the scheduling strategy by switching the DR during production; the resulting DR combination is used to optimize the JSSP, generating locally optimal scheduling strategies that approach the globally optimal scheduling strategy. In order to verify the application effect of the model for selecting the optimal DR, four categories of manufacturing systems were generated according to the above method of generating manufacturing systems, and each category contained 20 samples. In scheduling each manufacturing system, the average flow time produced by the DR combination obtained from the model for selecting the optimal DR was recorded, together with the average flow time produced by each DR in the DR library; from the latter, the optimal and worst values were selected. Finally, the best DR, the worst DR, and the DR combination generated by the method proposed in this study were compared as a control group. The performance comparison is shown in Figure 7.

Figure 7: Performance comparison of the best DR, the worst DR, and the DR combination generated by the proposed method for the four categories of manufacturing systems, panels (a)–(d).
As can be seen from Tables 8 and 9, for manufacturing systems with a large processing-time parameter and a large setting-time parameter, the average flow time of jobs under the DR combination generated by the proposed method is 5%–12% lower than that under the optimal DR, with an average reduction of 252.61 and an average reduction rate of 7.79%; it is 19%–30% lower than that under the worst DR, giving an overall reduction of 5%–30%, an average reduction of 964.57, and an average reduction rate of 24.39%. For manufacturing systems with a large processing-time parameter and a small setting-time parameter, the average flow time is 5%–12% lower than that of the optimal DR, with an average reduction of 251.3 and an average reduction rate of 8.67%; it is 23%–33% lower than that of the worst DR, giving an overall reduction of 5%–33%, an average reduction of 1003.92, and an average reduction rate of 27.54%. For manufacturing systems with a small processing-time parameter and a large setting-time parameter, the average flow time is 4%–10% lower than that of the optimal DR, with an average reduction of 41.93 and an average reduction rate of 6.23%; it is 19%–36% lower than that of the worst DR, giving an overall reduction of 4%–36%, an average reduction of 267.46, and an average reduction rate of 29.57%. For manufacturing systems with a small processing-time parameter and a small setting-time parameter, the average flow time is 4%–11% lower than that of the optimal DR, with an average reduction of 26.92 and an average reduction rate of 7.36%; it is 20%–35% lower than that of the worst DR, giving an overall reduction of 4%–35%, an average reduction of 139.76, and an average reduction rate of 28.97%.
In the process of applying the model for selecting the optimal DR, because it includes the evaluation model for the DR and the simulation technology for updating the DR, the targeted collection of the data that cause the performance of the initial model for selecting the optimal DR to degrade is realized. The collected data and the original data were combined into a new dataset, and the accuracy of the initial model for selecting the optimal DR trained on this new dataset is shown in Table 10. The accuracy of the initial model for selecting the optimal DR constructed under each category of manufacturing system has improved, with an average improvement rate of 4.85%, and the improvement effect is obvious.
In summary, the proposed method and the constructed model for selecting the optimal DR use a combination of DRs to reduce the average flow time of jobs in the manufacturing system and to address the disturbance caused by the unpredictable randomness of job release. By evaluating and updating the output of the initial model for selecting the optimal DR, the prediction accuracy of the model is increased and efficient, targeted updating of the historical data is realized.
5. Conclusion
In this study, six single-parameter DRs and three mixed-parameter DRs were selected to form a DR library, and ML algorithms and discrete event simulation technology were combined to construct a model for selecting the optimal DR that can efficiently assign DRs to jobs released into the manufacturing system in real time. Consequently, the average flow time of jobs in the manufacturing system was shortened, historical data were updated in a targeted manner, and data redundancy was reduced. A case study proved that constructing an evaluation model makes it possible to monitor the availability of the DR output by the initial model for selecting the optimal DR and to update unavailable DRs, which elevates the prediction accuracy of the initial model for selecting the optimal DR. In this process, the data that cause the prediction accuracy of the initial model for selecting the optimal DR to decrease are collected, thereby providing a high-quality data basis for model updating. When applied to a manufacturing system containing a single machine, the method proposed in this study dramatically reduces the average flow time of jobs in the manufacturing system. Since the method proposed in this article is applied only to the processing of a single machine and only the data update has been researched, the next step is to extend the research to multimachine manufacturing systems and to study the quantitative updating of the constructed model.
Data Availability
The authors confirm that the data supporting the findings of this study are available within the article.
Conflicts of Interest
The authors declare no conflicts of interest.
Acknowledgments
This work was supported by Jilin Scientific and Technological Development Program (Grant no. 20210201037GX) and Jilin Major Science and Technology Program (Grant no. 20210301037GX).