Abstract

Affected by economic globalization and market diversification, more manufacturing enterprises realize that large-scale production cannot adapt to the current market environment. The new trend of multivariety customized mixed-line production brings a higher level of disturbances and uncertainties to production planning, so traditional methods for the classic flexible job shop scheduling problem (FJSP) cannot be applied directly. Therefore, this paper presents an adaptive scheduling method for mixed-line job shop scheduling. First, the scheduling problem caused by combined processing constraints is studied and transformed by introducing the definition of the virtual operation. According to the coexistence of trial-production and batch production, a disturbance processing mechanism is established. A scheduling decision model is then established based on contextual bandits (CBs) in reinforcement learning to overcome the poor performance of the traditional single dispatching rule strategy. Through continuous trial-and-error learning, each scheduler can select the most suitable scheduling rules according to the environment state. Finally, we benchmark the performance of the scheduling algorithm against scheduling methods based on a variety of single scheduling rules. The results show that the proposed algorithm not only improves the performance in the mixed production scheduling problem but also effectively copes with emergency trial-production orders.

1. Introduction

Affected by economic globalization and market diversification, the requirements for a product are becoming more diversified, individualized, and dynamic. Most manufacturing enterprises realize that large-scale production of a single product cannot adapt to the current market environment and begin to adjust to multivariety customized mixed-line production. However, it is inevitable to encounter uncertain factors such as product variety growth, production batch reduction, demand fluctuation, and order variations, which affect the production efficiency of enterprises.

As an important technical means to improve the utilization of manufacturing resources and realize efficient production operation, job shop scheduling is the core of production process control. The optimized job shop scheduling scheme can greatly improve the performance of mixed-line production system, reduce cost, and improve productivity. As a fundamental problem in manufacturing, flexible job shop scheduling problem (FJSP) normally only considers routing constraints and machining constraints. Most researchers assumed that a machine cannot process more than one operation at the same time [1] and that it only needs to meet the conventional routing constraints. However, in actual production, some specific production constraints usually exist, e.g., combined processing constraint. It is common in the production of high-precision components, like the manufacturing of various molds and shell parts. In order to ensure the accuracy of assembly, reduce the difficulty of processing, and improve work efficiency, several jobs must be processed simultaneously on the same machine. Therefore, the traditional FJSP method cannot deal with this complex situation.

On the other hand, in addition to common disturbance events (such as equipment failures and emergency orders), the mixed-line job shop also frequently encounters the coexistence of trial-production orders and batch production orders. A trial-production order is the task of putting a new product model into production for the first time. The processes of a trial-production product are uncertain, and the processing time is difficult to estimate accurately before execution begins. In contrast, the processing technology and time of tasks in batch production orders are stable to a certain extent. However, when batch production orders and trial-production orders are executed on the same production line, the batch orders are affected by the uncertainty of the trial-production products, so the original production plan of the batch production orders cannot be implemented on schedule. This not only reduces the production efficiency of the whole system but also makes the production scheduling problem in the job shop more complicated.

In the literature, there are a variety of methods and tools for job shop scheduling. Mathematical programming approaches achieve global optimal schedules by getting critical solutions of programming functions under constraints [2]. To model and solve an optimization model with large variable space, metaheuristic methods (e.g., GA, swarm intelligence) [3, 4] are introduced to simulate manufacturing systems with simple and natural rules. Recently, AI has fueled an increasing interest in solving dynamic scheduling problems by continuously improving the scheduling performance of schedulers. Lawrynowicz [5] proposed the use of scheduling rules to achieve higher efficiency, which integrates an expert system with heuristic algorithms to solve the scheduling problem under the constraints of supply chains. Liu [6] proposed a method of composite dispatching rules using an analytic hierarchy process (AHP) to improve the performance of scheduling, but subjective experience is required. Trentesaux et al. [7] proposed four new dispatching rules by performing a simulation-based analysis for dynamic job shop scheduling. Liu [8] proposed an adaptive real-time scheduling method, which uses the Q-learning algorithm to optimize the rule selection. Qu et al. [9] used a reinforcement learning (RL) method to adaptively generate dispatching rules based on the system state. Chen et al. [10] studied the relationship between environmental dynamics and lean production level and proposed the optimization method of lean redesign in different dynamic environments. Chen et al. [11] proposed a knowledgeable encapsulation method of scheduling model for steel production to realize the reuse of scheduling knowledge and the dynamic construction of rescheduling model. Chen et al. [12] designed an automatic generation method of dispatching rules based on genetic programming to solve the rescheduling problems under unpredictable events. Zhang et al. [13] proposed a dynamic job shop scheduling method based on proximal policy optimization for multiagent manufacturing systems, which integrates the self-organization mechanism and self-learning strategy.

From the literature review, the existing scheduling methods fall mainly into three categories: heuristic algorithm-based, dispatching rule-based, and reinforcement learning-based. The heuristic algorithm-based method models the scheduling problem through encoding and decoding mechanisms and then evolves a population of feasible solutions through crossover and mutation operations, so that high-quality scheduling solutions can be obtained based on the evaluation of a fitness function. Although this method can obtain high-quality scheduling solutions, it must reschedule whenever a production disturbance occurs, and its real-time response ability is poor. The rule-based method uses a fixed pair of scheduling rules (e.g., SPT + FIFO) to select among the available machines and to sequence the tasks in their buffers. When a production disturbance occurs, this method responds quickly with a scheduling scheme, but the quality of its solution is not high. Therefore, the motivation of this paper is to propose an adaptive scheduling method that can intelligently choose the best scheduling strategy according to the state of the workshop environment. The scheduling mechanism based on the LinUCB algorithm continuously optimizes the decision-making module, so that it selects suitable dispatching rules as the environment changes and consistently obtains high-quality scheduling strategies. The LinUCB-based scheduling method proposed in this paper can thus both respond quickly to disturbance events and obtain high-quality scheduling solutions.

In this paper, the mixed-line job shop scheduling problem of aerospace structural parts with combined processing constraints is studied. The mathematical model of the mixed-line scheduling problem is established and transformed into a classical flexible job shop scheduling problem. According to the actual coexistence of trial-production orders and batch production orders, a case-based learning method is introduced, and the disturbance processing mechanism is established. A novel adaptive real-time scheduling method is proposed to overcome the poor performance of the traditional single dispatching rule strategy. In this approach, the scheduling process is divided into the machine selection stage and the buffer job sequencing stage and modelled as a contextual bandit. The LinUCB algorithm is used as the rule strategy learning algorithm. Through continuous trial-and-error learning, each decision can select the most suitable machine selection rule and buffer job sequencing rule according to the real-time environment state, which improves the adaptability and optimization performance of the scheduling algorithm. In addition, when an order of trial-production parts is inserted, the k-nearest neighbor algorithm is used to find, in the knowledge base, the existing process knowledge most similar to the process of the trial-production parts; the LinUCB-based scheduling method is then applied to allocate tasks for the processes of these parts by inheriting that knowledge. Through this disturbance handling strategy, the proposed scheduling method can not only absorb the disturbance that trial-production parts cause to normal production activities but also make use of existing scheduling knowledge to optimize the scheduling of the manufacturing system. Finally, the proposed methodology is evaluated and validated with experiments in a smart manufacturing setting. We benchmark the performance of the scheduling algorithm against scheduling methods based on a variety of single dispatching rules. The experimental results show that the proposed algorithm not only improves the performance in the mixed production scheduling problem but also effectively copes with emergency trial-production orders.

The main contributions of this paper are listed as follows:
(1) A disturbance handling strategy based on the k-nearest neighbor algorithm is proposed for responding to the insertion of the unknown developed parts’ order
(2) A dynamic and adaptive scheduling method based on the LinUCB algorithm is presented for mixed-line workshops, which can intelligently select the best scheduling strategy according to the state feature of current workshop environment

The rest of this paper is organized as follows: In Section 2, the flexible job shop scheduling problem with the combined processing constraint is defined in detail, as well as the mechanism for responding to uncertain disturbance events. Then, the proposed method is explained in Section 3. Simulation experiments are presented in Section 4, and the conclusion is given in Section 5.

2. FJSP with Combined Processing Constraint

2.1. Combined Processing Constraint

In the classic FJSP, only routing constraints and machining constraints are typically considered. However, for some jobs with assembly relationship, if these jobs are processed separately, it will be difficult to ensure the accuracy required for assembly. Therefore, in order to meet the requirements of assembly accuracy, different operations of two or more jobs must be processed on one machine at the same time, and subsequent operation can be processed only when both jobs are ready, that is, combined processing.

As shown in Figure 1, job J1 and job J2 are processed along two process routes, respectively, and the jth operation of job Ji is labeled Oij. The second operations of the two jobs, O12 and O22, need combined processing, and O13 and O23 can be processed only after the combined processing of O12 and O22 is completed.

2.2. Solution for Combined Processing Constraint

In an FJSP with the combined processing constraint, two operations of two different jobs that need combined processing must be scheduled together once the combined processing constraint is met. To avoid conflicting decisions and make the scheduling decision for such operations unique, the technique of the virtual operation is used in this research. A virtual operation is composed of the operations requiring combined processing and is scheduled as one operation. Since the composition of the components and the job numbers are unique, the operation released by the job with the smaller job number is the main (master) operation and is responsible for making the scheduling decision for the virtual operation. The other operation is the auxiliary operation and only records the information related to itself.

The processing flow of combined processing constraints is shown in Figure 2. The details are as follows:
(1) If a job needs to be scheduled, judge whether it needs combined processing.
(2) If combined processing is not needed, the job can be scheduled directly; otherwise, further judge whether it meets the combined processing constraint and go to step (3).
(3) If the combined processing constraints are met, the two operations are combined into one virtual operation for scheduling; otherwise, wait until the other operations are released and repeat step (3).
In this way, the FJSP with the combined processing constraint can be transformed into the classic FJSP and then solved by the methods for the classic FJSP.
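The flow above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Operation` record, its `partner` field, and the function `schedulable_units` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Operation:
    job: int                                   # job number i
    index: int                                 # operation number j within the job
    partner: Optional[Tuple[int, int]] = None  # (job, index) of the combined-processing partner


def schedulable_units(released: List[Operation]) -> List[Tuple[Operation, ...]]:
    """Return the units that may be scheduled now.

    An ordinary operation is returned as a 1-tuple. Two operations that need
    combined processing are returned as one virtual operation
    (main, auxiliary) only when both are released; otherwise they wait.
    """
    by_key = {(op.job, op.index): op for op in released}
    units: List[Tuple[Operation, ...]] = []
    for op in released:
        if op.partner is None:
            units.append((op,))        # step (2): schedule directly
            continue
        mate = by_key.get(op.partner)
        if mate is None:
            continue                   # step (3): partner not released yet, keep waiting
        if op.job < mate.job:
            units.append((op, mate))   # smaller job number is the main operation
    return units
```

Each virtual operation is emitted exactly once, from the side of the main operation, so the scheduling decision for the pair stays unique.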

2.3. Model Formulation

Based on the classical FJSP, the combined processing constraint is further considered, the makespan is taken as the optimization objective, and the following assumptions are made for this problem [4, 14]:
(1) The jobs arrive at the job shop in batches, and the number of jobs arriving in each batch is random
(2) The combined processing operation can be processed only when the combined processing constraints are met
(3) Each operation can be executed on a series of available machines
(4) The type and arrival time of a job are known only when the job actually arrives
(5) Each machine can only process one ordinary job at the same time or process multiple jobs with the combined processing constraint at the same time
(6) The preparation time is included in the processing time, and the transportation time is not considered
(7) Once the operation starts processing, it cannot be interrupted until the operation is completed

Based on the above assumptions, the symbols are defined as follows:
J: a collection of all jobs
M: a collection of all machines
Ci: completion time of job i
Oij: the jth operation of job i
stij: the start processing time of the jth operation of job i
itij: the end processing time of the jth operation of job i
smmi: the start time of the ith operation on machine m
Ommi: the end time of the ith operation on machine m
tijm: the processing time on machine m of the jth operation of job i
COV()

The mathematical model of FJSP with the combined processing constraint can be established as follows.

The makespan is selected as the optimization index, which is recorded as C0:

$$C_0 = \max_{i \in J} C_i \tag{1}$$

The objective function is

$$\min C_0 \tag{2}$$

Constraint function is

Equation (3) indicates that only one machine can be selected for each operation of each job; (4) indicates that if the current operation of the job does not require combined processing, it only needs to meet the conventional routing sequence constraints, and if combined processing is required, it also needs to meet the combined processing constraints; (5) indicates that the next operation will be released for processing only if the current operation has been processed on the machine; (6) indicates that the processing time of the operation shall be the same as that of the selected machine.

2.4. Mechanism for Responding to Uncertain Disturbance Events

In addition to the common uncertain disturbance events in the actual production process (such as machine failure, emergency work order, and raw material delay), the mixed-line job shop often encounters the mixed-line production of trial-production orders and batch production orders, especially in the new product development stage of military and aerospace industry. The emergency trial-production order will have a great impact on the original production plan of batch orders. Different from the emergency order of batch production, the type and processing time of trial-production cannot be known in advance. To solve this problem, the k-nearest neighbor method (k-NN) is used in this research to find the history operation which is the most similar to trial-production (as shown in Figure 3). Through this disturbance processing strategy, we can not only solve the disturbance of emergency orders but also use the existing scheduling knowledge to realize the optimization of the system.

The k-nearest neighbor method is a basic classification and regression method proposed by Cover and Hart [15]. It has three basic elements: k-value selection, distance measurement, and classification decision rules [16], which need to be designed according to the specific problem. Here, the goal is to find the single most similar process type and inherit its scheduling knowledge, so no majority vote is required and k is set to 1. The distance measurement calculates the similarity between the feature vectors of two instances; it is the key to measuring the similarity between the process of the emergency trial-production order and the history processes. Common distance measures include the Euclidean distance, cosine similarity, and the Manhattan distance, as shown in Table 1. In this research, the Euclidean distance is used: the smaller the distance, the greater the similarity. The operation feature vector consists of four features: operation type, processing time, previous operation type, and subsequent operation type. The first condition for two operations to be similar is that they are of the same type, so that they have the same processing machines. The types of the previous and subsequent operations restrict the types of processing machines before and after the operation and have a great impact on the selection of the processing route of the current operation.

By calculating the Euclidean distance between the process of the trial-production parts and the history processes in the database, the most similar history process is obtained, and its related scheduling knowledge is inherited to guide the scheduling of the process of the trial-production parts. The similarity calculation method is shown in Figure 4. After the most similar process is found, the process of the trial-production parts can directly inherit the scheduling knowledge of the history process. The problem of emergency orders of trial-production parts is thereby transformed into the problem of emergency orders of ordinary parts, which can be solved by the disturbance processing strategy for inserted orders of ordinary parts.
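The nearest-neighbor lookup described above can be illustrated with a minimal sketch. It assumes that the three type features are encoded as integers, which the text does not specify; the function names and the tuple layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumed feature layout: (operation type, processing time,
# previous operation type, subsequent operation type).
# Encoding the types as integers is an assumption of this sketch.
Feature = Tuple[float, float, float, float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(query: Feature, history: List[Feature]) -> Tuple[int, float]:
    """1-NN lookup: index and distance of the closest history operation.

    Only history operations of the same type as the query are considered,
    since operations of the same type share the same processing machines.
    """
    candidates = [(i, h) for i, h in enumerate(history) if h[0] == query[0]]
    if not candidates:
        raise ValueError("no history operation of the same type")
    best_index, best_feature = min(candidates, key=lambda c: euclidean(query, c[1]))
    return best_index, euclidean(query, best_feature)
```

For example, with history `[(1, 5.0, 0, 2), (1, 8.0, 0, 2), (2, 6.0, 1, 3)]` and query `(1, 6.0, 0, 2)`, the third entry is excluded by type and the first entry is returned at distance 1.0. In practice the features would likely need scaling so that processing time does not dominate the type codes.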

3. Proposed Methodology

In the actual production process, the most suitable scheduling rules change over time with the environment state and the jobs at hand. Single rule-based scheduling methods usually ignore the impact of environmental changes on the optimization effect of scheduling rules, resulting in poor scheduling results. To solve this problem, the scheduling decision-making process is divided into two stages: machine selection and buffer job sequencing. A scheduling decision model is established based on contextual bandits (CBs) in reinforcement learning. Through continuous trial-and-error learning, each scheduler can select the best machine selection rules and buffer job sequencing rules according to the real-time state of the scheduling environment, so as to improve the adaptive ability of the algorithm in the face of environmental changes and the overall optimization effect.

3.1. Contextual Bandit

CB is a special reinforcement learning model, in which each round contains only one state and only affects immediate rewards [17]. The CB model can be described as {A, S, R}, where A is the action space, S is the state space, and R is the reward. As shown in Figure 5, in each episode e, the scheduler selects an optimal production action ae according to the state Se of workshop environment and then obtains the reward re. The reward re is a variable related to both the environmental state Se and the action ae [18], which also means that the changing environmental state is quantified as context information to help make decisions in context-sensitive, highly dynamic, and complex systems [19]. The goal of the scheduler is to obtain the mapping function from the current environmental characteristics to the optimal production action through continuous trial and error, so as to maximize the cumulative reward value of each episode.

3.2. Connecting FJSP with CB

In the mixed-line production job shop, the machine selection stage and the buffer job sequencing stage are both regarded as a single scheduling step, and each time, only the process of the current release of the job is scheduled. The next operation is released and scheduled only when the current operation is finished. This single-step feature reduces the complexity of the entire job shop scheduling problem and enables the job shop to use real-time information for scheduling, thereby greatly enhancing the anti-interference ability of the overall job shop. Similarly, the single-step feature also exists in CB. CB only makes decisions on the next action and updates the decision strategy through the rewards of this step.

The connection between CB and FJSP is shown in Table 2. During the decision-making process of the mixed-line job shop, the job can be regarded as the main agent in CB. The scheduling process is similar to the process in which the agent chooses action according to the environment state in the CB, and the final reward can be regarded as the optimal performance index. Then, through continuous trial and error training, the scheduling strategy can finally converge to the best state, and a better scheduling solution than the single scheduling rule method can be obtained.

3.3. Contextual Bandit Formulation for Decisional Process

The contextual bandit formulation for the decisional process is illustrated as follows.

3.3.1. State Space

The state space is composed of the corresponding state characteristics of jobs and machines. When the job agent makes a scheduling decision, it receives the state characteristic information in real time and uses this information to support decision-making. The state features selected in this paper include the total number of various processes scheduled at the same time, the total number of jobs in the buffer of each processing machine, the remaining processing time in the buffer of each processing machine, and the processing time of the process required on each processing machine.

3.3.2. Action Space

The action space consists of a combination of machine selection rules and buffer job sequencing rules. Common scheduling rules in the machine selection stage include the shortest processing time (SPT) rule, the least queued element (LQE) rule, and the shortest queue (SQ) rule. The common scheduling rules in the buffer job sequencing stage include the first in first out (FIFO) rule, the shortest job first (SJF) rule, and the last in first out (LIFO) rule. Consider that the scheduling method based on single rule usually ignores the impact of state change on the optimization effect of scheduling rules, resulting in poor scheduling results. This research designs a set of scheduling rules, combines the above single scheduling rules into 9 combination rules at the machine selection and buffer job selection stages, and takes them as the action of action space, as shown in Figure 6. These 9 actions include SQ + FIFO, SQ + SJF, SQ + LIFO, LQE + FIFO, LQE + SJF, LQE + LIFO, SPT + FIFO, SPT + SJF, and SPT + LIFO.
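As a small illustration of this action space, the nine rule combinations can be enumerated as a Cartesian product. The sketch also shows the three buffer sequencing rules applied to a toy buffer; representing a buffered job as an `(arrival_order, processing_time)` pair is an assumption made here for illustration.

```python
from itertools import product
from typing import List, Tuple

MACHINE_RULES = ("SQ", "LQE", "SPT")        # machine selection stage
SEQUENCING_RULES = ("FIFO", "SJF", "LIFO")  # buffer job sequencing stage

# The 9 combined rules, in the order listed in the text:
# SQ + FIFO, SQ + SJF, SQ + LIFO, LQE + FIFO, ..., SPT + LIFO.
ACTIONS: List[Tuple[str, str]] = list(product(MACHINE_RULES, SEQUENCING_RULES))

# A buffered job is represented as (arrival_order, processing_time);
# this representation is an assumption of the sketch.
Job = Tuple[int, float]


def sequence_buffer(buffer: List[Job], rule: str) -> List[Job]:
    """Return the jobs of one machine buffer in processing order."""
    if rule == "FIFO":
        return sorted(buffer, key=lambda job: job[0])
    if rule == "LIFO":
        return sorted(buffer, key=lambda job: job[0], reverse=True)
    if rule == "SJF":
        return sorted(buffer, key=lambda job: job[1])
    raise ValueError(f"unknown sequencing rule: {rule}")
```

An arm index chosen by the bandit then maps directly to a pair of rules through `ACTIONS[index]`.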

3.3.3. Reward

The setting of the reward is usually determined by the optimization goal of the scheduling system. In this research, the makespan is taken as the scheduling performance index. Therefore, after completing a scheduling task, the mean wait time (MWT) of all jobs can be calculated immediately and compared with the mean wait time before decision-making. The corresponding reward is obtained by subtracting the current mean wait time from the mean wait time before decision-making. The calculation formula is as follows:

$$MWT_t = \frac{1}{n} \sum_{j=1}^{n} WT_{j,t}, \qquad r_t = MWT_{t-1} - MWT_t,$$

where $WT_{j,t}$ is the remaining processing time of job j at time t, n is the total number of jobs in the current workshop, and $t-1$ denotes the time immediately before the decision.
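The reward computation described above amounts to a difference of two averages. The following sketch uses hypothetical function names and assumes the per-job values are available as plain lists before and after the decision.

```python
from typing import Sequence


def mean_wait_time(wait_times: Sequence[float]) -> float:
    """Average of the per-job values WT over all jobs in the workshop."""
    if not wait_times:
        raise ValueError("at least one job is required")
    return sum(wait_times) / len(wait_times)


def reward(before: Sequence[float], after: Sequence[float]) -> float:
    """Reward = MWT before the decision minus MWT after it.

    A positive reward means the decision reduced the mean wait time.
    """
    return mean_wait_time(before) - mean_wait_time(after)
```

For instance, `reward([4, 6, 8], [3, 5, 7])` gives 6.0 - 5.0 = 1.0.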

To sum up, after concreting the state space, action space, and reward into actual objects, respectively, the decision-making model of mixed-line production scheduling is obtained, as shown in Figure 7. When perceiving that a new task needs to be scheduled, the job agent obtains the state of the workshop environment and triggers the scheduling event. The AI scheduler selects the best scheduling rule combination from the action space according to the perceived state. The machine agent executes the production activities according to the scheduling rules and feeds back the reward value of the execution result to the job agent.

3.4. Selection Policy

In this paper, a CB policy, namely, LinUCB, is used to learn the rule selection policy. LinUCB is a CB algorithm that uses a linear model to approximate the mapping between the environmental state and the expected reward of each action. In any round e, the state feature vector of action a in the action space is denoted by $x_{e,a} \in \mathbb{R}^d$, and the expected reward of each action is calculated by the following formula:

$$E[r_{e,a} \mid x_{e,a}] = x_{e,a}^{\top} \theta_a,$$

where the left-hand side is the expected reward of selecting action a in round e and $\theta_a$ is the linear model parameter of action a.

The parameter $\theta_a$ of each action can be estimated from historical decision experience. Suppose $G_a \in \mathbb{R}^{m \times d}$ and $c_a \in \mathbb{R}^m$ are the historical experience matrix and reward vector of action a in round e: each row of $G_a$ is a previous state feature vector, and the corresponding entry of $c_a$ is the reward received. Ridge regression [20] then gives the estimate of the parameter of action a:

$$\hat{\theta}_a = (G_a^{\top} G_a + I_d)^{-1} G_a^{\top} c_a,$$

where $I_d$ is the $d \times d$ identity matrix.

In addition, in order to fully explore the various actions, the LinUCB algorithm uses the confidence interval as the basis for selection and selects the action with the largest upper confidence bound in each decision. That is, in round e, select

$$a_e = \arg\max_{a} \left( x_{e,a}^{\top} \hat{\theta}_a + \alpha \sqrt{x_{e,a}^{\top} A_a^{-1} x_{e,a}} \right),$$

where $A_a = G_a^{\top} G_a + I_d$ and $\alpha > 0$.

The detailed description of the LinUCB-based scheduling method is shown in Figure 8. The arms of the LinUCB algorithm stand for the combined scheduling rules that can be selected at the scheduling point, and their features are the state information of the processes of the jobs to be scheduled and of the available machines in the mixed-line job shop environment. Here, $x_{e,a}^{\top} \hat{\theta}_a$ is the estimated return after performing action a, and $\alpha \sqrt{x_{e,a}^{\top} A_a^{-1} x_{e,a}}$ is the width of the confidence interval obtained after performing action a, where $\alpha$ is a hyperparameter that controls the degree of exploration; it is set to 0.25 in the experiments of this research. Exploration helps the scheduling agent to learn more scheduling knowledge in a dynamic workshop environment.
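To make the selection policy concrete, the following is a minimal pure-Python sketch of the standard disjoint LinUCB scheme, not the authors' code. The class and function names are hypothetical, and the same state vector is shared by all arms as a simplifying assumption. Instead of re-inverting the matrix in every round, the sketch keeps its inverse up to date with the Sherman-Morrison rank-one formula, which is an implementation choice made here.

```python
import math
from typing import List, Sequence


class LinUCBArm:
    """One arm (one combined scheduling rule) with its own ridge regression."""

    def __init__(self, d: int) -> None:
        # Inverse of A = G^T G + I_d; it starts as the identity (no data yet).
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
        # b = G^T c, the reward-weighted sum of the observed state vectors.
        self.b = [0.0] * d

    def _a_inv_times(self, x: Sequence[float]) -> List[float]:
        return [sum(row[k] * x[k] for k in range(len(x))) for row in self.A_inv]

    def ucb(self, x: Sequence[float], alpha: float) -> float:
        """Estimated return plus the width of the confidence interval."""
        a_inv_x = self._a_inv_times(x)
        # theta_hat = A^{-1} b, and A^{-1} is symmetric, so
        # x^T theta_hat = (A^{-1} x)^T b.
        mean = sum(v * bk for v, bk in zip(a_inv_x, self.b))
        width = alpha * math.sqrt(sum(xk * v for xk, v in zip(x, a_inv_x)))
        return mean + width

    def update(self, x: Sequence[float], reward: float) -> None:
        """Add one (state, reward) observation: A <- A + x x^T, b <- b + r x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xk * v for xk, v in zip(x, a_inv_x))
        d = len(x)
        for i in range(d):
            for j in range(d):
                # Sherman-Morrison rank-one update of the inverse.
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bk + reward * xk for bk, xk in zip(self.b, x)]


def select_arm(arms: List[LinUCBArm], x: Sequence[float], alpha: float = 0.25) -> int:
    """Index of the arm with the largest upper confidence bound (first on ties)."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

In a scheduling loop, nine such arms would be created with d equal to the state dimension; at each scheduling point the agent calls `select_arm`, applies the corresponding rule combination, and passes the observed reward to `update` on the chosen arm.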

4. Experiments

4.1. Experimental Environment

The mixed-line job shop of missile structural parts in Shanghai is taken as the object of study. The processing technology of missile structural parts is complex, the types and number of processes involved are numerous, and some structural parts are subject to the combined processing constraint. Moreover, the coexistence of trial-production orders and batch production orders is widespread. The processing equipment involved is shown in Table 3: there are 10 pieces of equipment, including 2 ordinary lathes, 2 ordinary milling machines, 2 CNC lathes, 2 CNC milling machines, and 2 groups of fitters.

In the experiment, the parameter settings of the LinUCB-based scheduling method for the mixed-line job shop are shown in Table 4. The dimension of the features observed by the scheduling agent is thirty-one: it is the sum of the dimensions of the total number of various scheduled processes, the total number of jobs in the buffer of each machine, the remaining processing time in the buffer of each machine, and the processing time of the process required on each machine. The number of machines is ten. The dimension of the action space, that is, the number of arms of the contextual bandit, is nine. The exploration parameter, which governs how readily a combined scheduling rule other than the currently best-estimated one is chosen, is set to 0.25. In order to make the scheduling agent converge, the number of training steps of the experiment is set to 800.
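One way to read the thirty-one-dimensional state is as one global count followed by three per-machine features for the ten machines (1 + 10 + 10 + 10). The sketch below assembles such a vector; this layout, the function name, and the argument names are assumptions made for illustration, since the exact feature order is not given.

```python
from typing import List, Sequence


def build_state(n_scheduled: int,
                buffer_jobs: Sequence[int],
                buffer_remaining: Sequence[float],
                proc_times: Sequence[float]) -> List[float]:
    """Concatenate the state features into one flat vector.

    n_scheduled:      total number of processes being scheduled at this time
    buffer_jobs:      number of jobs in the buffer of each machine
    buffer_remaining: remaining processing time in the buffer of each machine
    proc_times:       processing time of the current process on each machine
    """
    n_machines = len(buffer_jobs)
    if len(buffer_remaining) != n_machines or len(proc_times) != n_machines:
        raise ValueError("per-machine features must have one entry per machine")
    return [float(n_scheduled),
            *map(float, buffer_jobs),
            *map(float, buffer_remaining),
            *map(float, proc_times)]
```

With ten machines the resulting vector has 1 + 3 × 10 = 31 entries. How a machine that cannot process the current operation is encoded in `proc_times` is left to the caller here.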

4.2. Experiment I

Experiment 1 is performed mainly to verify the feasibility and efficiency of the proposed real-time scheduling algorithm. A batch of 10 jobs is selected as the experimental example (details are shown in Figure 9 and Table 5). It includes two groups of jobs that need combined processing (body-gland and inner wing-outer wing) and six kinds of jobs that do not need combined processing (bottom plate, wall plate, cabin, gas hood, flange, and air rudder surface). In Table 5, the bracketed processes need combined processing, and the bracketed processes are their combined processing processes. The number in the table is the processing time, and “—” means that the process cannot be processed on this machine.

In order to verify the scheduling performance of the proposed algorithm, its scheduling results are compared with those of two representative reinforcement learning algorithms (i.e., the epsilon greedy and Q-learning algorithms). The epsilon greedy algorithm is a typical multiarmed bandit algorithm. The Q-learning algorithm is a typical value-based reinforcement learning algorithm.

The adaptive real-time scheduling method is first trained for 800 epochs, and the makespan per epoch is shown in Figure 10. It can be observed that the makespan gradually decreases as the scheduling module keeps learning and finally remains between 42 and 44. The Gantt chart of the scheduling results obtained with the proposed adaptive scheduling method is shown in Figure 11. The schedule meets the combined processing constraint, as well as the routing and machining constraints, which demonstrates the effectiveness of the proposed algorithm. Moreover, our method achieves a better solution than the single dispatching rule methods, as shown in Figure 12. In this simulation experiment, the result of the proposed method is more than 10% better than that of SPT + FIFO, the best of the 9 single dispatching rules. Compared with the epsilon greedy and Q-learning algorithms, the proposed algorithm achieves 7% and 5% improvement in completion time, respectively, as shown in Figure 12.

4.3. Experiment II

Experiment 2 is performed mainly to verify the feasibility and efficiency of the proposed algorithm under emergency trial-production orders. It was assumed that, at time 20, a trial-production order was inserted into the job shop, including the new trial parts (inner wing, outer wing, cabin, and air gas hood). The details are shown in Table 6.

The k-NN algorithm is used to find the most similar job and inherit its historical scheduling knowledge. Table 7 shows the most similar process information found for the trial-production parts according to the proposed disturbance processing strategy. By inheriting the scheduling knowledge of these similar processes, the trial-production parts guide their own process scheduling and optimize the corresponding performance indexes. It can be seen from the table that the distance between each trial-production part and its most similar batch production part is small, so the conditions for reusing scheduling knowledge are met.
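The nearest-neighbor lookup described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the Euclidean distance, the feature vectors, the part names, and the function names are all assumptions, since this section does not specify how similarity is computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(query, history, k=1):
    """Return the k historical parts closest to the query.

    history maps a part name to its feature vector; the result is a list
    of (name, distance) pairs sorted by increasing distance.
    """
    ranked = sorted(
        ((name, euclidean(query, vec)) for name, vec in history.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical features, e.g. (number of operations, mean processing time).
    history = {
        "batch part A": [4.0, 6.5],
        "batch part B": [3.0, 5.0],
        "batch part C": [5.0, 9.0],
    }
    trial_part = [3.0, 5.5]
    name, dist = k_nearest(trial_part, history, k=1)[0]
    print(f"most similar: {name} (distance {dist:.2f})")
```

In such a scheme the trial-production part would then reuse the scheduling knowledge learned for the returned batch part, provided the distance is small enough to justify the reuse.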

The Gantt chart of the scheduling results obtained under the disturbance processing strategy is shown in Figure 13. The makespan is 87. Operations labeled “dev” belong to the emergency trial-production order, and the rest are normal scheduling operations. Under the disturbance processing method proposed in this research, the emergency trial-production order is successfully scheduled and all the constraints of the mixed-line production job shop are satisfied, which demonstrates the effectiveness of the disturbance processing strategy and the prototype system.

The proposed method is compared with several common scheduling rules that have been widely used in the FJSP. As shown in Figure 14, the result obtained by the proposed method is better than those obtained with common single scheduling rules. Compared with the best single scheduling rule, the makespan is improved by 5.7%, which demonstrates the efficiency of the proposed method. When an urgent order is inserted, the proposed algorithm improves the completion time by 5% and 2% compared with the epsilon greedy and Q-learning algorithms, respectively, as also shown in Figure 14.

5. Conclusion and Future Research

This paper presents an adaptive real-time scheduling method for the mixed-line job shop scheduling problem with combined processing constraints. By introducing the definition of virtual operation, the problem caused by combined processing constraints is simplified and transformed into a classical flexible job shop scheduling problem. A disturbance processing mechanism is established to handle the coexistence of trial-production and batch production. Because emergency trial-production orders have a great impact on the original production plan of batch orders, the k-nearest neighbor method is used to find the historical operation that is most similar to each trial-production part. To overcome the poor performance of the traditional single dispatching rule strategy, the scheduling decision-making process is divided into a machine selection stage and a buffer job sequencing stage, and a scheduling decision model is established based on contextual bandits (CBs) in reinforcement learning. Through continuous trial-and-error learning, each scheduler can select the best machine selection rules and buffer job sequencing rules according to the real-time state of the scheduling environment. This approach enhances both the adaptability and the performance of the scheduling algorithm. Finally, the proposed methodology is evaluated and validated with experiments in a smart manufacturing setting, in which the scheduling algorithm is benchmarked against scheduling methods based on a variety of single dispatching rules. The experimental results show that the proposed algorithm not only improves performance in the mixed production scheduling problem but also copes effectively with emergency trial-production orders.

Future research will focus on reward-scheduling mechanisms and on distributed learning in which a scheduler resides in each manufacturing thing. This paper is an attempt to improve AI for mixed-line production. We hope that this work will help stimulate more in-depth investigation and multidisciplinary research and promote the application of artificial intelligence technology in intelligent manufacturing.

Data Availability

Raw data were generated using Navicat 15 for MySQL. Derived data supporting the findings of this study are available from the corresponding author on request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Key Research and Development Program of China (grant no. 2018YFB0177000) and the Fundamental Research Funds for the Central Universities (grant no. NT2021021).