Abstract
The job shop scheduling problem (JSSP) is a notoriously difficult problem in combinatorial optimization. In terms of the objective function, most existing research has been focused on the makespan criterion. However, in contemporary manufacturing systems, due-date-related performances are more important because they are essential for maintaining a high service reputation. Therefore, in this study we aim at minimizing the total weighted tardiness in JSSP. Considering the high complexity, a hybrid differential evolution (DE) algorithm is proposed for the problem. To enhance the overall search efficiency, a neighborhood property of the problem is discovered, and then a tree search procedure is designed and embedded into the DE framework. According to the extensive computational experiments, the proposed approach is efficient in solving the job shop scheduling problem with total weighted tardiness objective.
1. Introduction
The job shop scheduling problem (JSSP) has been known as a very stubborn combinatorial optimization problem since the 1950s. In terms of computational complexity, JSSP is -hard in the strong sense [1]. Therefore, even for very small JSSP instances, it is by no means easy to guarantee the optimal solution. In recent years, the metaheuristics—such as genetic algorithm (GA) [2, 3], tabu search (TS) [4, 5], particle swarm optimization (PSO) [6, 7], and ant colony optimization (ACO) [8, 9]—have clearly become the research focus in practical optimization methods for solving JSSPs.
Remarkably, most recent research concentrates on the hybridization of two or more heuristics to solve JSSP. For example, ACO is hybridized with TS in [10]; GA is hybridized with VND (variable neighborhood decent) in [11]; a hybrid algorithm based on PSO, VNS (variable neighborhood search), and SS (scatter search) is proposed in [12] for solving a biobjective JSSP. The underlying motivation for constructing hybrid algorithms is that different search mechanisms and neighborhood structures are usually complementary to each other (i.e., when one fails, the other may be effective). Therefore, a combination of several optimizers is likely to promote the probability of finding optimal solutions. A common strategy is to hybridize an algorithm good at large-scale exploration (diversification) with an algorithm good at fine-scale exploitation (intensification), which provides reliable searching ability for tackling complex solution spaces.
In a general JSSP instance, a set of jobs are to be processed on a set of machines under the following basic assumptions.(i)There is no machine breakdown.(ii)No preemption of operations is allowed.(iii)The transportation time and the setup time can be neglected.(iv)Each machine can process at most one job at a time.(v)Each job may be processed by at most one machine at a time.
Each job has a fixed processing route which traverses all the machines in a predetermined order, and the manufacturing process of a job on a machine is called an operation. The duration time of each operation is fixed and known. Besides, a preset due date and a preset weight are given for each job. Due date is the preferred latest finishing time of a job, so completion after this specific time will result in losses such as a worsened reputation in customers. Weights reflect the importance level of the orders from different customers, larger values suggesting higher strategic importance. If we use to denote the completion time of job , the objective function of JSSP can be makespan (), maximum lateness (), total weighted tardiness (, where ), number of tardy jobs (, where if and otherwise), and so forth.
Up till now, most research on JSSP has been focused on the makespan criterion. However, due-date-related performances are becoming more significant in the make-to-order manufacturing environment nowadays. In this sense, the total weighted tardiness measure better reflects the critical factors that affect the profits of a firm. Meanwhile, from the theoretical perspective, total weighted tardiness is more difficult to optimize than makespan. According to the concepts of computational complexity, minimizing is only a special case of minimizing TWT, which means the complexity of solving JSSP with total weighted tardiness objective is much greater than that of solving JSSP with makespan objective. Below we will use the abbreviation “TWT-JSSP” to denote the job shop scheduling problem with total weighted tardiness objective.
The rest of this paper is organized as follows. Section 2 provides a brief review on existing solution methods for TWT-JSSP and the differential evolution algorithm. Section 3 discusses the mathematical model of TWT-JSSP and a neighborhood property. Section 4 describes the design of a hybrid differential evolution algorithm for solving TWT-JSSP. Section 5 presents the computational results. Finally, Section 6 concludes the paper.
2. Literature Review
2.1. Existing Algorithms for TWT-JSSP
The contributions on TWT-JSSP are relatively rare in the literature. The only exact solution methodology is the branch-and-bound algorithm proposed by Singer and Pinedo [13], while all the following surveyed methods belong to the heuristic category.
Reference [14] presents efficient dispatching rules for sequencing the operations in TWT-JSSP, the most powerful one being the ATC (apparent tardiness cost) rule. In [15, 16], the authors propose modified shifting bottleneck heuristics, in which the subproblems are solved by dispatching rules (the basic ATC rule or the BATCS rule for complex job shops). The large step random walk (LSRW) algorithm is designed in [17] and aimed at the TWT-JSSP with release times (). References [2, 3] present hybrid genetic algorithms for TWT-JSSP. Reference [18] presents a rolling horizon approach for TWT-JSSP, in which each time window is scheduled with a modified shifting bottleneck heuristic. Reference [19] presents a tabu search algorithm for the generalized TWT-JSSP with release times and precedence constraints. Recently, a new meta-heuristic called electromagnetic algorithm (EM) [20] has been explored and tested on TWT-JSSP.
Although these algorithms are effective, they have not fully utilized the inherent properties of TWT-JSSP. Hence, they will not perform satisfactorily when faced with large-scale instances of the problem. Exploring the structural properties (including neighborhood properties) of TWT-JSSP is an important research direction, which also constitutes the motivation of this study.
2.2. The Differential Evolution Algorithm
The differential evolution (DE) algorithm, which was first proposed by Storn and Price [21] in the mid-1990s, is a relatively new evolutionary optimizer. Characterized by a novel mutation operator, the algorithm has been found to be a powerful tool for continuous function optimization [22]. Due to its easy implementation, quick convergence and robustness, the DE algorithm is becoming increasingly popular in recent years. A wide range of successful applications have been reported, such as the design of fixed-structure robust controllers [23], space trajectory optimization [24], multiarea economic dispatch [25], and exergoeconomic analysis and optimization [26].
Despite the low computational complexity, DE has also been shown to have some weaknesses. In particular, DE is good at exploring the search space and locating the promising region, but it is slow at exploiting the high-quality solutions [27]. Recently, some researchers try to improve the performance of DE by hybridizing it with other local search-based algorithms. In [28], the authors propose the 2-Opt based DE (2-Opt DE) which is inspired by 2-Opt algorithms to accelerate DE. They show that 2-Opt DE can outperform the original DE in terms of solution accuracy and convergence speed. In [29], the authors incorporate the orthogonal design method into DE to accelerate its convergence rate. They show that the proposed approach outperforms the classical DE in terms of the quality, speed, and stability of the final solutions.
Due to its continuous feature, the traditional DE algorithm cannot be directly applied to scheduling problems with inherent discrete nature. Indeed, in canonical DE, each solution is represented by a vector of floating-point numbers. But for scheduling problems, each solution is a permutation of integers. To address this issue, two kinds of approaches can be identified in the literature.(1)A transformation scheme is established to convert permutations into real numbers and vice versa [30, 31]. In this way, we only need to add a few lines to the encoding and decoding procedures, and it is not necessary to change the implementation of DE itself. The advantage is that the search mechanism of DE is well preserved, while the disadvantage is the redundancy in the mapping from real to permutation spaces.(2)The mutation and crossover operators in DE are modified to suit the permutation representation [32, 33]. Clearly, the difficulty is how to redesign these operators such that the characteristics of high-quality solutions can be inherited and exploited. The design of operators should be problem dependent and thus requires a specific analysis of the optimization problem. Therefore, the lack of generality is a disadvantage of this approach.
Despite the success on permutation flow shop scheduling problems, the application of DE to JSSP has rarely been reported. The job shop model is more complex because the processing sequences of each machine need to be optimized separately. To tackle such a difficult problem, we design a tree-based local search module which can enhance the exploitation ability of DE. To our knowledge, this is the first attempt that DE is applied to TWT-JSSP.
3. The Mathematical Model and Neighborhood Properties of TWT-JSSP
3.1. The Mathematical Model and Its Duality
We utilize the concept of disjunctive graph for formulating TWT-JSSP [34]. In the graph , is the set of nodes, where corresponds to the operation set of the JSSP instance. Node 0 stands for a dummy operation which starts before all the real operations, while the starting time and the processing time of this dummy operation are both zero. is the set of conjunctive arcs, and each conjunctive arc indicates the processing order of two operations belonging to the same job. Meanwhile, node 0 is connected with the first operation of each job with a separate conjunctive arc. In other words, if we use to denote the set of the first operations of all jobs, then for all , . is the set of disjunctive arcs, where represents the disjunctive arcs related with machine . Each disjunctive arc connects two operations that should be processed by the same machine, but the processing order of the two operations is yet to be determined.
Under the disjunctive graph representation, TWT-JSSP can be described as a mixed-integer linear disjunctive programming model:
In the above formulation, denotes the starting time of operation , and the decision variable is a vector consisting of the starting times of all the operations. denotes the required processing time of operation . For the dummy operation 0, we set . represents the last operation of job , and thus denotes the set of ultimate operations of all the jobs. For any operation , and , respectively, denote the due date and the weight of the job to which operation belongs. For the ultimate operation of job , is the tardiness of job .
From the perspective of disjunctive graphs, finding a feasible solution to JSSP is equivalent to determining the directions of all the disjunctive arcs such that, for any , only one of the two disjunctive inequalities in constraint (b) is satisfied. Now, let us suppose the directions of all disjunctive arcs have been determined, and we use to denote the set of directed disjunctive arcs. In this situation, the problem (3.1) becomes a linear programming model, that is,
As suggested by [19], the dual problem of the linear program (3.2) is a maximum cost flow problem:
In the above formulation, represents the flow over the arc ; is a newly defined arc set, each arc pointing from the last operation of job to node 0; denotes the total cost of the flows.
Theorem 3.1. For the maximum cost flow problem (3.3), there exists an optimal solution which satisfies the following: each node (except node 0) has at most one incoming arc with nonzero flow.
Proof. See Appendix A.
3.2. A Neighborhood Property for the Swap of Adjacent Operations
We know from the previous subsection that, given a feasible (i.e., the directions of all disjunctive arcs), the schedule is determined, and meanwhile, a network flow graph is associated with the current schedule. The task of neighborhood search is to modify certain parts of to get , so that the total weighted tardiness can be reduced, that is, (or equivalently, ).
Definition 3.2 (block). A sequence of operations in a critical path is called a block if (3.1) it contains at least two operations and (3.2) the sequence includes a maximum number of operations that are consecutively processed by the same machine.
If we want to improve the schedule under the current , we should only consider modifying the disjunctive arcs that satisfy the following two conditions: (1) the arc belongs to a certain block and (2) the arc carries positive flow in the dual network. The latter condition implies that this disjunctive arc is on the critical path of at least one tardy job, so altering this arc can possibly reduce the TWT.
Now we consider whether swapping two adjacent operations in a block can really improve the TWT. Under a given , suppose the operations constitute a block in the corresponding schedule, and the associated network flows are partially shown in Figure 1. The amount of flow on the outgoing disjunctive arc of operation is denoted by and suppose , for all . We assume this dual solution satisfies the condition described in Theorem 3.1, that is, each node can have at most one incoming arc with positive flow. In this case, the input flows of these nodes all come from the input disjunctive arcs, so the input conjunctive arcs of these nodes must carry zero flow and thus they are not marked in the figure. However, each of the nodes can still have an outgoing arc with positive flow. For simplicity, these outgoing arcs with possibly positive flow are drawn in solid arrows in the figure, and the amount of flow is denoted by , respectively.

After executing the SWAP operator, we can construct a new feasible solution to the dual problem by adjusting the amount of flows on each arc. In the new network, the flow on the outgoing disjunctive arc of operation is denoted by . In order to enable accurate analysis, we keep all the values constant in the process of adjusting the local flows within the block (so that the flow equilibrium outside the considered scope will not be affected).
Theorem 3.3. Suppose a block with consecutive positive flows contains the following operations: . If the condition is satisfied, then swapping the two operations and will not lead to improvement on the objective function (TWT).
Proof. See Appendix B.
Therefore, the function of Theorem 3.3 is that it helps to exclude some nonimproving moves in the local search process, so that the optimization efficiency can be improved. However, such a greedy mechanism must be combined with a large-scale randomized search (like DE) in order not to get trapped by local optima.
4. The Hybrid DE Algorithm for TWT-JSSP
4.1. The DE Optimization Framework
Like other evolutionary optimizers, DE is a population-based stochastic global optimizer. In DE, each individual in the population is represented by a -dimensional real vector , , where SN is the population size. In each iteration, DE employs the mutation and crossover operators to generate new candidate solutions and then applies a one-to-one selection policy to determine whether the offspring or the parent can survive to the next generation. This process is repeated until a preset termination criterion is met.
The standard DE algorithm can be described as follows.
Step 1 (Initialization). Randomly generate a population of solutions, .
Step 2 (Mutation). For , generate a mutant solution as follows: where denotes the best solution in the current population; and are randomly selected from such that ; is a weighting factor.
Step 3 (Crossover). For , generate a trial solution as follows: where is an index randomly selected from to guarantee that at least one dimension of the trial solution differs from its parent ; is a random number generated from the uniform distribution ; is the crossover parameter.
Step 4 (Selection). If is better than , let .
Step 5. If the termination condition is not satisfied, go back to Step 2.
According to the algorithm description, DE has three important parameters, that is, , , and . In order to ensure a good performance of DE, the setting of these parameters should be reasonably adjusted based on specific optimization problems.
It is worth noting that there exist other variants of the DE algorithm with respect to the mutation and crossover method [35]. In fact, the above procedure is noted as “DE/best/1/bin” in the literature. If we change the mutation policy, we can obtain the “DE/rand/1/*’’ variant: where the base solution () is also randomly chosen from the population.
4.2. The Encoding and Decoding Schemes
The encoding scheme used here is based on the random key representation and the smallest position value (SPV) rule. Each solution is described by continuous values, and in the decoding process, this set of values will be transformed to a permutation of operations by the SPV rule.
Formally, let denote the th solution, where is the position value of the th solution with respect to the th dimension (). To decode a solution, the SPV rule is applied to sort the position values within a solution and then determine the relative positions of the corresponding operations. This process is exemplified in Table 1 for a problem containing 6 operations (suppose ). In this example, the smallest position value (−0.99) resides in the second dimension, and thus, the operation “2” comes first in the resulting sequence (the third row of the table). After sorting all these dimension values constituting solution , the operation sequence can be obtained.
Then, the decoded operation sequence can be used to build an active schedule for TWT-JSSP. The detailed schedule building algorithm is described as follows.
Input. An operation sequence .
Step 1. Let (the set of all operations), (the set of first operations of each job). Set .
Step 2. Find the operation , and let be the index of the machine on which this operation should be processed. Use to denote all the operations from which should be processed on machine .
Step 3. Delete from the operations that satisfy .
Step 4. Find the operation which belongs to and meanwhile ranks first in . Schedule operation on machine at the earliest possible time.
Step 5. Let , , where is the immediate job successor of operation .
Step 6. If , set and go to Step 2. Otherwise, the decoding procedure is terminated.
In the above description, the release time equals the completion time of the immediate job predecessor of operation . So is the earliest possible completion time of operation . represents the set of operations yet to be scheduled at iteration , while represents the set of ready operations (whose job predecessors have all been scheduled) at iteration .
4.3. The Local Search Module Based on Tree Search
As mentioned in Section 2.2, DE alone does not have satisfactory “exploitation” ability. In order to cope with complex search spaces, it is usually required that a local search module be embedded into the general framework of DE. In this paper, we devise a tree-based local search optimizer by borrowing ideas from the filter-and-fan algorithm in [36]. In each iteration of DE, the local search is carried out for the best of solutions in the current population immediately after the selection phase. Thus, is an important parameter for adjusting the frequency of local search and achieving a balance between exploration and exploitation.
In order to improve a selected solution, the proposed tree search algorithm generates compound neighborhoods for the solution based on the SWAP operator. The algorithm searches in a breadth-first manner, but unlike the beam search heuristic, each tree node represents a complete schedule rather than a partial schedule. The tree is gradually expanded by applying the SWAP operator iteratively. In each trial, the pair of operations to be swapped is randomly chosen from the critical blocks in the current schedule. To avoid repeated search, the reverse of the previous SWAP operator is prohibited. For example, if the current schedule is obtained by swapping , then a swap on (if it is in the critical block again) should be tabooed in the immediate expansion from the current node. In the search process, we can utilize the relevant neighborhood property (Theorem 3.3) to promote the efficiency. In other words, when the theorem predicts that a certain move will not lead to improvement, then the action is canceled and another neighborhood move will be tried.
The proposed tree search algorithm is heuristic in nature, because it is computationally infeasible to enumerate all the possible swaps of the critical operations. Indeed, the algorithm only makes trails for each solution other than the “root” solution. Meanwhile, we need to apply a pruning strategy in the breadth-first search process in order to control the computational time. In particular, only the best solutions on each level of the tree will be exploited in the subsequent trials. We implement the tree search algorithm using the queue data structure as follows.
Input. A selected base solution .
Step 1. Let . Create an empty queue.
Step 2. Try applying the SWAP operator on different locations in the critical blocks of . Let the produced solutions enter the queue. Denote the best solution among these by .
Step 3. Perform the following steps for times. (3.1)Take out the first solution in the queue, denoted by .(3.2)Try applying the SWAP operator on different locations in the critical blocks of . Let the produced solutions enter the queue.
Step 4. Retain the best solutions in the queue, and delete the rest ones. Denote the currently best solution in the queue as . If , let .
Step 5. Let . If , go back to Step 3.
Step 6. Output .
According the above description, the tree has nodes on level 1, each of which will be expanded to nodes on level 2. Thus, there are totally nodes on level 2. However, only the best nodes among these have a chance to be exploited and further expanded to level 3, while the rest will be abandoned. In this way, the number of nodes being considered on each level is controlled at and will not increase exponentially. After expanding to levels, the algorithm is terminated.
For example, if we set , , and , the entire tree structure may look like Figure 2. The starting solution is denoted by . On each level from , three promising solutions are expanded further while the other three are discarded. The best solution found by this local search endeavor may appear on the last level.

The complexity of the tree search algorithm can be briefly analyzed as follows. There are roughly solutions that need to be expanded in the tree search process. For each expansion, the algorithm should first find the critical paths related with the tardy jobs. According to the Bellman’s algorithm [37], this can be done in time. Then, the neighborhood operator has to be applied for approximately times in the tree search process. Therefore, the computational complexity of the algorithm can be described as .
Compared with the extensive exploration behavior of DE, the proposed tree search algorithm works in a greedy manner. First, each neighborhood move is performed on the critical blocks of the corresponding schedule, because only such moves are possible to produce improvements. Second, the neighborhood property is utilized to exclude some unpromising moves when there are more than candidates to choose from. Third, on each level of the tree, only best solutions from the totally will be further considered. These features make the tree search algorithm extremely concentrated, which provides a complementary mechanism to DE’s search process. Thus, using the tree search as an embedded local search module is beneficial for enhancing the exploitation capability and the overall performance of the hybrid DE.
5. The Computational Results
5.1. The Test Problems and Parameter Setting
In order to test the performance of the proposed hybrid DE (abbreviated as HDE hereinafter), randomly generated TWT-JSSP instances with different sizes are used in the computational experiment. For a specific problem size, the processing route of each job is a random permutation of the machines, and the required processing time of each operation follows a uniform distribution and takes only integer values. The due date of each job is set based on the total processing time of the job as . In this expression, is a random number uniformly distributed in the interval , and denotes the set of operations that constitute job . is a coefficient that reflects the tightness level of the due date setting. The weight of each job (integer values) follows a uniform distribution, that is, .
In order to reduce random errors in the computation, we randomly generate 5 different instances for each due date tightness and scale. In particular, 10 problem sizes (from 100 operations to 500 operations) and 3 due date levels ( for loose due dates, , for moderate due dates and for tight due dates) are tested, so the total number of instances is 150. The first two columns of Tables 2, 3 and 4 list the scales of the 10 instance sets. We compare the performance of the proposed HDE with the hybrid genetic algorithm GLS for TWT-JSSP [2]. The algorithms have been implemented using Visual C++ 2010 and tested on a platform of Intel Core i5-750 2.67 GHz, 3 GB RAM, and Windows 7.
In this experiment, the parameters of HDE are set as follows. (i)The DE parameters: , (the “dither” strategy [38]), .(ii)The local search parameters: , , , .(iii)Termination criterion: the best-so-far solution has not been updated for 50 generations (or controlled by the external computational time limit).
5.2. The Results and Discussions
The computational results are processed in the following way before listed in Tables 2, 3 and 4. For each instance , HDE and GLS are, respectively, run for 10 independent times. The best objective value obtained in the 10 runs by algorithm () is denoted by , the worst denoted by , and the mean denoted by .
Next, we can calculate the relative objective values by taking as reference: , , .
Finally, when the above steps have been performed for each instance in the considered instance set, we calculate the average relative values (over this set) as , , and . In this way, the computational results are summarized in the tables with respect to each instance set.
Meanwhile, with respect to the first instance of each set under (marked with “#-1” where # represents the index of instance sets), we record the average computational time (over 10 independent runs) of the two algorithms and plot the data as bar graphs in Figure 3.

The following comments can be made according to the results displayed in Tables 2–4 and Figure 3.(1)For most instances, the gap between the best and the worst solutions obtained by HDE is smaller than the gap obtained by GLS, which implies that HDE performs more robustly in different executions and for different scheduling instances.(2)When the due dates are tighter ( or 1.1), we find that the solution quality of GLS is considerably inferior to that of HDE. This is because under tight due dates, many jobs are prone to be tardy, and the optimization difficulty increases systematically. In this situation, the proposed tree search mechanism exhibits greater advantage. The search is more effective because the neighborhood moves are conducted on the critical blocks and a part of unpromising moves are eliminated with simple calculations. To certain extent, HDE has overcome the blindness of traditional local search, and thus it can access the high-quality solutions with larger probability.(3)By comparing the computational time, we see that, for most instances, the consumed time of HDE is less than that of GLS. More remarkably, the increasing speed of computational time with instance size is slower on the part of HDE. This is because the mutation and crossover based on random key representation in HDE is more efficient than the crossover and mutation based on operation sequence representation in GLS. Meanwhile, the phenomenon also verifies the fact that the tree-based local search is able to accelerate the convergence of DE (note that the termination criterion of HDE is decided by the convergence status).
5.3. Influence of the Parameter Settings
The following experiments are designed for observing the impact of parameter settings on the final solution quality of HDE. A instance with is used in these experiments. The termination criterion adopted here is an exogenously given computational time limit. When one parameter is being tested, the other parameters are all fixed at their recommended values given in Section 5.1.
The population size () is one of the key parameters for the HDE algorithm. If is large, many solutions need to be maintained in the population, which increases the computational burden of solution decoding and evaluation. If is small, more generations can be implemented within the DE framework, but the limited population diversity will impair the effectiveness of mutation and crossover. Therefore, under a given computational time limit, the selection of can influence the overall optimization performance. In this experiment, we fix the available computational time at three different levels and then identify the most suitable value of in each scenario. The time limit is set as 20 sec (tight), 40 sec (moderate), and 60 sec (loose), respectively. The computational results are displayed in Figure 4, where the vertical axis gives the average objective value obtained from 20 independent executions of the proposed HDE under each .

According to the results, the selection of has a noticeable impact on the solution quality. Generally, the solution quality will deteriorate if is either too small or too large. But the best varies with different time constraints. If the time budget is tight or moderate, the best population size is . If the time resource is abundant, the best population size is . Thus, setting is reasonable for ordinary uses.
Another important parameter for HDE is , the percentage of solutions that undergo the local search process. A reasonable selection of will result in an effective balance between exploration and exploitation. The computational results for this experiment are displayed in Figure 5.

According to the results, the setting of has a considerable impact on the solution quality, especially when the computational time is scarce (20 sec). A small means that only a few solutions in each generation can be improved by the local search, which has little effect on the entire population. A large suggests that too much time is consumed on local search, which may reduce the normal function of DE.
The best setting of under each constraint level is 70 (for tight time budget), 60 (for moderate time budget) and 40 (for loose time budget). When the exogenous restriction on computational time is tight, DE has to rely on frequent local search to find good solutions. This is because, in the short term, the tree-based local search is more efficient than DE’s mechanism (mutation and crossover) in improving a solution. However, the price to pay is possibly a premature convergence of the optimization process due to the greedy nature of the tree search. On the other hand, when the computational time is sufficient, DE will prefer a larger number of generations to conduct a systematic exploration of the solution space. In this case, the local search need not be used very frequently, otherwise the steady searching process may be disturbed. Overall, a recommended value for is 50.
Finally, we focus on the local search parameters , , and . Following the suggestion of [36], we fix the relationship between and as and then test the influence of and on the final solution quality. The tested ranges are , , with a step size of 2, which leads to 81 combinations. Again, we choose the first instance of each set (#-1) under for this experiment. The computational time limit is set as 0.6 nm sec for an instance. The results are displayed as relative values in Figure 6.

The following comments can be made according to the results. (1)The tree search parameters produce a remarkable effect on the final solutions of HDE, which confirms that such a local search optimizer is effective in improving the performance of DE. When and both take small values, the tree search process does not have chance to examine a sufficient number of neighborhood solutions before termination, which results in solution quality decline. On the other hand, when and are both set large, the overall performance also worsens because the exploitation is consuming too much time and crowds out the exploration process of DE.(2)The value of (representing the depth of the search tree) and the value of (representing the width of the search tree) should be coordinated in order to guarantee satisfactory solution quality. When the tree is growing too wide (resp., deep) but not sufficiently deep (resp., wide), the overall effectiveness will deteriorate. Based on the experiments, the recommended settings are and (thus ).
6. Conclusion
In this paper, we propose a hybrid differential evolution algorithm for the job shop scheduling problem with the objective of minimizing total weighted tardiness. Based on the mathematical programming models, we show a neighborhood property for the swap of adjacent operations in a critical block, which can be used to exclude some nonimproving neighborhood moves. Then, a tree-based local optimizer is designed and embedded into the DE algorithm in order to promote the exploitation function. The tree search algorithm generates compound neighborhood for the selected solution and therefore it helps DE to exploit the relatively high-quality solutions. Finally, the computational results verify the effectiveness and efficiency of the proposed hybrid approach.
The future research can be carried out from the following aspects.(1)It is worthwhile to investigate the new and more effective neighborhood properties for TWT-JSSP. This could provide a deeper insight into the inherent nature of TWT-JSSP and facilitate the design of metaheuristics like DE.(2)It is worthwhile to consider other encoding schemes and mutation/crossover mechanisms of the DE algorithm (such as the discrete DE), which may be more suitable for the application of the neighborhood properties.
Appendices
A. Proof of Theorem 3.1
Proof. We prove the theorem by construction.
First, it is noticeable that the unit cost of each arc in  is equal to the processing time of the operation that is connected to the tail of the arc, that is, , which means the largest cost path from node 0 to node  is exactly the same as the critical path from operation 0 to operation , and the total cost of this path equals the length of the critical path.
Then, because constraint  requires that the total input flow should equal the total output flow for every node, a feasible solution  to this problem must be composed of several cycle flows. However, we know that  does not contain any cycles, so each cycle in  must contain at least one arc from . On the other hand, we can see that the arcs in  all point to the same node 0, and because a cycle cannot include duplicate nodes, it is clear that each cycle in  contains at most one arc from . Finally, it is concluded that each cycle contains exactly one arc from .
Based on the previous discussions, we can construct an optimal solution  by the following steps.Step 1. Initialize the flow on each arc to be 0, and let .Step 2. Find the unique critical path from operation 0 to , denoted by  (the uniqueness is guaranteed by a simple rule detailed in the following).Step 3. If the total unit cost (i.e., length) of the path  plus the unit cost of arc  is greater than 0, then add a flow amount of  to arc  and each arc in  (notice that the capacity limit for arc  is ).Step 4. Set . If , then return to Step 2; otherwise, terminate the algorithm.
In such a constructed solution , each node except 0 has at most one positive-flow incoming arc. This is because the flows are only distributed on the arcs belonging to critical paths, and meanwhile, only one critical path is considered from node 0 to any other node .
The last issue is how to maintain the uniqueness of the critical path to a certain node. Here we use a rule called “machine predecessor first.” Indeed, when looking for a critical path from 0 to , we begin from  and move backward. In each step, we must select an operation whose completion time equals the starting time of the current operation  from its immediate job predecessor  and its immediate machine predecessor . Then, if  and  both hold, the rule requires that the machine predecessor should be selected as the next current operation. This tie-breaking rule guarantees the uniqueness of the critical path to any node.
B. Proof of Theorem 3.3
Proof. As Figure 1 shows, the flows in the initial network are denoted by  (), so we have the flow equilibrium condition:
									
Now we are swapping the operations  and . After the SWAP operation is performed, the flows on the relevant arcs are denoted by . So the following equilibrium equations must be satisfied:
									
It is noticed that, in order to keep the equilibrium of the remaining nodes  () and  (), we have  and .
Solving  (B.2) together with (B.1) yields
									
								We can ensure , because Theorem 3.1 implies .
The difference in the total cost of the flows  and the original flows  is
									
Since the flows outside this block are all kept unchanged, the difference in the total cost of the whole network resulted from executing SWAP is the same as  calculated for this subnetwork, that is, .
Therefore, if , then  ( denotes the original set of directed disjunctive arcs while  denotes the new arc set obtained after performing SWAP). The last “=” is because we assume the original network is related with the optimal solution to the dual problem under . In fact,  and . So it concludes that, when  (), swapping  and  will not improve the current solution ().
Acknowledgment
This work is supported by the National Natural Science Foundation of China under Grant nos. 61104176, 60874071.