Abstract

The Traveling Salesman Problem (TSP) is an important routing problem within the transportation industry. However, finding optimal solutions for this problem is not easy due to its computational complexity. In this work, a novel operator based on dynamic reduction-expansion of minimum distance is presented as an initial population strategy to improve the search mechanisms of Genetic Algorithms (GA) for the TSP. This operator, termed RedExp, consists of four stages: (a) clustering to identify candidate supply/demand locations to be reduced, (b) coding of clustered and nonclustered locations to obtain the set of reduced locations, (c) sequencing of minimum distances for the set of reduced locations (nearest neighbor strategy), and (d) decoding (expansion) of the reduced set of locations. Experiments performed on TSP instances with more than 150 nodes provided evidence that RedExp can improve convergence of the GA and provide more suitable solutions than other approaches focused on the GA's initial population.

1. Introduction

As defined by [1], routing is the process of selecting "best" routes in a graph G = (V, A), where V is a node set and A is an arc set. Within this context, route planning is the calculation of the most effective route (route of minimum distance, cost, or travel time) from an origin to a destination node on a network, and the Traveling Salesman Problem (TSP) is one of the most studied and applied routing models in the transportation, manufacturing, and logistic industries [2]. As presented by [3], the TSP "is the fundamental problem in the fields of computer science, engineering, operations research, discrete mathematics, graph theory, and so forth". This is the reason why the TSP has frequently been considered a touchstone for new strategies and algorithms to solve combinatorial optimization problems, as commented by [2].

The TSP can be modeled as an undirected weighted graph where locations (i.e., nodes) are the graph's vertices, paths are the graph's edges (i.e., arcs), and the path's distance, cost, or time is the edge's length [4]. Then, the objective of solving the TSP consists of minimizing the total distance of a complete sequence of paths (total route) which starts and finishes at a specific vertex (i.e., depot node) after having visited all vertices once and only once. Figure 1 presents a solution example for the TSP, which is also known as a Hamiltonian Circuit of minimum cost.

Finding optimal solutions for the TSP is a challenging task due to its computational complexity, which is defined as NP-hard (nondeterministic polynomial-time hard) [5]. For example, if 15 cities are considered, there are 15! ≈ 1.31e+12 orderings in which a Hamiltonian Circuit could visit them. In such a case, finding the optimal solution (i.e., finding the Hamiltonian Circuit of minimum cost) can be a time-consuming task which becomes infeasible when a larger number of cities is considered. As reported in [2], only small TSP instances (up to approximately 100 nodes) can be solved to optimality.
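These counts can be checked directly. The following Python sketch (an illustration added here, not part of the original experiments; the helper name is ours) computes the number of tours for a given number of cities:

```python
import math

def tour_count(n_cities, symmetric=True):
    """Number of distinct Hamiltonian circuits when a depot node is fixed.

    An asymmetric TSP has (n - 1)! ordered tours; in a symmetric TSP each
    tour and its reverse have the same length, leaving (n - 1)!/2.
    """
    tours = math.factorial(n_cities - 1)
    return tours // 2 if symmetric else tours

# For 15 cities there are 15! ~ 1.31e12 raw orderings (the figure cited
# above) and (15 - 1)!/2 ~ 4.36e10 distinct symmetric tours.
print(math.factorial(15))  # 1307674368000
print(tour_count(15))      # 43589145600
```

Even at millions of tour evaluations per second, exhausting these counts is impractical, which motivates the metaheuristics discussed next.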

Due to this situation, development of metaheuristics has been performed to provide high-quality solutions in a reasonable time for different combinatorial optimization problems such as the TSP [6]. Among the most efficient metaheuristics for the TSP the following can be mentioned: Genetic Algorithms (GA), Particle Swarm Optimization (PSO), Tabu Search (TS), Simulated Annealing (SA), Ant Colony Optimization (ACO), and Artificial Neural Networks (ANNs) [3, 6].

Although GA is one of the most important metaheuristics applied to the TSP, its performance depends on its parameter settings such as initial population, selection and reproduction operators, and stop condition. As presented in [2, 7, 8], the quality of the initial population plays an important role in the solving mechanism of the GA. In the present work, a dynamic reduction-expansion operator, termed RedExp, is presented as a strategy to improve the quality of the initial population and the convergence of a GA. When compared to other initial-population approaches for the GA, the RedExp operator can provide more suitable solutions for the TSP.

The remainder of the present work is organized as follows: Section 2 presents the technical details of the stages of the RedExp operator; Section 3 presents and discusses the results obtained on TSP instances; finally, Section 4 discusses the conclusions and future work.

2. Structure of the Reduction-Expansion Operator (RedExp)

The RedExp operator is similar to the clustering strategy presented in [2], where the k-Means Clustering algorithm was considered to generate the initial population for a GA. In [2], nodes were clustered into groups in order to solve a TSP with a smaller number of nodes. Thus, finding a route of minimum distance was performed considering the cluster centers, and once the route of minimum distance was obtained, the clusters were "disconnected" and "rewired" to assemble a route considering the original nodes. On a selection of 14 symmetric TSP instances (mean = 204 nodes) and 10 trials, this strategy led to a mean average error of 9.22% (mean best error of 6.97%).

As presented in [2], clustering can improve the performance of the GA for the TSP. However, the distribution patterns of the nodes may affect the performance of the clustering and declustering processes by increasing variability in the initial population. This is because nodes that represent key features of the complete set of nodes can be missed during the clustering process, leading to their removal from the reduced (i.e., clustered) set of nodes. An example of key nodes is presented in Figure 2(a), where data from the TSP instance a280 of the TSPLIB95 database [9] was considered. As presented, the distribution of these nodes is an important feature of its optimal solution, which is presented in Figure 2(b).

By performing the clustering presented in Figure 2(c) the distribution of the key nodes of instance a280 is simplified. As a consequence, the optimal solution’s pattern of the reduced set of nodes (Figure 2(d)) is significantly different from the pattern observed for the complete set (Figure 2(b)). Note that, as presented in Figure 2(e), the pattern observed in Figure 2(d) is preserved even after declustering.

In order to address this issue, the proposed RedExp operator clusters only two nodes at a time, and only nodes which are very close to each other are candidates for clustering. This leads to an only slightly relaxed TSP, reducing the loss of key features. The number of pairs of nodes which are candidates for clustering is defined by a dynamic acceptance threshold metric. This process is defined as "reduction", and a route of minimum distance is estimated by a greedy heuristic. Then, "expansion" of the clustered nodes is performed to represent the route considering the original nodes. This strategy was evaluated with a selection of 41 symmetric TSP instances (mean = 474 nodes), considering six scenarios where RedExp could be used alone or in conjunction with other standard processes to generate an initial population. Initial assessment of the operator was performed with a single execution (trial) of the GA for each scenario, leading to results supporting the positive effect of the operator with a combined mean best error of 4.9%. Then, an extended assessment with 10 executions or trials of the GA was performed to evaluate its statistical significance.

Based on these results, the proposed operator represents a suitable alternative to improve the performance of GAs or similar metaheuristics that depend on initial solutions. Also, it can be a suitable alternative to improve performance when compared to approaches focused on modifying reproduction operators [3]. The details of the RedExp operator are described in the present section.

2.1. Clustering Stage

The first stage in the reduction-expansion process consists of determining the set of locations to be reduced. This is accomplished by the clustering process that is described in Pseudocode 1. For this process, an acceptance distance threshold d_a is defined, which is computed as

d_a = d_min + f_r * sigma_d (1)

where
(a) d_min is the minimum distance between all locations, which is computed as

d_min = min { d_ij } (2)

where d_ij is the distance between locations i and j, with i, j = 1, ..., N and i ≠ j;
(b) sigma_d is the standard deviation of the distances between all locations, which is computed as

sigma_d = sqrt(sigma_d^2) (3)

where sigma_d^2 is the variance of the distances d_ij between locations i and j, with i, j = 1, ..., N and i ≠ j;
(c) f_r is the reduction factor (equation (4)), which includes a random component and is recomputed each time an individual is generated.

U = {1, ..., N} % all nodes
R = ∅ % set of coded nodes
x_i = x-coordinate of node i
y_i = y-coordinate of node i
r = 1 % index for coded nodes
for i = 1:N
    for j = i + 1:N
        if i ∈ U and j ∈ U and d_ij ≤ d_a and j is the closest such node to i
            coded_nodes(r,1) = i % node i is stored for clustering
            coded_nodes(r,2) = j % node j is stored for clustering
            coded_nodes(r,3) = (x_i + x_j)/2 % x-coordinate of equivalent coded node for (i,j)
            coded_nodes(r,4) = (y_i + y_j)/2 % y-coordinate of equivalent coded node for (i,j)
            U = U \ {i, j} % U is updated (i and j are removed from U)
            R = R ∪ {r} % R is updated (equivalent coded node added to R)
            r = r + 1
        end
    end
end
% Now, U contains the remaining nodes i that were not clustered or coded.
% These are added to R as follows:
for each i ∈ U
    coded_nodes(r,1) = i % node i is stored for assignment of new index r
    coded_nodes(r,2) = -1 % there is no node j for non-clustered nodes
    coded_nodes(r,3) = x_i % x-coordinate of node i
    coded_nodes(r,4) = y_i % y-coordinate of node i
    r = r + 1
end

It is important to mention that d_min and sigma_d exclude the distance between a location and itself (which is equal to zero) and that equation (4) is computed each time an individual is generated (hence, a different acceptance distance threshold is computed to generate each individual in the initial population).

In this way, the acceptance threshold metric ensures that only locations or nodes that are closer than d_a are considered as candidates for clustering. Also, in order to avoid significant variability between the original and reduced sets, a criterion of minimum distance was defined for the clustering candidates. This can be explained with the following example: consider that pairs (4,6), (6,20), and (12,6) comply with the restriction d ≤ d_a and the distances between the nodes of each pair are 100, 150, and 120, respectively. In this case there are three clustering options for node "6"; however, the most suitable option is (4,6) because node "6" is closer to node "4" than to nodes "20" or "12".
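To make the threshold and candidate-selection logic concrete, the following Python sketch implements one plausible reading of equations (1)-(4): d_a = d_min + f_r * sigma_d with a randomly drawn reduction factor f_r. The exact form of f_r is not recoverable from the text, so treat it as an assumption; both function names are ours.

```python
import math
import random
from itertools import combinations

def acceptance_threshold(coords, reduction_factor=None):
    """Sketch of the dynamic acceptance threshold d_a = d_min + f_r * sigma_d.

    The formula for the reduction factor f_r is an assumption: here it is a
    uniform random value in [0, 1), re-drawn for every individual, which
    matches the paper's statement that d_a changes with each individual.
    """
    dists = [math.dist(a, b) for a, b in combinations(coords, 2)]
    d_min = min(dists)
    mean = sum(dists) / len(dists)
    sigma = math.sqrt(sum((d - mean) ** 2 for d in dists) / len(dists))
    f_r = random.random() if reduction_factor is None else reduction_factor
    return d_min + f_r * sigma

def clustering_candidates(coords, d_a):
    """Pairs (i, j) closer than d_a; each node keeps only its nearest partner."""
    best = {}
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        d = math.dist(a, b)
        if d <= d_a:
            for k, other in ((i, j), (j, i)):
                if k not in best or d < best[k][1]:
                    best[k] = (other, d)
    # keep only mutually nearest pairs so every node is clustered at most once
    return sorted({tuple(sorted((k, v[0]))) for k, v in best.items()
                   if best.get(v[0], (None,))[0] == k})
```

With coords = [(0,0), (1,0), (2.5,0)] and d_a = 2, node 1 is within d_a of both neighbors but pairs only with node 0, its nearest partner, mirroring the (4,6) example above.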

2.2. Coding Stage

This stage consists of coding each clustered pair of nodes (i, j) as a single equivalent node with mean coordinates (x_r, y_r) estimated as

x_r = (x_i + x_j)/2, y_r = (y_i + y_j)/2 (5)

where r is the index for the (new) reduced node. It is important to mention that, under this process, pairs of nodes separated by distances larger than d_a are not clustered and remain unchanged. This also happens with candidate nodes that were released from clustering due to not meeting the criterion of minimum distance. In such cases, the indexes of these nonclustered nodes are reassigned in terms of the new index r. Figure 3 presents an example of the clustering and coding processes for a problem with N = 7 locations.

As presented, the array coded_nodes contains the registry of "equivalencies" between R and U. Thus, the coded nodes in R represent the reduced nodes from U (the original nodes). This registry is important for the decoding and declustering processes. Note that this process ensures that close pairs of nodes (within a distance d_a) of minimum distance are kept together.
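The coding registry can be sketched in Python as follows. The row layout [i, j, x, y] mirrors the coded_nodes array of Pseudocode 1; the sentinel value -1 for non-clustered nodes is our choice, since the paper only states that no node j exists in that case.

```python
def code_nodes(coords, pairs):
    """Build the coded-node registry sketched in Pseudocode 1.

    Each clustered pair (i, j) becomes one coded node at the mean of the two
    coordinates; non-clustered nodes are re-indexed unchanged.  Each row is
    [i, j, x, y], with j = -1 marking a non-clustered node.
    """
    clustered = {k for p in pairs for k in p}
    registry = []
    for i, j in pairs:
        (xi, yi), (xj, yj) = coords[i], coords[j]
        registry.append([i, j, (xi + xj) / 2, (yi + yj) / 2])
    for i, (x, y) in enumerate(coords):
        if i not in clustered:
            registry.append([i, -1, x, y])
    return registry
```

The row index of the registry plays the role of the new index r, so the reduced problem is simply the TSP over the registry's (x, y) columns.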

2.3. Sequencing Stage: Nearest Neighbor Strategy

Most routing problems consider an initial and a final node to define a particular route, which consists of a sequence of nodes. This sequence can lead to a route of minimum traveling cost (i.e., distance) throughout all nodes.

For the sequencing process of all nodes in R it is important to identify the initial and/or final node of the route. This node depends on the routing problem itself and is commonly identified as node 0 or node 1. Then, sequencing is performed as described in Pseudocode 2. As presented, sequencing is performed with a simple heuristic based on the nearest neighbor strategy, which is expected to make the operator time-efficient and also to add random flexibility to achieve a feasible (not optimal) route of minimum cost.

Initialization:
p = 1 % current node; the TSP route of minimum cost starts at node 1
route_min_cost = [p]
R = R \ {p}
Sequencing:
while R ≠ ∅
    closest_node = node in R with the minimum distance to p. If more than one node complies with this
                   requirement, one is randomly selected from the complying set of nodes.
    R = R \ {closest_node}
    route_min_cost = [route_min_cost closest_node] % closest_node is inserted at the right side of
                                                   % the current route_min_cost
    p = closest_node % current node is updated with the closest node
end
route_min_cost = [route_min_cost 1] % the TSP route ends at node 1

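A direct Python transcription of Pseudocode 2 is given below (illustrative; the function name is ours). Ties between equally close unvisited nodes are broken at random, which is what gives the operator its stochastic flexibility.

```python
import math
import random

def nearest_neighbor_route(coords, start=0):
    """Greedy nearest-neighbor sequencing over a list of (x, y) coordinates.

    Returns a closed tour of node indices that starts and ends at `start`.
    """
    remaining = set(range(len(coords))) - {start}
    route, p = [start], start
    while remaining:
        d_min = min(math.dist(coords[p], coords[q]) for q in remaining)
        # random tie-break among all nodes at the minimum distance
        closest = random.choice(
            [q for q in remaining if math.dist(coords[p], coords[q]) == d_min])
        remaining.discard(closest)
        route.append(closest)
        p = closest
    route.append(start)  # the TSP tour returns to the starting node
    return route
```

On the reduced node set R, the coordinates are the (x, y) columns of the coded_nodes registry, so the resulting tour is a tour over coded indices.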
2.4. Decoding Stage

Because the route generated by the heuristic described in Pseudocode 2 consists of elements from the reduced set of nodes in R, it is required to represent this route in terms of the original set of nodes in U. This expansion from R to U is performed by representing each coded node r as its equivalent node(s) (i, j) from coded_nodes.

It is important to mention that, for clustered nodes, this process implies two decoding alternatives because an equivalent node r can be decoded as (i, j) or as (j, i). Because decoding is sequentially performed left-to-right along the route, the decoding decision for clustered nodes is performed by computing the effect of both alternatives on the cumulative cost of the partially decoded (expanded) route.
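A minimal sketch of this decoding decision follows. Since the intra-pair distance d(i, j) is identical for both orientations, comparing the cumulative cost of appending (i, j) versus (j, i) reduces to comparing the distance from the current route end to i versus to j; the function name and registry layout are ours, matching the earlier sketches.

```python
import math

def expand_route(coded_route, registry, coords):
    """Expand a tour over coded nodes into a tour over original nodes.

    `registry` rows are [i, j, x, y] with j = -1 for non-clustered nodes;
    `coords` holds the original node coordinates.  For each clustered row
    the cheaper of the two orientations (i, j) / (j, i) is appended.
    """
    route = []
    for r in coded_route:
        i, j = registry[r][0], registry[r][1]
        if j == -1:            # non-clustered node: copy it through
            route.append(i)
            continue
        if not route:          # first element: orientation is arbitrary
            route.extend([i, j])
            continue
        last = coords[route[-1]]
        # pick the orientation whose entry node is closer to the route end
        if math.dist(last, coords[i]) <= math.dist(last, coords[j]):
            route.extend([i, j])
        else:
            route.extend([j, i])
    return route
```

The closing return to the depot is then appended exactly as in Pseudocode 2.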

3. Assessment

3.1. Integration with Genetic Algorithm

For assessment of the effect of the RedExp operator on the performance of the GA, the following scenarios were considered for the generation of the initial population (in all cases the initial population consisted of 500 individuals):
(a) Rand: all individuals are generated by random permutations (Rand operator) as considered by the GA presented in [7, 11].
(b) NN: all individuals are generated by a sequencing heuristic of random permutations based on the nearest neighbor strategy (NN operator) as described in Section 2.3.
(c) RedExp: all individuals are generated by the RedExp operator.
(d) RedExp + Rand: 50% of the individuals are generated by the RedExp operator and the other 50% are generated with the Rand operator.
(e) RedExp + NN: 50% of the individuals are generated by the RedExp operator and the other 50% are generated with the NN operator.
(f) NN + Rand: 50% of the individuals are generated by the NN operator and the other 50% are generated with the Rand operator.

As mentioned in Section 2.1, the acceptance threshold metric is reestimated each time a solution is generated. Thus, due to equation (1), different degrees of "reduction" can be performed during the process of generating an initial population with the RedExp operator.
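The mixed 50/50 scenarios can be assembled with a generic helper such as the following sketch. The generator callables and the weights are supplied by the caller; the helper itself is illustrative and not part of the paper's implementation.

```python
import random

def initial_population(pop_size, generators, weights):
    """Build a mixed initial population from several operators.

    `generators` maps an operator name to a zero-argument callable that
    returns one individual (a tour); `weights` gives each operator's share
    of the population, in the same order as the dict's keys.
    """
    population = []
    for name, share in zip(generators, weights):
        count = round(pop_size * share)
        population += [generators[name]() for _ in range(count)]
    random.shuffle(population)  # avoid positional bias in later selection
    return population
```

Because each call to a RedExp-style generator re-draws its acceptance threshold, a population built this way contains individuals produced at different degrees of reduction.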

Then, the initial population was integrated into the standard GA, which is presented in Figure 4. The selection of the crossover and mutation operators, which are also presented in Figure 4, was based on the findings reported in [7, 12-15].

Finally, comparison was performed with other works that have applied initial population strategies. Hence, the following works were considered for comparison purposes:
(a) KMC [2]: in this work, the initial population of the GA was generated by using the k-Means Clustering (KMC) algorithm. The algorithm was tested with 14 TSP instances (mean = 204 nodes).
(b) HNN [10]: in this work, the initial population of the GA was generated by a Hopfield Neural Network (HNN), and the hybrid algorithm was tested with two small TSP instances with 51 and 76 nodes.

Implementation of the GA code was performed with Octave [16] and MATLAB on an HP Z230 Workstation with an Intel Xeon CPU at 3.40 GHz and 8 GB of RAM. All executions of the GA started with the same random number generator with its seed set to Inf.

3.2. Results on Main Set of 41 TSP Instances

The main test was performed with 41 TSP instances, which were selected from the TSPLIB95 [9], National TSP, and VLSI TSP [17] libraries, to evaluate the statistical significance of the RedExp operator on the GA's convergence. The error from the optimal solution was computed by using the following equation [12]:

Error (%) = ((Cost_GA - Cost_Optimal) / Cost_Optimal) * 100 (6)

Initial assessment with these instances was performed with a single execution of the GA and a dynamic stop condition. This was performed to establish an intensive search process. The dynamic stop condition was applied on the no_best_cost variable of the main GA (see Figure 4). This variable increases while no new best solution is found within the search process, and it is set to zero when a new best solution is found. In this case, the GA iterates while no_best_cost < 1000.
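The dynamic stop condition can be sketched as follows, where evolve_once is an assumed callable that runs one GA generation and returns the best cost found in that generation (the helper and its signature are ours, not the paper's):

```python
def run_ga(evolve_once, initial_cost, patience=1000):
    """Run a GA until `patience` consecutive generations yield no new best.

    Mirrors the no_best_cost counter described in the text: the counter is
    reset to zero whenever a strictly better cost appears and incremented
    otherwise; the loop stops once it reaches `patience`.
    """
    best = initial_cost
    no_best_cost = 0
    while no_best_cost < patience:
        cost = evolve_once()
        if cost < best:
            best, no_best_cost = cost, 0  # new best found: reset the counter
        else:
            no_best_cost += 1
    return best
```

This makes the total number of generations adaptive: hard instances that keep improving are searched longer than instances that stagnate early.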

Because only one execution of the GA was considered, the result in (6) is the best solution obtained with a single execution of the GA. Table 1 presents the results of the GA and the estimated error when compared with optimal results for each assessment scenario.

As presented in Table 1, the minimum mean best errors (5.5%, 6.3%, and 5.7%) were obtained with the initial populations that involved the RedExp operator (RedExp, RedExp + Rand, and RedExp + NN). Hence, RedExp, as a single operator or as a complement to the Rand and NN operators, has a positive effect on the final solution obtained by the GA. By selecting the minimum error achieved for each instance (throughout all scenarios), a total mean best error of 4.9% is computed.

It is important to mention that these results consider the same population size through all generations of the GA, which was set at 500 individuals or solutions. Hence, particularly for instances with more than 500 supply/demand nodes, achieving solutions with significant reductions in TSP distances with the RedExp operator supports its feasibility to improve convergence of the GA. As presented in Figure 5, if the GA is adapted to run for only 1000 generations (fixed stop condition), the fastest convergence to minimum distance values is achieved if the RedExp operator is used for the initial population.

An extended assessment of the RedExp operator on the largest TSP instances (with more than 250 nodes) was performed with 10 executions or trials of the GA (as performed in [2]) and a fixed stop condition (run for 500 generations). This was performed to assess the statistical significance of the results obtained with the operator. Table 2 presents the average, best, and worst errors obtained for each of the considered instances.

The results presented in Table 2 corroborate those presented in Table 1 and Figure 5. The worst average and best error rates are observed when the initial population of the GA is generated with the Rand operator. These are significantly improved if the initial population incorporates better solutions obtained by the NN or RedExp operators. When comparing the scenarios that include the RedExp operator with those that do not, the minimum error rates are obtained with the RedExp-based populations. To quantitatively assess this difference, a statistical significance test was performed on the errors reported in Table 2.

For this purpose, a paired t-test was performed with the following null hypothesis:

H0: mu_A < mu_B

where A and B are the two scenarios to be compared and mu_A and mu_B are their mean errors; the hypothesis is focused on rejecting or validating that the mean error of A is smaller than the mean error of B. In contrast, the alternative hypothesis is defined as

Ha: mu_A ≥ mu_B
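The paired t statistic can be computed from the per-instance error differences, as in the following self-contained sketch (with SciPy available, scipy.stats.ttest_rel performs the same computation and also returns the p-value; the function name below is ours):

```python
import math

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic from per-instance error differences.

    Computes t = mean(d) / (s_d / sqrt(n)) over the differences
    d_k = a_k - b_k; a large negative t supports scenario A having the
    smaller mean error under a one-sided test.
    """
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    return mean / math.sqrt(var / n)
```

Pairing by instance is what makes the test appropriate here: both scenarios are evaluated on the same TSP instances, so per-instance differences remove instance difficulty as a confounder.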

Table 3 presents the results of the significance test with a significance value of 0.10 for all scenarios. As presented, the mean errors obtained with the scenarios that include the RedExp operator are statistically smaller than the mean error obtained with the Rand operator, and in most comparisons they are also statistically smaller than those obtained with the NN-based populations. Hence, this information provides evidence that convergence of a GA can be improved if the initial population is generated with the RedExp operator alone or in conjunction with the Rand and NN operators.

3.3. Comparison with KMC

The "improved GA" developed in [2], which was used to evaluate the KMC strategy, considered only three mutation operators (flip, swap, and slide), and no crossover was performed. In this case, strictly speaking, the GA presented in [2] does not include all the elements of a GA. In contrast, our GA, which is presented in Figure 4, more closely resembles the "simple GA" reviewed in [2], as it considers both crossover and mutation operators.

Other differences are the following:
(a) Population size and stop conditions: in [2], according to the description of the "improved GA" and the examples that were discussed, the population size was set to 3000 and the number of iterations was set to 20000. In our GA the population is smaller (500 individuals) and the number of iterations is not fixed.
(b) Construction of the initial population: in [2], once the complete set of nodes is clustered into groups, the GA is used to obtain the local optimal path of each group and a global optimal path of groups. Then, according to the global optimal path, one edge of each local optimal path is disconnected to rewire the front and back groups. This process is repeated in order to generate the initial population. In our GA, as described in Section 2, the local optimal path of clustered and nonclustered nodes is obtained by the nearest neighbor heuristic described in Section 2.3. Then, declustering is performed by the decoding algorithm described in Section 2.4. Thus, our GA is only executed after the initial population is built, which, after the decoding stage, considers the complete set of nodes (the GA is not executed with an initial population consisting of clustered nodes).

Due to these differences, and others associated with the hardware resources used for implementation, a strictly fair comparison is difficult to perform. Nevertheless, a close comparison was performed by restricting our GA to execute up to the average execution time of the GA presented in [2], which is very competitive. Table 4 presents the results on the TSP instances considered by [2].

As presented in Table 4, the GA with the RedExp operator, when executed during the same average time as the KMC approach, can achieve a smaller mean best error (4.6078% vs. 6.9763%). Although for small instances the KMC achieved very small errors (i.e., for berlin52 and kroA100), for larger instances with more than 150 nodes the GA with the RedExp operator can achieve smaller errors than those obtained by the KMC approach. These results must be considered with caution due to the differences previously discussed.

3.4. Comparison with HNN

In [10] a Hopfield Neural Network (HNN) was considered for the creation of the initial population of a GA. That GA had the standard structure considered by our GA, although with different reproduction operators, as it considered heuristic crossover and mutation operators. Also, it considered a small population of 50 individuals and 100 iterations of the GA. Testing in [10] was performed with only two instances (eil51 and eil76). Table 5 presents the results reported by [10] and those obtained by the proposed GA with the RedExp operator. For consistency purposes, our GA was also executed during 100 iterations.

As presented in Table 5, the HNN approach achieved a smaller error than our GA with the RedExp operator. This is consistent with the significant differences observed for instances berlin52, kroA100, and pr144 in Table 4. In this case, it is important to observe that the best performance of the GA with the RedExp operator is observed on large instances (i.e., more than 150 nodes) and not on small instances.

While the HNN approach presents a very small error when compared to the GA with the RedExp operator, the use of the HNN may be restricted by the size of the instance. As stated in [10], the Hopfield scheme requires n x n neurons for an n-node problem. This will be further discussed in the following section.

4. Discussion and Future Work

In this work, a reduction-expansion operator, termed RedExp, was developed to improve the performance of Genetic Algorithms (GA) for the TSP. The application of this operator was focused on improving the initial population of the GA, as performed by other works such as [2, 10].

While the RedExp operator is based on clustering as in [2], only pairs of the closest nodes were considered for clustering, and the number of clusters was dynamically defined by an acceptance threshold which considers the distance variation between all nodes in the network. Experiments performed with a set of 41 well-known symmetric TSP instances corroborated the suitability of the RedExp operator to improve the convergence and the quality of the final solutions obtained by a GA, obtaining a mean best error of 4.9%.

Extended assessment was performed with the 20 largest instances of this set, and it was observed that, within 500 generations of the GA, the RedExp operator can improve performance when compared to the Rand and NN operators.

When compared with other strategies focused on the initial population of the GA, it was observed that the proposed approach presents significant errors when tested on small instances with fewer than 150 nodes. However, this may be caused by the clustering process itself. As discussed in Section 2, the distribution patterns of the nodes may affect the performance of the clustering and declustering processes by increasing variability in the initial population. In this work, additional evidence about the effect of the number of nodes was also found. Particularly, for small instances, the distribution patterns are more representative of the instance's key features. Hence, clustering can more severely affect the integrity of these features, even if the degree of clustering is small. This can provide important insights regarding other logistic problems and solving methods based on clustering.

Another aspect that must be studied is the effect of the RedExp operator (and, in general, of the clustering and nearest neighbor approaches) on the genetic diversity of the initial population. This is because, if the population is initialized with very good solutions (obtained by a nearest neighbor heuristic or by a deterministic method such as the Clarke and Wright (C&W) algorithm), many solutions are likely to share the same subsequences of genes. This may affect the diversification performance of the reproduction operators, leading to convergence to local optima.

Thus, future work is focused on addressing the limitations of the RedExp operator. The following are considered as research topics:
(a) Adaptation of the Schema Theorem to determine the subsequences of genes which are common to all solutions within the initial populations of the considered scenarios: in this way, the effect of these improvement strategies on the genetic diversity of the initial populations could be preliminarily assessed.
(b) Developing more efficient metrics for the acceptance threshold, because it has a direct effect on the clustering stage: this is focused on finding better solutions for large instances and optimal solutions for smaller instances.
(c) Integrating the HNN approach within the clustering process to analyze the performance on large instances.
(d) Extending the use of RedExp to other routing problems such as the Capacitated Vehicle Routing Problem (CVRP).
(e) Developing a metric to assess the loss of features when applying clustering.

Data Availability

The databases which were used are publicly available on the Internet. Reference URLs were provided in the manuscript.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.