Abstract
In multiobjective particle swarm optimization (MOPSO), the global-best particle for each particle in the population is selected randomly from a set of nondominated solutions. However, this roulette-wheel-based selection of the global-best particle is ineffective for convergence and diversity when the problem has numerous decision variables or a large number of global-best candidates. Thus, this study proposes cluster-based MOPSO (CMOPSO). In CMOPSO, the similarities between particles are considered when selecting the global-best particle. The cluster for each particle is determined based on the Euclidean distance in the decision or objective space. The proposed approach is demonstrated by applying it to an operating-condition optimization problem for a hydrogen production process. The target process is a representative chemical plant with a large search space and strong nonlinearity. Furthermore, the performance of CMOPSO is assessed by comparing it with that of MOPSO. The results indicate that CMOPSO with clustering in the decision space exhibits superior performance in terms of convergence and diversity.
1. Introduction
In recent years, as industrial chemical plants have become more complex and sophisticated, the number of multiobjective optimization problems (MOOPs) has increased rapidly. Chemical plants have numerous unit processes, such as reactors, distillation columns, cracking furnaces, and separators, which are integrated to construct commercial plants. In addition, there are various combinations of operating conditions such as temperature, pressure, and flow rate for chemical processes, considering economic, efficient, and environmental aspects (e.g., maximizing net profit, maximizing process efficiency, and minimizing air pollutant emissions). Therefore, multiobjective optimization, instead of single-objective optimization, is crucial for handling real-world problems in chemical plants.
Evolutionary algorithms (EAs) are appropriate methods for solving MOOPs in chemical plants [1]. Although numerous solution methods have been developed to solve MOOPs, multiobjective EAs (MOEAs) are preferred because they benefit from an iterative search approach and the assets of initial solutions [2]. Therefore, numerous MOEAs have been used to find a set of optimal solutions, known as Pareto optimal solutions.
The nondominated sorting genetic algorithm (NSGA) was proposed as one of the first MOEAs [3]. NSGA classifies solutions into nondominated fronts and applies a sharing operation. In addition, NSGA-II was proposed to overcome the drawbacks of the NSGA method: high computational complexity of nondominated sorting, lack of elitism, and the need for specifying the sharing parameter [4]. NSGA-II has three main advantages: a fast nondominated sorting approach, a fast crowded distance estimation procedure, and a simple crowded comparison operator [5]. These advantages enable NSGA-II to find a much better spread of solutions and convergence than other MOEAs.
The strength Pareto evolutionary algorithm (SPEA) was developed to solve MOOPs [6]. SPEA combines several features of the established and new techniques to approximate the Pareto-optimal set. This method is characterized by sorting nondominated solutions, evaluating an individual’s fitness, preserving population diversity, and incorporating a clustering procedure. SPEA2, an improved version of SPEA, was developed by employing a refined fitness assignment and an enhanced archive truncation technique [7]. Unlike previous models, SPEA2 incorporates a fine-grained fitness assignment strategy, density estimation technique, and an improved archive truncation method [8].
A new concept of MOEA inspired by quantum computing was proposed: a multiobjective quantum-inspired evolutionary algorithm (MQEA) [9, 10]. MQEA improves the quality of the nondominated set and population diversity in MOOPs. This algorithm employs concepts of quantum computing, such as uncertainty, superposition, and interference. As the probabilistic individuals are updated by referring to nondominated solutions, the population converges to the Pareto-optimal set.
Particle swarm optimization (PSO), which was inspired by the interaction of birds and insects, was modified to solve MOOPs. Consequently, multiobjective particle swarm optimization (MOPSO) has been proposed [11–15]. MOPSO maintains a balance between exploration and exploitation in a particle swarm by guiding each particle towards the best solution. For example, the accuracy of a predictive model and the number of features included in the model could be a set of multiobjectives in MOPSO [16]. In particular, MOPSO uses the concept of Pareto dominance to determine the direction in which each particle should move and maintains the previously found nondominated solutions in an external archive. These solutions from the archive are then used by other particles to guide their next movements. Some studies focused on improving the quality of the solutions found using MOPSO, such as coevolutionary particle swarm optimization (CPSO), which is based on a bottleneck objective learning strategy that helps to maintain diversity of solutions and improve convergence of all objectives [17].
There have been several studies on solving multiobjective optimization problems (MOOPs) in commercial chemical plants, but challenges remain. One challenge is determining the priority of objectives. Most chemical plants have energy-intensive processes, so overall efficiency is often the first factor considered. However, there is also a demand for high-value chemical products, such as engineering plastics, which require a significant amount of energy to improve product quality or meet high specifications. This can lead to environmental issues as the emission of greenhouse gases (GHGs), such as CO2, increases [18–20]. Thus, various objectives, including process efficiency, product specifications, and GHG emissions, must be considered simultaneously. Determining the priority of these objectives is complicated because it depends on the process conditions and circumstances. Another challenge is the long calculation time and convergence difficulty in solving MOOPs for chemical plants. The search space for chemical plants is large due to the numerous decision variables that must be calculated, such as temperature, pressure, and flow [21]. Additionally, unit processes, such as reactors and distillation columns, can exhibit nonlinearity due to the nature of chemical processes.
The MOPSO (multiobjective particle swarm optimization) algorithm has been used to address these challenges, but it has limitations. The global-best particle selection from the Pareto-optimal set significantly impacts convergence and diversity, but it is randomly selected using the roulette-wheel selection method. This can be inefficient for high-dimensional systems with a large search space because it changes the search direction of each particle in every iteration [22], leading to slow convergence at optimal points [23]. Therefore, a new criterion is needed to select the global-best particle using the information from the searched solutions.
Alternative approaches, such as clustering, have been shown to improve the convergence and diversity of Pareto optimal solutions. For example, the self-organized speciation-based algorithm has been proposed to solve multimodal multiobjective problems by avoiding overlap between species to obtain more evenly distributed Pareto optimal solutions [24], while the k-means or fuzzy c-means algorithm has been used to assist the multiobjective vortex particle swarm optimization (MOVPSO) algorithm in easily identifying the individual center of the swarm [25]. However, these approaches rely on predetermined values for parameters, such as the radius or number of clusters, which can make the model inflexible to changing distributions of solutions. Additionally, these methods may not adequately consider the feasibility of solutions, which can limit their applicability in real-world problems.
To address these issues, this study proposes a cluster-based MOPSO (CMOPSO), which adopts Euclidean distance as a criterion for determining the cluster of particles in the swarm. In the proposed CMOPSO algorithm, the global-best of each particle is selected based on its cluster. This method enables particles to quickly approach the Pareto front while maintaining the flexibility and feasibility of found solutions. To demonstrate the effectiveness of CMOPSO, practical experiments were conducted by applying CMOPSO to a hydrogen production process, a representative high-dimensional system in a chemical plant. The main contributions of this work are as follows:
(1) This study proposes a novel concept in multiobjective optimization algorithms, using cluster-based global-best particle selection based on Euclidean distance.
(2) The following two variations of CMOPSO are investigated: one using clustering in the decision space (CMOPSO-X) and the other using clustering in the objective space (CMOPSO-OBJ).
(3) The proposed algorithms are experimentally validated in the optimization of operating conditions in a hydrogen production process.
The remainder of this paper is organized as follows: Section 2 describes the proposed CMOPSO algorithm in detail. Section 3 explains the application of multiobjective optimization to the hydrogen production process, considering the process operating conditions. Finally, concluding remarks are presented in Section 4.
2. Cluster-Based Multiobjective Particle Swarm Optimization
To obtain solutions with rich diversity, a large number of nondominated solutions should be considered. However, in MOPSO, roulette-wheel selection among numerous global-best candidates leads to slow convergence to the Pareto optimal front. This is because randomly selected global-best particles arbitrarily steer flight directions over a wide search space.
Figure 1 shows the flowchart of the CMOPSO algorithm, and pseudocode of the mechanism is presented in Algorithm 1. The key idea of CMOPSO, compared to MOPSO, is to consider the similarity between particles when selecting the global-best of each particle. Therefore, in this study, Euclidean distance was adopted as a criterion for determining the global-best cluster of each particle. In this section, CMOPSO is explained in detail in a stepwise manner.

2.1. Procedure of CMOPSO
The procedure for variants of MOPSO involves five steps: initialization, evaluation, leader selection, update, and termination [13]. Based on these optimization steps, CMOPSO uses a clustering technique in the leader selection step to improve the convergence and diversity of the population.
2.1.1. Step 1: Initialize Population and an External Archive
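Initialize $t = 0$. A population $P^t$ represents a set of $N$ particles, and the particles have velocities to update their positions. The position and velocity of the $i$th particle at the $t$th iteration are $n$-dimensional vectors, as follows:

$$x_i^t = \left(x_{i,1}^t, x_{i,2}^t, \ldots, x_{i,n}^t\right) \in S, \qquad v_i^t = \left(v_{i,1}^t, v_{i,2}^t, \ldots, v_{i,n}^t\right),$$

where $S \subset \mathbb{R}^n$ denotes an $n$-dimensional search space. The population is initialized within the search space, and the personal-best of the $i$th particle is set as its own position, $pbest_i = x_i^0$. The velocity of the population is initialized as 0 as follows:

$$v_i^0 = 0, \quad i = 1, \ldots, N.$$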
To determine Pareto dominance, the $M$-objective function is defined as follows:

$$F(x) = \left(f_1(x), f_2(x), \ldots, f_M(x)\right).$$
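The external archive $A$ is a set of nondominated solutions. A temporary archive $\tilde{A}$ is initialized by the initial population, and dominated solutions are removed from it by the dominance test [26]. Decision vector $x_1$ is said to dominate decision vector $x_2$ (also written as $x_1 \prec x_2$) if and only if

$$f_k(x_1) \le f_k(x_2) \ \text{ for all } k \in \{1, \ldots, M\} \quad \text{and} \quad f_k(x_1) < f_k(x_2) \ \text{ for at least one } k,$$

where $f_k$ is the $k$th of the $M$ objective functions, each written in minimization form.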
Subsequently, the nondominated solutions are sorted by crowding distance, and the top-ranked solutions are stored in the external archive $A$, up to a predefined maximal size $N_A$.
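The dominance test used to build the archive can be illustrated with a minimal Python sketch. It assumes every objective is written in minimization form; the function names `dominates` and `nondominated` are hypothetical and are not taken from the original implementation.

```python
from typing import List, Sequence


def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """Return True if objective vector f_a Pareto-dominates f_b (minimization assumed)."""
    no_worse = all(a <= b for a, b in zip(f_a, f_b))
    strictly_better = any(a < b for a, b in zip(f_a, f_b))
    return no_worse and strictly_better


def nondominated(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only the points that are not dominated by any other point."""
    return [
        p for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# (3.0, 3.0) is dominated by (2.0, 2.0) and is therefore removed.
front = nondominated([(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)])
```

This pairwise filter costs O(N^2) comparisons, which is acceptable for a sketch but may need a faster sorting scheme for the large populations used later in the paper.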
2.1.2. Step 2: Evaluate Particles
Subsequently, the particles are evaluated using the $M$-objective function. In a chemical plant, the objective function values are calculated using numerical or surrogate models. The surrogate model is described in Section 3.1. The personal-best of each particle is updated by comparing its new position with the current personal-best. If the new position $x_i^t$ dominates the current personal-best $pbest_i$ or they are mutually nondominated, then we update $pbest_i \leftarrow x_i^t$. Otherwise, we do not update $pbest_i$.
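The personal-best rule in this step can be sketched as follows. This is an illustrative reading of the rule, assuming minimization of all objectives; `update_personal_best` is a hypothetical helper name.

```python
from typing import Sequence, Tuple


def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(f_a, f_b)) and any(
        a < b for a, b in zip(f_a, f_b)
    )


def update_personal_best(
    x_new: Sequence[float],
    f_new: Sequence[float],
    pbest: Sequence[float],
    f_pbest: Sequence[float],
) -> Tuple[Sequence[float], Sequence[float]]:
    """Keep the old personal-best only when it dominates the new position.

    If the new position dominates, or the two are mutually nondominated,
    the new position replaces the personal-best.
    """
    if dominates(f_pbest, f_new):
        return pbest, f_pbest
    return x_new, f_new


# Mutually nondominated: the new position is accepted.
kept_a = update_personal_best([1.0], (1.0, 3.0), [0.0], (2.0, 2.0))
# Old personal-best dominates: nothing changes.
kept_b = update_personal_best([1.0], (3.0, 3.0), [0.0], (2.0, 2.0))
```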
2.1.3. Step 3: Update Particles
Replace $t$ with $t + 1$. For each particle in the population, the velocity is updated first, and then the position is updated using the new velocity. The velocity and position of the $i$th particle at the $t$th iteration are updated as follows:

$$v_i^t = w\, v_i^{t-1} + c_1 r_1 \left(pbest_i - x_i^{t-1}\right) + c_2 r_2 \left(gbest_i - x_i^{t-1}\right),$$

$$x_i^t = x_i^{t-1} + v_i^t,$$

where $w$, $c_1$, and $c_2$ are constants; $r_1$ and $r_2$ are random values of the uniform distribution in [0, 1], and different values are generated for the $i$th particle and at every iteration; and $gbest_i$ is the global-best for the $i$th particle, and this is selected from the external archive by following the cluster-based selection described in Section 2.2.
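A minimal sketch of the standard PSO velocity and position update described in this step is given below. The default coefficient values (`w=0.7`, `c1=1.5`, `c2=1.5`) are placeholders chosen for illustration, not the values used in this study, and `update_particle` is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def update_particle(
    x: Sequence[float],
    v: Sequence[float],
    pbest: Sequence[float],
    gbest: Sequence[float],
    w: float = 0.7,    # inertia weight (placeholder value)
    c1: float = 1.5,   # cognitive coefficient (placeholder value)
    c2: float = 1.5,   # social coefficient (placeholder value)
    rng: random.Random = random.Random(),
) -> Tuple[List[float], List[float]]:
    """Update velocity first, then position, for a single particle."""
    # One pair of uniform random numbers per particle and per call.
    r1, r2 = rng.random(), rng.random()
    v_new = [
        w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, pbest, gbest)
    ]
    x_new = [xi + vn for xi, vn in zip(x, v_new)]
    return x_new, v_new


# A particle already sitting on both of its guides, with zero velocity, stays put.
x_same, v_same = update_particle([1.0, 2.0], [0.0, 0.0], [1.0, 2.0], [1.0, 2.0])

# A seeded generator makes a run reproducible.
x_moved, v_moved = update_particle(
    [0.0], [1.0], [1.0], [2.0], rng=random.Random(0)
)
```

Clipping positions to the search-space bounds is omitted here; a practical implementation would typically add it after the position update.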
2.1.4. Step 4: Update External Archive
The temporary archive $\tilde{A}$ is formed by merging the previous archive $A$ and the personal-best set $\{pbest_i\}$. Using a dominance test, the dominated particles are removed from $\tilde{A}$. Then, the crowding distance for each particle in the archive is calculated, the solutions are sorted by their crowding-distance values, and the top-ranked particles (at most $N_A$, the predefined maximal archive size) are stored in $A$ [27]. The pseudocode for updating the external archive is presented in Algorithm 2.
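The archive update can be sketched in Python as below. The sketch works on objective vectors only, assumes minimization, and uses the NSGA-II style crowding distance (boundary points receive infinite distance); the function names are hypothetical and the details may differ from the paper's Algorithm 2.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(f_a: Sequence[float], f_b: Sequence[float]) -> bool:
    """Pareto dominance for minimization."""
    return all(a <= b for a, b in zip(f_a, f_b)) and any(
        a < b for a, b in zip(f_a, f_b)
    )


def crowding_distance(objs: List[Point]) -> List[float]:
    """NSGA-II style crowding distance for each objective vector."""
    n = len(objs)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(objs[0])):
        order = sorted(range(n), key=lambda i: objs[i][m])
        lo, hi = objs[order[0]][m], objs[order[-1]][m]
        # Boundary solutions are always preferred.
        dist[order[0]] = dist[order[-1]] = math.inf
        if hi == lo:
            continue
        for k in range(1, n - 1):
            gap = objs[order[k + 1]][m] - objs[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist


def update_archive(archive: List[Point], pbests: List[Point], max_size: int) -> List[Point]:
    """Merge, drop dominated points, and keep the least crowded ones."""
    merged = archive + pbests
    front = [
        p for i, p in enumerate(merged)
        if not any(dominates(q, p) for j, q in enumerate(merged) if j != i)
    ]
    dist = crowding_distance(front)
    ranked = sorted(range(len(front)), key=lambda i: dist[i], reverse=True)
    return [front[i] for i in ranked[:max_size]]


new_archive = update_archive(
    [(1.0, 4.0), (4.0, 1.0)],
    [(2.0, 2.0), (3.0, 3.0), (2.5, 1.9)],
    max_size=3,
)
```

In this example `(3.0, 3.0)` is dominated and removed, and `(2.5, 1.9)` is truncated because it lies in the most crowded region of the remaining front.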
2.1.5. Step 5: Determine Termination
If the termination condition is satisfied, then the iteration ends. Otherwise, we go back to Step 2 and repeat the same procedure until the end of the iteration.
2.2. Cluster-Based Global-Best Selection
The number of clusters is the same as the number of nondominated solutions in the external archive $A$; thus, the maximal number of clusters is the maximal archive size $N_A$. This implies that each nondominated solution is the leader of a cluster. Each particle joins the cluster of the leader that is closest to it among all leaders. If the distance is calculated in the decision space, then the index of the cluster is determined as in equation (9), and the algorithm is of the CMOPSO-X type. If the distance is calculated in the objective space, then the index of the cluster is determined as in equation (10), and the algorithm is of the CMOPSO-OBJ type. For the $i$th particle in the population, the distance to every nondominated solution $a_j$ in $A$ is calculated on a normalized scale. The index with the shortest distance is determined as the cluster number $c_i$. Subsequently, the nondominated solution with the corresponding index is selected as the global-best particle for the $i$th particle, $gbest_i = a_{c_i}$. Particles in the same cluster are similar to their leader. They share information within the cluster when updating the particles, which enhances the search along the front near the leader.
In CMOPSO-X,

$$c_i = \arg\min_{j} \left\lVert \hat{x}_i - \hat{a}_j \right\rVert_2. \tag{9}$$

In CMOPSO-OBJ,

$$c_i = \arg\min_{j} \left\lVert \hat{F}(x_i) - \hat{F}(a_j) \right\rVert_2. \tag{10}$$

Here, the hat denotes values normalized to a common scale, $x_i$ is the position of the $i$th particle, and $F$ is the vector of objective functions.
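The nearest-leader assignment can be sketched as follows. The sketch assumes min-max normalization with known lower and upper bounds per coordinate, which is one plausible reading of "normalized scale"; `normalize` and `select_cluster` are hypothetical names. Passing decision vectors gives the CMOPSO-X variant, and passing objective vectors gives the CMOPSO-OBJ variant.

```python
import math
from typing import List, Sequence


def normalize(
    vectors: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[List[float]]:
    """Min-max scale every coordinate to [0, 1] (assumed normalization)."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(vec, lower, upper)]
        for vec in vectors
    ]


def select_cluster(
    particles: Sequence[Sequence[float]],
    leaders: Sequence[Sequence[float]],
    lower: Sequence[float],
    upper: Sequence[float],
) -> List[int]:
    """For each particle, return the index of the nearest archive leader."""
    p_norm = normalize(particles, lower, upper)
    l_norm = normalize(leaders, lower, upper)
    clusters = []
    for p in p_norm:
        distances = [math.dist(p, leader) for leader in l_norm]
        clusters.append(distances.index(min(distances)))
    return clusters


leaders = [[0.0, 0.0], [10.0, 100.0]]
particles = [[1.0, 5.0], [9.0, 80.0], [4.0, 70.0]]
cluster_ids = select_cluster(particles, leaders, lower=[0.0, 0.0], upper=[10.0, 100.0])
gbest = [leaders[c] for c in cluster_ids]
```

Normalizing before measuring distance prevents a variable with a large numeric range (here the second coordinate) from dominating the Euclidean distance.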
2.3. Determining the Feasibility of Solution
The feasibility of a solution should be investigated when updating the personal-best and external archives. For a solution vector at the $t$th iteration, $x_i^t$, if it satisfies both equality and inequality constraints, the vector is defined as a feasible solution ($x_i^t \in \Omega$); else it is an infeasible solution ($x_i^t \notin \Omega$). The indicator $\phi$ represents the feasibility of each particle in the population, and it has a binary value as follows:

$$\phi_i^t = \begin{cases} 1, & x_i^t \in \Omega, \\ 0, & \text{otherwise}, \end{cases} \tag{11}$$

where $\phi_i^t$ denotes the feasibility of the $i$th particle at the $t$th iteration and $\Omega$ is the feasible set defined above.
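A binary feasibility indicator of this kind can be sketched as below. The conventions are assumptions made for illustration: inequality constraints are written as g(x) <= 0, equality constraints as h(x) = 0 within a numerical tolerance, and `feasibility` is a hypothetical name.

```python
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float]], float]


def feasibility(
    x: Sequence[float],
    inequalities: Sequence[Constraint] = (),
    equalities: Sequence[Constraint] = (),
    tol: float = 1e-6,
) -> int:
    """Return 1 if x satisfies all constraints, otherwise 0."""
    ok_ineq = all(g(x) <= 0.0 for g in inequalities)   # g(x) <= 0
    ok_eq = all(abs(h(x)) <= tol for h in equalities)  # |h(x)| <= tol
    return 1 if (ok_ineq and ok_eq) else 0


# Example constraint: x0 + x1 must not exceed 1.
budget = lambda x: x[0] + x[1] - 1.0

phi_inside = feasibility([0.2, 0.3], inequalities=[budget])
phi_outside = feasibility([0.8, 0.9], inequalities=[budget])
```

Only particles with an indicator of 1 would then be allowed to replace a personal-best or enter the external archive.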
2.4. Metrics of Performance
To compare the different optimization algorithms, the performance of each algorithm was calculated quantitatively [28, 29]. In the MOOP, convergence and diversity determine the quality of the searched solutions.
Convergence metrics measure the convergence of the nondominated solution set found by the optimization algorithm with respect to the ideal Pareto optimal solution set. The distance between the nondominated and optimal solutions should be minimized. The degree of convergence $C$ provides the average distance to the Pareto optimal solutions as follows [30]:

$$C(A) = \frac{1}{|A|} \sum_{a \in A} \min_{p \in P^*} \sqrt{\sum_{m=1}^{M} \left( \frac{f_m(a) - f_m(p)}{f_m^{\max} - f_m^{\min}} \right)^2}, \tag{12}$$

where $A$ denotes the searched nondominated solution set, $P^*$ is the Pareto optimal solution set, $M$ is the number of objective functions, and $f_m^{\max}$ and $f_m^{\min}$ are the maximal and minimal values in the set $P^*$ of the $m$th objective, respectively.
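The convergence metric can be computed with a short Python sketch. It follows the common definition of the average normalized Euclidean distance from each found solution to its nearest reference solution, with normalization by the range of the reference set; this is an assumed reading of the metric in [30], and `convergence_metric` is a hypothetical name.

```python
import math
from typing import Sequence


def convergence_metric(
    found: Sequence[Sequence[float]],
    reference: Sequence[Sequence[float]],
) -> float:
    """Average normalized distance from each found point to the nearest reference point."""
    n_obj = len(reference[0])
    f_max = [max(p[m] for p in reference) for m in range(n_obj)]
    f_min = [min(p[m] for p in reference) for m in range(n_obj)]
    # Guard against a zero range in any objective.
    span = [hi - lo if hi > lo else 1.0 for lo, hi in zip(f_min, f_max)]
    total = 0.0
    for a in found:
        total += min(
            math.sqrt(sum(((a[m] - p[m]) / span[m]) ** 2 for m in range(n_obj)))
            for p in reference
        )
    return total / len(found)


reference_front = [(0.0, 0.0), (1.0, 1.0)]
c_exact = convergence_metric(reference_front, reference_front)
c_off = convergence_metric([(0.0, 1.0)], reference_front)
```

A value of zero means every found solution coincides with a reference solution; larger values indicate a set that is farther from the reference front.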
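Diversity evaluates the distribution and spread of nondominated solutions [31]. The spacing metric $\Delta$ measures the distribution of the solutions as follows:

$$\Delta = \sqrt{\frac{1}{|A| - 1} \sum_{i=1}^{|A|} \left(\bar{d} - d_i\right)^2}, \qquad d_i = \min_{j \ne i} \sum_{m=1}^{M} \frac{\left| f_m(a_i) - f_m(a_j) \right|}{f_m^{\max} - f_m^{\min}}, \tag{13}$$

where $A$ denotes the searched nondominated solution set with members $a_i$, $\bar{d}$ denotes the average of $d_i$, and $f_m^{\max}$ and $f_m^{\min}$ are the maximal and minimal values in set $P^*$ (the Pareto optimal solution set) of the $m$th objective, respectively. If $\Delta = 0$, the solutions of set $A$ are evenly distributed in the objective space.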
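A spacing-type diversity measure can be sketched as below. The sketch assumes a Schott-style spacing metric (standard deviation of nearest-neighbor distances, with normalized Manhattan distance between objective vectors); the exact form used in the study may differ, and `spacing` and its arguments are hypothetical names.

```python
import math
from typing import Sequence


def spacing(
    front: Sequence[Sequence[float]],
    f_min: Sequence[float],
    f_max: Sequence[float],
) -> float:
    """Spread of nearest-neighbor distances; 0 means evenly spaced solutions."""
    n = len(front)
    if n < 2:
        return 0.0
    span = [hi - lo if hi > lo else 1.0 for lo, hi in zip(f_min, f_max)]
    nearest = []
    for i, a in enumerate(front):
        nearest.append(
            min(
                sum(abs(am - bm) / s for am, bm, s in zip(a, b, span))
                for j, b in enumerate(front)
                if j != i
            )
        )
    mean = sum(nearest) / n
    return math.sqrt(sum((mean - d) ** 2 for d in nearest) / (n - 1))


even = [(0.0, 3.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]
uneven = [(0.0, 3.0), (0.1, 2.9), (3.0, 0.0)]
s_even = spacing(even, f_min=[0.0, 0.0], f_max=[3.0, 3.0])
s_uneven = spacing(uneven, f_min=[0.0, 0.0], f_max=[3.0, 3.0])
```

The evenly spaced front yields a value of essentially zero, whereas the front with two nearly coincident points yields a clearly larger value.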
3. Application to Chemical Plants
3.1. Configuration for the Experiments
The main goal of this study is to apply CMOPSO to a chemical plant, namely, the hydrogen production process. As described above, it is difficult to determine the priority of the two objective functions of process efficiency and CO2 emissions in the hydrogen production process. In addition, because of the trade-off between the two objective functions, it is necessary to find optimal solutions that balance both objectives simultaneously [32]. CMOPSO is a more suitable algorithm than conventional EAs for this problem because of the large number of decision variables, large search space, and strong nonlinearity of the target problem.
Hydrogen is produced from the natural gas feed by steam methane reforming (SMR) reactions (equations (14)–(16)). The process consists of the following components: (1) an SMR reactor integrated with a burner, where the main SMR reactions occur [33]; (2) hydrodesulfurization (HDS) equipment to remove sulfur components; (3) a low-temperature shift (LTS) reactor to convert CO to CO2 and H2; (4) pressure swing adsorption (PSA) equipment to separate the pure H2 > 99.999% from the syngas; and (5) heat exchangers for heat recovery [34]. The process flow diagram and the plant are shown in Figures 2 and 3, respectively.


In the hydrogen production process, there are 66 data collection points within the process (Figure 3): eight flow meters, 48 thermometers, four pressure gauges, five analyzers, and a level gauge. Among the 66 indicators, 30 were used for data generation in this study (15 input and 15 output variables) [35]. Among the analyzer variables, the syngas composition, including CO, CO2, CH4, and H2, was selected. The flowmeter data included process input variables, such as the natural gas feed, hydrogen product, natural gas fuel, and air flow rates. Moreover, monitoring and performance variables were considered, including syngas, hydrogen product, and off-gas flow rates. For the temperature data, the temperature data of the main process streams and facilities were collected: SMR reactor inlet temperature, LTS reactor inlet temperature, and steam temperature. All the variables are listed in Table 1.
A deep neural network (DNN) model with a multilayer perceptron was constructed for the data-driven model of the hydrogen production process based on the collected process data. Among the collected data, 70% were used for model training, and the remaining 30% were used to evaluate the trained model. The DNN surrogate model comprised an input layer, hidden layers, and output layer (Figure 4). The 15 input variables and 15 output variables were determined by the domain knowledge for the hydrogen production process, and hyperparameters for the DNN surrogate model were determined (Table 2).

To solve the MOOP, the proposed CMOPSO was used, and the following two objective functions were considered: maximizing process efficiency and minimizing CO2 emissions. Objective function 1 is defined as the ratio of product energy to energy input to the process. Assuming a constant feed input, the fuel flow rate should be reduced or hydrogen production should be increased to increase the process efficiency. Furthermore, objective function 2 is defined as the total CO2 emissions from the flue gas during the hydrogen production process:

$$\max f_1 = \eta = \frac{F_{\mathrm{H_2}} \cdot LHV_{\mathrm{H_2}}}{\left(F_{\mathrm{NG,feed}} + F_{\mathrm{NG,fuel}}\right) \cdot LHV_{\mathrm{NG}}}, \qquad \min f_2 = F_{\mathrm{flue}} \cdot y_{\mathrm{CO_2}},$$

where $\eta$ denotes the efficiency of the process; $F_{\mathrm{H_2}}$, $F_{\mathrm{NG,feed}}$, and $F_{\mathrm{NG,fuel}}$ are the flow rates of the hydrogen product, natural gas feed, and natural gas fuel, respectively; $LHV_{\mathrm{H_2}}$ and $LHV_{\mathrm{NG}}$ are the lower heating values of the hydrogen product and natural gas, respectively; $F_{\mathrm{flue}}$ is the flow rate of flue gas from the burner; and $y_{\mathrm{CO_2}}$ is the molar fraction of CO2.
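The two objectives can be written as simple functions, as in the sketch below. It assumes that efficiency is the product energy divided by the energy of all natural gas entering the process (feed plus fuel) and that flow rates and heating values are given in mutually consistent units; the function and argument names are hypothetical. In the study, the flow rates and compositions would come from the surrogate model rather than from fixed numbers.

```python
def process_efficiency(
    f_h2: float,      # hydrogen product flow rate
    f_feed: float,    # natural gas feed flow rate
    f_fuel: float,    # natural gas fuel flow rate
    lhv_h2: float,    # lower heating value of hydrogen
    lhv_ng: float,    # lower heating value of natural gas
) -> float:
    """Objective 1 (to be maximized): product energy over input energy."""
    return (f_h2 * lhv_h2) / ((f_feed + f_fuel) * lhv_ng)


def co2_emission(f_flue: float, y_co2: float) -> float:
    """Objective 2 (to be minimized): CO2 flow in the flue gas."""
    return f_flue * y_co2


# Illustrative numbers only; they are not plant data.
eta = process_efficiency(f_h2=3.0, f_feed=1.0, f_fuel=0.5, lhv_h2=10.0, lhv_ng=40.0)
co2 = co2_emission(f_flue=10.0, y_co2=0.1)

# A minimizing optimizer can handle the maximized objective by negating it.
objective_vector = (-eta, co2)
```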
To solve the MOOP for the hydrogen production process, four constraints are considered that represent the process feasibility (equations (18)–(21)). Constraint 1 represents the maximum conversion in the SMR reactor. Constraint 2 is an important key performance indicator (KPI), that is, the steam-to-carbon ratio (SCR), which indicates the ratio of the steam flow rate to the natural gas feed flow rate. In general, SCR is maintained between 3 and 4. Constraint 3 shows another KPI for the process operation: the air-to-fuel ratio (AFR). AFR is the ratio of the input air-flow rate to the fuel flow rate. AFR is typically between 1.3 and 1.7 for stable burner operation.
Constraint 1. Maximum conversion:
Constraint 2. Steam-to-carbon ratio (SCR):
Constraint 3. Air-to-fuel ratio (AFR):
Constraint 4. Boundary conditions:
The parameter settings for the models are as follows. The swarm size is set to 1,000, 10,000, or 50,000 particles. A swarm size of around 100 to 200 particles is usually recommended for most problems, but a larger number might be required for high-dimensional systems [36]. The inertia weight ($w$), the cognitive acceleration coefficient ($c_1$), and the social acceleration coefficient ($c_2$) are set to the general values suggested in previous studies [37, 38]. The maximum archive size is varied from 10 to 100. The optimal value for the archive size depends on the number of nondominated solutions that need to be stored and the available memory.
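The ratio and boundary constraints described above can be checked with a sketch like the following. It assumes that the typical ranges quoted in the text (SCR between 3 and 4, AFR between 1.3 and 1.7) are used directly as constraint bounds, which the text does not state explicitly. The maximum-conversion constraint is omitted because it requires the reactor model. All names are hypothetical.

```python
from typing import Sequence, Tuple


def plant_feasible(
    steam_flow: float,
    feed_flow: float,
    air_flow: float,
    fuel_flow: float,
    x: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
    scr_range: Tuple[float, float] = (3.0, 4.0),   # assumed bounds from the typical SCR range
    afr_range: Tuple[float, float] = (1.3, 1.7),   # assumed bounds from the typical AFR range
) -> bool:
    """Check SCR, AFR, and box constraints for one candidate operating point."""
    scr = steam_flow / feed_flow   # steam-to-carbon ratio
    afr = air_flow / fuel_flow     # air-to-fuel ratio
    in_bounds = all(lo <= xi <= hi for xi, lo, hi in zip(x, lower, upper))
    return (
        scr_range[0] <= scr <= scr_range[1]
        and afr_range[0] <= afr <= afr_range[1]
        and in_bounds
    )


ok = plant_feasible(3.5, 1.0, 1.5, 1.0, x=[0.5], lower=[0.0], upper=[1.0])
too_much_steam = plant_feasible(5.0, 1.0, 1.5, 1.0, x=[0.5], lower=[0.0], upper=[1.0])
out_of_bounds = plant_feasible(3.5, 1.0, 1.5, 1.0, x=[1.5], lower=[0.0], upper=[1.0])
```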
3.2. Experimental Results
To evaluate the convergence of obtained nondominated solutions, Pareto-optimal solution set is required as a criterion. However, optimal solutions for chemical plants are unknown because of their complexity. Therefore, in this study, solutions were obtained from experiments combining the population size (1,000, 10,000, and 50,000) and the maximal size of the external archive (10, 50, and 100). Thereafter, the nondominated solutions of each case were collected, and the final solutions were determined by the dominance test. These final solutions were assumed to be the Pareto-optimal solution set for the SMR process. Figure 5 shows the Pareto front obtained. The trade-off relationship between thermal efficiency and CO2 emissions is shown in the Pareto-front solutions. For example, when process thermal efficiency increases (positive effect), CO2 emissions also increase (negative effect). The reason is that the temperature of the SMR reactor was increased by increasing the input flow rate of NG fuel. As a result, the conversion of SMR reactions (equations (14)–(16)) also increased, leading to higher hydrogen production and thermal efficiency. However, many combustion reactions occur due to the increase in NG fuel flow rate. So, the amount of CO2 in the combustion gas is increased, which has a negative impact on the environment. Conversely, if the SMR reactor’s temperature decreases and the SMR reaction’s conversion also decreases by reducing the input amount of NG fuel, the amount of hydrogen produced decreases, and the thermal efficiency of the process decreases. However, CO2 emissions are reduced, which is an advantage from an environmental point of view.

To compare the performance of different algorithms, the convergence of the searched nondominated solutions was tested according to iteration. In this experiment, the population size and the maximal size of the external archive were set to 50,000 and 100, respectively. Figure 6 shows the particles in the objective space that satisfied the constraint conditions. In the first iteration, most particles do not satisfy the constraints; thus, only a few points are plotted in the objective space. As the iteration proceeds, the particles pursue their global-best and search within the constrained region. In MOPSO, the swarm of particles is widely distributed because their global-best is randomly selected, and this randomness reduces the search efficiency. In contrast, in CMOPSO, the global-best for each particle is selected based on its cluster, which acts as a subswarm in which the particles have similar characteristics. This enables an efficient and intensive investigation of the front, which is reflected in a narrow swarm distribution.

Figure 7 and Table 3 show the average convergence values, where each point is the average over 10 runs. It was confirmed that the particles of CMOPSO approached the Pareto front faster than those of MOPSO. In particular, CMOPSO-X exhibited the fastest convergence rate. Furthermore, the convergence of CMOPSO-X at the 10th iteration (0.09651) was better than that of MOPSO at the 20th iteration (0.10140). In the CMOPSO algorithm, each particle selects the cluster whose leader has the shortest Euclidean distance to the particle. Then, when the particles are updated, information is shared within the cluster to improve the search along the front near the leader. This avoids the problem of randomly selecting the global-best particles. This result indicates that the CMOPSO algorithm can find a better solution set with fewer iterations than MOPSO.

The performance of CMOPSO-X, based on the decision space, and CMOPSO-OBJ, based on the objective space, is evaluated using the convergence and spacing metrics. The two performance metrics were calculated according to equations (12) and (13), and the averages of the convergence and diversity (Δ) values over 10 runs in each setting are listed.
As summarized in Table 4, CMOPSO-X exhibits the best performance according to the convergence metric for all results. In particular, CMOPSO-X exhibits the lowest convergence value (the best performance) with a population of 50,000 and gbest of 50 (0.05064). Although the gbest setting was different, MOPSO and CMOPSO-OBJ also exhibited their best performance at a population of 50,000, similar to CMOPSO-X. In the case of a lower population (1,000), the convergence value of CMOPSO-OBJ is lower than that of MOPSO, indicating better performance. In contrast, when the population was relatively large (50,000), the nondominated solutions converged more closely to the Pareto optimal set (lower value) with MOPSO than with CMOPSO-OBJ.
Similar to the convergence performance, the diversity value of CMOPSO-X is superior to that of the other two algorithms. The best average diversity value of CMOPSO-X was 0.00023, obtained with a population of 50,000 and gbest of 100. MOPSO also exhibited its best diversity with the same population and gbest settings. In contrast, the CMOPSO-OBJ algorithm exhibited its best diversity performance of 0.00173 with a population of 50,000 and gbest of 50. In all three algorithms, it is observed that as gbest increases from 10 to 50 and 100, the diversity value drastically decreases, and the diversity performance improves. For instance, in the case of CMOPSO-X with a population of 50,000, the diversity value decreased from 0.00563 to 0.00067 and 0.00023, respectively. This suggests that as gbest increases, there are more leaders from which particles can select, which can increase the diversity of optimal solutions.
Despite the superior performance of CMOPSO-X in terms of convergence and diversity, there are several limitations to the algorithm that should be taken into account. One limitation is that the results of the algorithm can be sensitive to the initial clustering of particles, which may not accurately represent the true objectives of the optimization problem. As a result, CMOPSO may not be able to find good solutions for all problems. Another limitation is that the algorithm can be computationally expensive due to the need to maintain multiple clusters and calculate multiple objective functions. Despite these limitations, CMOPSO has been shown to be a valuable technique for solving multiobjective optimization problems and should be carefully considered for use in specific applications.
4. Conclusions
In this study, cluster-based multiobjective particle swarm optimization (CMOPSO) was proposed. The main idea of CMOPSO, compared to MOPSO, was to consider the similarity between particles when selecting the global-best. Thus, Euclidean distance was used as a criterion for determining the global-best cluster of each particle. Two types of CMOPSO were proposed based on the Euclidean distance in the decision space (CMOPSO-X) or objective space (CMOPSO-OBJ). These methods resolved the leader selection problem of MOPSO and enabled the particles to approach the Pareto optimal front quickly. The effectiveness of the proposed algorithms was experimentally verified by applying them to an operating-condition optimization problem for the hydrogen production process. CMOPSO-X exhibited superior performance with respect to convergence and diversity. Based on these results, CMOPSO-X can be applied to various MOOPs with numerous decision variables and a large number of global-best candidates, such as those arising in chemical plants.
Data Availability
The datasets used to support the findings of this study were supplied by Jaewon Lee under license and so cannot be made freely available. Requests for access to these data should be made to Jaewon Lee, j.lee@kitech.re.kr.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Authors’ Contributions
Seokyoung Hong and Jaewon Lee contributed equally to this work.
Acknowledgments
This work was supported by the Korean Institute of Industrial Technology within the framework of the following projects: “Development of AI Platform for Continuous Manufacturing of Chemical Process (Grant no. JH-23-0002).”