Abstract
Evolutionary algorithms are an effective way to solve the process discovery problem, which aims to mine process models from event logs that are consistent with the real business processes. However, current evolutionary algorithms, such as GeneticMiner, ETM, and ProDiGen, converge slowly and with difficulty because all of them employ genetic crossover and mutation operators that are highly random. This paper proposes a hybrid evolutionary algorithm for automated process discovery, which consists of a set-based differential evolution algorithm and guided local exploration. This work makes three major contributions. First, a hybrid evolutionary strategy is proposed, in which a differential evolution algorithm first searches the solution space to rapidly approximate the optimal solution, and a dedicated local exploration method then joins in to help the algorithm escape from local optima. Second, two novel set-based differential evolution operators are proposed, which can efficiently perform differential mutation and crossover on the causal matrix. Third, a fine-grained evaluation technique is designed to assign a score to each node in a process model; these scores guide the local exploration and improve the efficiency of the algorithm. Experiments were performed on 68 different event logs, including 22 artificial event logs, 44 noisy event logs, and two real event logs, and the proposed algorithm was compared with three popular process discovery algorithms. Experimental results show that the proposed algorithm achieves good performance and converges quickly.
1. Introduction
Process-based information systems (PISs), including workflow management systems (WfMSs), customer relationship management (CRM), and enterprise resource planning (ERP), have become the fundamental infrastructure of modern enterprises. A PIS can greatly improve the operational efficiency of an enterprise. Besides that, it records information about business processes, such as activity names, timestamps, and activity life cycles, to form event logs. The XES standard, published by IEEE in 2016, provides a unified and extensible language to standardize the content and format of event logs. Process mining techniques can be used to discover a process model from an XES event log, and the mined process model is expected to be as consistent as possible with the real business process. The obtained process model can be used to improve business processes, increase production efficiency, and optimize products. For example, ASML employs process discovery to optimize the wafer scanner during the production of lithography machines. The ERP system of SAP uses process discovery to assist users in designing business processes, analyzing business bottlenecks, and planning resources. Philips collects event logs from its medical devices around the world to analyze customers' habits; in this way, it can optimize its medical products and shorten product development time.
Generally, there are three major tasks in process mining: process discovery, conformance checking, and process enhancement [1]. Process discovery aims to obtain a process model that is as consistent as possible with the real business process. Most studies focus on mining the binary relations between any two activities in an event log from the control-flow perspective. Beyond that, process discovery can also extract the knowledge contained in an event log from other perspectives, such as organization, time, and resource. Conformance checking measures the deviation of a mined process model from the real business process by replaying the event log on the mined model. This technique can be used to diagnose the process model as well as to analyze business bottlenecks. Process enhancement focuses on changing or extending a prior process model. For example, by using the timestamps in an event log, a model can be enhanced to analyze bottlenecks, estimate remaining time, or discover a hierarchical process model. In this paper, I focus only on process discovery from the control-flow perspective.
The ɑ-algorithm, proposed by van der Aalst et al., is usually regarded as a milestone in the field of process mining [2]. It models the workflow as a Petri net and can effectively find the causal, parallel, and choice relations between any two activities in the event log. After that, several variants of the ɑ-algorithm were proposed, such as the ɑ+ algorithm [3] and the ɑ++ algorithm [4]. However, the ɑ-series algorithms have shortcomings in noise resistance, alignment-based fitness, and precision. To address these problems, more efficient algorithms were proposed, such as ILP Miner [5] and the inductive miner [6, 7]. The former, proposed recently by van Zelst et al., is based on integer linear programming; the latter was proposed by Leemans et al. Both of them show good performance when dealing with small event logs.
Evolutionary algorithms are an effective way to solve the process mining problem. de Medeiros et al. [8] were the first to apply the genetic algorithm (GA) to process mining, in an approach named GeneticMiner. By defining a good fitness function as well as genetic operators (i.e., crossover and mutation), GeneticMiner can find a process model that is consistent with the real process. Cheng et al. [9, 10] observed that GeneticMiner cannot effectively discover parallel structures from event logs; therefore, they proposed a hybrid technique that integrates GeneticMiner, particle swarm optimization, and differential evolution to improve the mining results. Vázquez-Barreiros et al. [11] proposed another algorithm, named ProDiGen, which improves GeneticMiner by introducing a hierarchical fitness function to find complete, precise, and minimally structured process models. Buijs et al. [12–14] employed evolutionary trees to represent process models and proposed an alignment-based technique to guide the mutation in the GA. However, embedding alignment-based local search in the mutation operation is impractical because it is too time-consuming. In general, the advantages of genetic algorithms include good noise resistance and the ability to handle most of the key problems in process mining within a unified framework, such as invisible tasks, non-free-choice constructs, and tasks with duplicated names. However, the convergence speed of current GA-based algorithms is too slow because all of them adopt random search.
In this paper, a hybrid evolutionary algorithm for process mining is proposed, named DEMiner. The innovations of this work are threefold:
(1) A hybrid evolutionary strategy is proposed. DEMiner first approximates the optimal process model with a set-based DE algorithm; when prematurity is detected, a guided local exploration method joins the evolution process to help the algorithm escape from local optima.
(2) Two set-based DE operators, i.e., a set-based mutation operator and a set-based crossover operator, are designed for differential evolution on the causal matrix.
(3) A fine-grained evaluation method is proposed to guide the local exploration by assigning scores to all the nodes in the candidate process models. This method not only helps the population avoid prematurity but also improves the efficiency of DEMiner.
The rest of this paper is organized as follows. Section 2 introduces some basic knowledge of process mining, such as Petri net and causal matrix, as well as the DE algorithm. Section 3 explains the proposed algorithm in detail. The experiments as well as the analysis of experimental results are given in Section 4. Finally, Section 5 gives conclusions.
2. Preliminaries
2.1. Process Mining
In the problem of process mining, a process model is generally modeled as a place/transition net (abbreviated as P/T net), which is a variant of the classic Petri net. The definition of a P/T net is given below.
Definition 1. (P/T Net) [2]. A P/T net is a tuple N = (P, T, F), where P is a finite set of places, T is a finite set of transitions with P ∩ T = ∅, and F ⊆ (P × T) ∪ (T × P) is a finite set of directed arcs.
Let N = (P, T, F) be a P/T net. Elements of P ∪ T are called nodes. A node x is an input node of another node y if (x, y) ∈ F. Similarly, x is an output node of y if (y, x) ∈ F. Furthermore, the symbol •n denotes the set of all input nodes of a node n; that is, •n = {x | (x, n) ∈ F}. Similarly, n• = {x | (n, x) ∈ F} denotes all the output nodes of n. Based on the P/T net, a formal definition of the workflow net (abbreviated as Wf-Net) is given.
Definition 2. (Wf-Net) [2]. Let N = (P, T, F) be a P/T net, and let t̂ be a fresh identifier not in P ∪ T. N is a Wf-Net if
(1) P contains an input place i (the source) such that •i = ∅
(2) P contains an output place o (the sink) such that o• = ∅
(3) N̂ = (P, T ∪ {t̂}, F ∪ {(o, t̂), (t̂, i)}) is strongly connected

Figure 1 shows a process model represented by a Wf-Net. The circles denote places, and the squares denote transitions. The transitions in a Wf-Net represent the activities (also called tasks) of the real business process. The black dot in the initial place denotes a token. A transition is enabled to be fired if all of its input places contain tokens; the word "fire" means that an enabled activity is executed. When a transition fires, one token is removed from each of its input places and one token is put into each of its output places. For example, if the transition "A" fires, the token in the place "start" is removed and the places "P1" and "P2" each receive a token. After that, the three transitions "B," "C," and "D" are enabled. Note that "P1" holds just one token; in other words, although two transitions ("B" and "C") are enabled, only one of them can fire. Thus, the possible sequences between "A" and "E" include {B, D}, {C, D}, {D, B}, and {D, C}.
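The token-game semantics described above can be sketched in a few lines of code. This is an illustrative sketch, not from the paper; the dictionary-based net encoding and the function names are assumptions, and only a fragment of Figure 1 is modelled.

```python
# Illustrative sketch (not from the paper): token-game semantics of a P/T net.
# A marking maps each place to its token count; a transition is enabled when
# every one of its input places holds a token.

def enabled(marking, inputs, t):
    """A transition t is enabled iff all of its input places hold a token."""
    return all(marking.get(p, 0) > 0 for p in inputs[t])

def fire(marking, inputs, outputs, t):
    """Fire t: consume one token per input place, produce one per output place."""
    assert enabled(marking, inputs, t)
    m = dict(marking)
    for p in inputs[t]:
        m[p] -= 1
    for p in outputs[t]:
        m[p] = m.get(p, 0) + 1
    return m

# Fragment of Figure 1: firing "A" moves the token from "start" to P1 and P2,
# which enables "B", "C", and "D".
inputs = {"A": ["start"], "B": ["P1"], "C": ["P1"], "D": ["P2"]}
outputs = {"A": ["P1", "P2"], "B": ["P3"], "C": ["P3"], "D": ["P4"]}

m0 = {"start": 1}
m1 = fire(m0, inputs, outputs, "A")
print(sorted(t for t in ["B", "C", "D"] if enabled(m1, inputs, t)))  # ['B', 'C', 'D']
```

Firing "B" afterwards would consume the single token in "P1", so "C" can no longer fire, matching the behavior discussed in the text.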

Given a sound Wf-Net N, a finite sequence of transitions σ = t1t2…tn is called an event trace, and an event log L is a multiset of such traces. Take the above process model as an example: it can induce many event traces, such as ABDEG, ACDEH, ADBEFCDG, and so on.
The first problem to be solved in evolution-based process mining is the encoding of chromosomes. Unfortunately, it is hard to employ Petri nets directly for evolution. de Medeiros proposed the causal matrix to represent the process model, and it has been applied in many evolutionary process mining algorithms, such as ProDiGen and GeneticMiner. The definition of the causal matrix is given below.
Definition 3. (Causal Matrix) [8]. A causal matrix is a tuple CM = (A, C, I, O), where
(1) A is a finite set of activities
(2) C ⊆ A × A is the causality relation
(3) I: A → P(P(A)) is the input condition function
(4) O: A → P(P(A)) is the output condition function
where P(X) denotes the power set of X. Since we usually need to compare a process model represented by a causal matrix with other models represented by Petri nets, a method of mapping a P/T net to a causal matrix is required. Definition 4 shows such a method.
Definition 4. (Mapping of a P/T Net onto a Causal Matrix) [8]. Let N = (P, T, F) be a P/T net. The mapping of N is a tuple CM = (A, C, I, O), where
(1) A = T
(2) C = {(t1, t2) ∈ T × T | t1• ∩ •t2 ≠ ∅}
(3) I: T → P(P(T)) such that I(t) = {•p | p ∈ •t}
(4) O: T → P(P(T)) such that O(t) = {p• | p ∈ t•}

To explain the mapping process, we convert the process model in Figure 1 into a causal matrix, which is shown in Table 1. Take activity "E" as an example: it has two input places, "P3" and "P4." The input transitions of "P3" are "B" and "C," and the input transition of "P4" is "D"; thus, I(E) = {{B, C}, {D}}. It should be noted that activities in the same subset of I(a) have an OR-join relation, while the different subsets of I(a) have an AND-join relation. Analogously, activities in the same subset of O(a) have an OR-split relation, and the different subsets have an AND-split relation. Besides, I(a) = ∅ indicates that the input of activity a is empty, and O(a) = ∅ indicates that its output is empty.
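The nested-set structure of the causal matrix is easy to hold in code. The sketch below is an assumed data layout (lists of subsets per activity, as in Table 1's example for "E"); the helper name `and_join_width` is illustrative, and the output sets shown are placeholders rather than the full Table 1.

```python
# Sketch (the data layout is an assumption): a causal matrix stores, per
# activity, its input and output conditions as lists of subsets. Subsets are
# joined by AND; activities inside one subset are joined by OR (cf. Table 1).
I = {
    "A": [],                    # empty input: "A" is the start activity
    "E": [["B", "C"], ["D"]],   # AND-join of ("B" OR "C") with "D"
}
O = {
    "A": [["B", "C"], ["D"]],   # AND-split toward ("B" OR "C") and "D"
    "E": [["F", "G", "H"]],     # placeholder; Table 1 gives the real outputs
}

def and_join_width(I, a):
    """Number of AND-joined input conditions of activity a."""
    return len(I[a])

print(and_join_width(I, "E"))  # 2
```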
2.2. Differential Evolution Algorithm
The differential evolution (DE) algorithm, first proposed by Storn and Price in 1995, is a stochastic method that simulates biological evolution, in which the individuals best adapted to the environment are preserved through repeated iterations [15]. Compared with other evolutionary algorithms, DE has several advantages, such as better global search ability, fast convergence, and strong robustness.
The major steps of DE are mutation, crossover, evaluation, and selection, which are similar to those of the GA. DE starts with a population containing N randomly generated individuals (also known as chromosomes). An individual is represented by a vector X_{i,G} = (x_{1,i,G}, x_{2,i,G}, …, x_{D,i,G}), where i denotes the i-th individual, G denotes the current generation, and D denotes the dimension of the vector. In the mutation step, DE generates a donor vector V_{i,G} for the i-th individual (called the target vector). It first chooses three distinct vectors X_{r1,G}, X_{r2,G}, and X_{r3,G} from the population, whose indices r1, r2, and r3 are mutually exclusive integers randomly chosen from the range [1, N] and all different from i. Then, the difference of two of these vectors (i.e., X_{r2,G} and X_{r3,G}) is scaled by a scalar factor F, and the scaled difference is added to the third vector X_{r1,G}, yielding the donor vector. This process can be expressed by the following formula:

V_{i,G} = X_{r1,G} + F · (X_{r2,G} − X_{r3,G}). (1)
To enhance the potential diversity of the population, a crossover operation comes into play after the donor vector has been generated. Under this operation, the donor vector exchanges its components with the target vector to form the trial vector U_{i,G} = (u_{1,i,G}, u_{2,i,G}, …, u_{D,i,G}). There are two popular crossover schemes in DE: exponential (or two-point modulo) and binomial (or uniform). This paper only introduces the latter, which is given in formula (2), where Cr is the crossover rate and rand_{j,i} ∈ [0, 1] is a random number. The condition j = j_rand, with j_rand a randomly chosen index, guarantees that at least one element of the donor vector will be selected. The obtained trial vector is evaluated by a predefined fitness function. If the fitness of the trial vector is higher than that of the target vector, the DE algorithm replaces the target vector with the trial vector; otherwise, it keeps the target vector:

u_{j,i,G} = v_{j,i,G} if rand_{j,i} ≤ Cr or j = j_rand; otherwise, u_{j,i,G} = x_{j,i,G}. (2)
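The classic numeric DE step described by formulas (1) and (2) can be sketched as follows. This is a minimal generic implementation for real-valued vectors, not the paper's set-based variant; function names and parameter values are illustrative.

```python
import random

# Minimal sketch of formulas (1) and (2): DE/rand/1 mutation followed by
# binomial crossover. F is the scale factor, Cr the crossover rate.

def mutate(pop, i, F=0.5):
    """Donor vector V = X_r1 + F * (X_r2 - X_r3), r1, r2, r3 distinct from i."""
    r1, r2, r3 = random.sample([j for j in range(len(pop)) if j != i], 3)
    x1, x2, x3 = pop[r1], pop[r2], pop[r3]
    return [a + F * (b - c) for a, b, c in zip(x1, x2, x3)]

def binomial_crossover(target, donor, Cr=0.9):
    """Take each component from the donor with probability Cr; index j_rand always."""
    D = len(target)
    j_rand = random.randrange(D)  # guarantees at least one donor component
    return [donor[j] if (random.random() <= Cr or j == j_rand) else target[j]
            for j in range(D)]

pop = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
trial = binomial_crossover(pop[0], mutate(pop, 0))
print(len(trial))  # 4
```

The trial vector would then be evaluated and kept only if it beats the target, exactly as described above.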
3. DEMiner: A Hybrid Evolutionary Algorithm for Process Discovery
3.1. Framework of DEMiner
GA-based process mining algorithms, including GeneticMiner [8], ProDiGen [11], and ETM [12], suffer from the problem that all of them need hundreds or even thousands of generations to converge to a solution. The reason is that their genetic operators work in a completely random way, without exploiting the information in the log or the errors made by the mined model while parsing the traces. ProDiGen mitigates this problem in a simple way: it selects an incorrectly parsed activity as the crossover point. However, this cannot significantly improve the convergence speed because the crossover process itself remains random. ETM employs an alignment-based technique for local exploration to accelerate convergence, but the total running time is still unacceptable because the alignment algorithm is too time-consuming.
In this section, I introduce a hybrid evolutionary algorithm for automated process discovery, called DEMiner. The main steps of DEMiner are shown in Figure 2. As the figure shows, the major difference between DEMiner and a traditional evolutionary algorithm is that DEMiner selects a specific evolutionary strategy in each iteration of the loop. Step V is a set-based DE algorithm (abbreviated as DE), which is in charge of quickly approximating the optimal solution. However, the DE algorithm often falls into local optima. To overcome premature convergence, I employ Step VI, a guided local exploration algorithm, which exploits the error information gathered while parsing the log and helps DEMiner quickly escape from local optima.

The pseudocode of DEMiner is given in Algorithm 1. The algorithm leaves the loop when the number of generations exceeds a predefined threshold maxGenerations or when timesNotChange exceeds maxNotChange; the variable timesNotChange records for how many generations the population has not changed. Inside the loop, two statistics, meanFitness and devFitness, are used to detect whether the algorithm suffers premature convergence: the former is the mean fitness of the population, and the latter is the standard deviation of the fitness values. If meanFitness and devFitness simultaneously cross their predefined thresholds MF and DF, the algorithm is considered premature. Besides the two statistics, a random number rand is used in the condition. The rationale is that the global search ability of the local exploration algorithm is weaker than that of the DE algorithm; sometimes the local exploration moves an individual forward without causing significant changes in the two statistics. Therefore, the proposed algorithm randomly chooses a strategy once it falls into a local optimum. Next, I introduce these steps in detail.
|
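The strategy switch at the heart of Algorithm 1 can be sketched as below. This is an interpretation, not the paper's exact pseudocode: since MF = 0.7 is chosen precisely to keep local exploration out of the early search (Section 4.2), the sketch reads "premature" as a high mean fitness combined with a low spread; the threshold values and the 0.5 coin are assumptions.

```python
import random
import statistics

# Sketch of the strategy selection in Algorithm 1. The population is reduced
# here to a list of fitness values; MF and DF follow Table 3.

MF, DF = 0.7, 0.2

def choose_strategy(fitnesses):
    mean_fitness = statistics.mean(fitnesses)
    dev_fitness = statistics.pstdev(fitnesses)
    # premature: the population has converged (low spread) near a plateau;
    # this reading keeps local exploration out of the early, low-fitness phase
    premature = mean_fitness > MF and dev_fitness < DF
    # once premature, pick randomly between global DE search and local exploration
    if premature and random.random() < 0.5:
        return "guided_local_exploration"
    return "differential_evolution"
```

Early in a run, when mean fitness is still low, `choose_strategy` always returns the DE step; only near convergence does the local exploration get a chance to run.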
3.2. Population Initialization
The population initialization of DEMiner follows the heuristic method proposed in [8], which is based on the causal relations between activities. Beyond that, our method introduces two additions, called the gene bank and the taboo list, which improve the performance of DEMiner.
The gene bank is a set of chromosomes (i.e., individuals) comprising both the individuals in the current population and the individuals that have been eliminated during the evolution. To reduce memory cost, the individuals in the gene bank are serialized; in other words, they are converted into a simple textual format. For example, I(E) = {{B, C}, {D}} is converted into the string "I(E) = [[B, C],[D]]." If the algorithm generates an individual that is already in the gene bank, the individual is discarded without computing its fitness value.
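The deduplication idea behind the gene bank can be sketched as follows. The serialization function is a guess at a canonical form matching the paper's example string; the name `is_new` is illustrative.

```python
# Sketch of the gene-bank deduplication: individuals are serialized into a
# canonical string (as in the paper's example "I(E) = [[B, C],[D]]") and kept
# in a set, so already-seen individuals are discarded before evaluation.

def serialize(activity, subsets):
    """Canonical string for one activity's input condition (sorted for stability)."""
    inner = ",".join("[" + ", ".join(s) + "]" for s in sorted(map(sorted, subsets)))
    return f"I({activity}) = [{inner}]"

gene_bank = set()

def is_new(key):
    """True the first time a serialized individual is seen, False afterwards."""
    if key in gene_bank:
        return False
    gene_bank.add(key)
    return True

key = serialize("E", [{"B", "C"}, {"D"}])
print(key)           # I(E) = [[B, C],[D]]
print(is_new(key))   # True
print(is_new(key))   # False
```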
The taboo list keeps a set of historical local exploration operations. In DEMiner, an important step, called guided local exploration, searches around a specified node. The local exploration randomly selects one of three operations: adding an arc, deleting an arc, or redistributing a node. Operations that have already been performed, whether useful or not, are forbidden from being selected again. Every node has its own taboo list, and all lists are initialized to empty at the beginning.
3.3. Fitness Function
Generally, two metrics should be considered when evaluating a process model: completeness and precision [16]. Completeness quantifies the ability of a discovered process model to accurately parse the traces recorded in the event log. A natural way to define a completeness metric is the number of correctly parsed traces divided by the total number of traces. However, such a definition is too coarse because it cannot indicate how much of an individual is correct when the individual fails to parse a trace. Consider two process models, one totally incorrect and the other missing only a single arc; the above metric cannot distinguish them because neither can parse the log correctly. For this reason, I employ the partial completeness given in [8], which takes into account the correctly parsed activities as well as the number of tokens that are missing or left unconsumed during the parse. I use the symbol "Cf" to denote the completeness metric.
A discovered process model may not be appropriate even if it achieves full completeness. For example, a flower model can parse arbitrary event logs, but it is useless. Precision quantifies the fraction of the behavior allowed by the model that is not seen in the event log. However, it is hard to give a proper definition of precision because it would have to detect all the extra behavior, i.e., paths possible in the model but absent from the log. In [4], precision is defined as the number of activities enabled while a log is parsed by a model, divided by a function that returns the maximum number of enabled activities in the population; consequently, the precision of each individual depends on the rest of the population. In this work, I adopt instead the definition of precision proposed in [11] (see formula (3)). It is easy to see that the more activities a process model enables during parsing, the lower its precision:
Generally, one would assign weight coefficients to combine the two metrics in a weighted sum [17]. However, it is difficult to combine them appropriately here because the adopted precision is not normalized. Therefore, a hierarchical method is employed to define the fitness function in this work. Because completeness is more important than precision when evaluating a discovered process model, I first compare the completeness of two process models; only if their completeness values are equal do I compare their precision. In this way, once the completeness of all individuals equals 1, the individual with better precision wins. Note that the hierarchical fitness function can easily be extended with other metrics, such as structural complexity and generalization.
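The hierarchical comparison can be expressed very compactly as a lexicographic key. This is a sketch of the idea, not the paper's code; the precision surrogate (negated count of enabled activities, following the convention that fewer enabled activities means higher precision) and the example figures are assumptions.

```python
# Sketch of the hierarchical (lexicographic) fitness: completeness is compared
# first, and precision only breaks ties. Returning a tuple lets Python's
# built-in comparison implement the hierarchy directly.

def fitness_key(completeness, enabled_activities):
    # precision is the negated count, so "fewer enabled activities" sorts higher
    return (completeness, -enabled_activities)

candidates = [
    {"name": "flower", "Cf": 1.0, "enabled": 40},  # parses everything, imprecise
    {"name": "tight",  "Cf": 1.0, "enabled": 12},
    {"name": "broken", "Cf": 0.6, "enabled": 8},
]
best = max(candidates, key=lambda m: fitness_key(m["Cf"], m["enabled"]))
print(best["name"])  # tight
```

Extending the hierarchy with further metrics, as the text suggests, amounts to appending more components to the tuple.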
3.4. Differential Evolution Algorithm
The DE algorithm contains a loop that goes through all individuals in the population. For each individual (the target individual), it first generates a donor individual from three randomly selected individuals (mutation), and then combines the target individual with the donor individual to obtain a trial individual (crossover). It must be emphasized that, because the obtained donor and trial individuals may be inconsistent, both of them must be repaired before proceeding to the next step. The trial individual is then evaluated if it is not in the gene bank. If the fitness of the trial individual is higher than that of the target individual, the target individual is replaced by the trial individual, and the trial individual is added to the gene bank. Thus, there are three key steps in the DE algorithm: mutation, crossover, and repair. Next, the details of the three steps are explained.
3.4.1. Mutation
Current set-based evolutionary algorithms usually employ crisp sets to represent candidate solutions (called individuals or chromosomes in the GA). For example, Chen et al. [18] proposed a set-based particle swarm optimization algorithm in which candidate solutions are represented by sets of ordered pairs. However, the causal matrix is a much more complex kind of set: the elements of I(a) and O(a) are themselves crisp sets, such as I(E) = {{B, C}, {D}}. Therefore, traditional set-based mutation operators cannot be used directly in this work. In Ou-Yang's method [9, 10], the mutation operator randomly selects ingredients from three individuals and then uses them to update the target individual into a donor individual. The advantage of this method is that good ingredients (e.g., a parallel structure) can be transplanted directly into the target individual, which improves the search ability of GeneticMiner. However, Ou-Yang's method cannot be employed directly in this work because the proposed algorithm is entirely based on the DE algorithm; in other words, it needs more flexible mutation operators.
This section introduces two novel operators, which allow the proposed algorithm to perform differential mutation on the causal matrix. The definitions of the two operators are given below.
Definition 5. (Minus Operator between Two Sets). Given a causal matrix and two sets drawn from the input (or output) condition of the same activity, the minus operator is defined as the relative complement of the second set in the first, i.e., the collection of subsets that appear in the first set but not in the second.
Definition 6. (Plus Operator between Two Sets). Given a causal matrix and two sets, the plus operator is defined as a generalized union that adds the subsets of the second operand to the first while keeping the elements already present in the first; duplicated elements are removed before the union. It is easy to see that the plus operator preserves the elements of the first operand. The reason behind this design is that the second operand of the plus operator is, in fact, the result of the minus operator (i.e., the difference of two sets). In this way, the operator can substantially change the structure of the set and enhance the potential diversity of a trial individual.
Figure 3 gives an illustrative example. Consider three sets that represent three distinct input sets of activity D. According to the definition of mutation in DE (see formula (1)), we first compute the difference of two of them and then add that difference to the third. Based on Definitions 5 and 6, we obtain the donor input set of activity D shown in Figure 3. Note that the scale factor F is not used in this mutation.
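One plausible reading of the set-based mutation, treating an input set as a family of frozensets, can be sketched as follows. This is an assumption-laden sketch: the paper's operators are richer than plain set difference and union, and the concrete sets X1, X2, X3 below are illustrative, not those of Figure 3.

```python
# Hedged sketch of Definitions 5 and 6 on families of subsets, showing the
# differential-mutation shape of formula (1) without the scale factor:
# donor = X3 (+) (X1 (-) X2).

def minus(s1, s2):
    """Relative complement on families of subsets: keep subsets of s1 not in s2."""
    return {x for x in s1 if x not in s2}

def plus(s1, diff):
    """Add the differential subsets to s1, keeping s1's own subsets."""
    return s1 | diff

X1 = {frozenset({"A"}), frozenset({"B"})}
X2 = {frozenset({"A"}), frozenset({"C"})}
X3 = {frozenset({"C"})}

donor = plus(X3, minus(X1, X2))
print(sorted(sorted(s) for s in donor))  # [['B'], ['C']]
```

The subset {"A"}, common to X1 and X2, cancels in the difference, so only the genuinely differing structure is transplanted into X3.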
3.4.2. Crossover
The aim of crossover is to combine a donor individual and a target individual to generate a trial individual. The trial individual takes the place of the target individual if its fitness is higher. Of the two popular crossover methods, exponential and binomial, I employ the latter in this work. The pseudocode of the binomial operator is shown in Algorithm 2. "Cr" is the crossover rate: binomial crossover is performed on an activity node whenever a randomly generated number "rand" between 0 and 1 is less than or equal to "Cr." The index "r" is chosen at random and ensures that the trial individual receives at least one component from the donor individual.
|
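The per-node binomial crossover of Algorithm 2 can be sketched as below. The dictionary encoding of the causal matrix (activity → list of I/O subsets) and the function name are assumptions; the logic follows the description above.

```python
import random

# Sketch of Algorithm 2: binomial crossover applied per activity node of the
# causal matrix. Each node takes the donor's condition sets with probability
# Cr; the randomly chosen node r is always taken from the donor.

def binomial_crossover_cm(target, donor, Cr=0.8):
    activities = list(target)
    r = random.choice(activities)  # guarantees at least one donor component
    trial = {}
    for a in activities:
        take_donor = (random.random() <= Cr) or (a == r)
        trial[a] = donor[a] if take_donor else target[a]
    return trial

target = {"A": [["B"]], "B": [["C"]], "C": []}
donor  = {"A": [["C"]], "B": [["A"]], "C": [["B"]]}
trial = binomial_crossover_cm(target, donor)
assert any(trial[a] == donor[a] for a in trial)
```

The resulting trial individual may be inconsistent, which is exactly why the repair step of Section 3.4.3 follows.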
3.4.3. Repair
As is well known, individuals obtained during the iterations of an evolutionary algorithm are often inconsistent. For example, it is possible to obtain a trial individual in which activity "E" appears in the output of some activity whose own input does not contain "E." Besides that, the input of the "start" activity or the output of the "end" activity may be nonempty. Therefore, a repair operation must be performed on the donor individual as well as on the trial individual. In GA-based process mining algorithms such as GeneticMiner and ProDiGen, the repair operation is simple because crossover and mutation are performed at a designated point. The repair operation in this work is much more complex because mutation and crossover are performed on all nodes of a causal matrix; in other words, all nodes must be repaired. Before introducing the repair algorithm, I first define the consistency of a causal matrix.
Definition 7. (Consistency of a Causal Matrix). Let CM = (A, C, I, O) be a causal matrix. We say CM is consistent if
(1) I(start) = ∅, where "start" is the beginning activity
(2) O(end) = ∅, where "end" is the ending activity
(3) for every a ∈ A \ {start} and every b ∈ ⋃I(a), a ∈ ⋃O(b)
(4) for every a ∈ A \ {end} and every b ∈ ⋃O(a), a ∈ ⋃I(b)

The pseudocode of the repair is given in Algorithm 3. Steps 1–6 are in charge of repairing the "start" and "end" nodes: the algorithm first empties I(start) and O(end), and then generates a new output set for "start" (input set for "end") if it is empty. In Steps 7–19, the algorithm goes through all nodes in the causal matrix and repairs I(a) and O(a), respectively. There are two choices in the repair operation. Take the repair of the input of activity a as an example: if the output of some b ∈ ⋃I(a) does not contain a, the algorithm either randomly adds a to O(b) or removes b from I(a). In this way, a consistent individual is finally obtained.
|
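The input-side repair of Algorithm 3 can be sketched as below. This is a simplified interpretation, not the paper's code: I/O conditions are modelled as lists of mutable sets, the function name is illustrative, and only condition (3) of Definition 7 is handled.

```python
import random

# Sketch of the node repair (condition (3) of Definition 7): for every b that
# appears in I(a), activity a must appear in some output subset of b. If not,
# the repair randomly either adds the arc b -> a or drops b from I(a).

def repair_inputs(I, O, a):
    for subset in I[a]:
        for b in list(subset):
            if not any(a in out for out in O[b]):
                if random.random() < 0.5 and O[b]:
                    random.choice(O[b]).add(a)   # add the arc b -> a
                else:
                    subset.discard(b)            # or remove b from I(a)
    I[a] = [s for s in I[a] if s]                # drop emptied subsets

# "C" lists "E" in no output subset, so the pair (C, E) is inconsistent.
I = {"E": [{"B", "C"}, {"D"}]}
O = {"B": [{"E"}], "C": [set()], "D": [{"E"}]}
repair_inputs(I, O, "E")
# afterwards every b remaining in I(E) has E in some output subset of b
assert all(any("E" in out for out in O[b]) for s in I["E"] for b in s)
```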
3.5. Guided Local Exploration
Although evolutionary algorithms, including the GA and the DE algorithm, have a strong global search ability, all of them suffer from premature convergence. In [14], van Eck et al. proposed an alignment-based local exploration method, in which a search algorithm is employed to find the optimal alignment between a process model and an event trace.
In this way, the abnormal areas of a process model can be located. However, alignment-based local exploration has two drawbacks. First, it can only locate an abnormal area, not a specific node, which is too coarse to guide the exploration. Second, although the technique accelerates the convergence of the GA, the alignment computation takes so long that the total execution time becomes unbearable.
This paper proposes an efficient and simple method to guide local exploration, which helps DEMiner escape from local optima and move toward the global optimum. The method is based on token-based log replay, which is also used to evaluate process models (i.e., causal matrices) in this work. The original algorithm for parsing a log on a causal matrix records only three pieces of information: "allParsedActivities," "allMissingTokens," and "allExtraTokensLeftBehind." allParsedActivities denotes the total number of correctly parsed activities, allMissingTokens denotes the number of missing tokens over all event traces, and allExtraTokensLeftBehind denotes the number of tokens not consumed after parsing. Our method additionally records the nodes where parsing errors happen, i.e., where tokens are missing during the parse or left behind after it. In this way, a fine-grained evaluation of every node is achieved.
The proposed method has several advantages. First, it can accurately locate the abnormal nodes and thus improve the efficiency of the local exploration. Second, its time complexity is much lower than that of the alignment-based method because it needs no extra computation: the evaluation of the nodes is obtained along with the evaluation of the individual. The formulas of the fine-grained evaluation are given below, assigning a score to the input side and a score to the output side of each node:
The pseudocode of the guided local exploration is shown in Algorithm 4. Step 2 employs a roulette-wheel strategy to randomly select a node for local exploration; a node with a lower score has a greater probability of being selected. Step 3 randomly selects a direction for exploration, i.e., "input" or "output." Steps 4–28 randomly choose a mutation operation: randomly adding an arc to the node, randomly deleting an arc from the node, or randomly redistributing the structure of the node. As an example of redistribution, an input set such as I(E) = {{B, C}, {D}} may become, e.g., {{B}, {C}, {D}} or {{B, C, D}}. In the algorithm, a taboo list records the history of local exploration: operations that prove useless (i.e., do not make the individual move forward) are recorded in the taboo list. In this way, the efficiency of the local exploration is improved.
|
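The roulette-wheel node selection of Step 2 can be sketched as follows. This is an assumption-laden sketch: the paper does not give the exact weighting, so scores are assumed normalized to [0, 1] and each node is weighted by (1 − score) so that low-scoring (error-prone) nodes are favored.

```python
import random

# Sketch of Step 2 of Algorithm 4: roulette-wheel selection over node scores,
# inverted so that nodes with LOWER fine-grained scores (more parsing errors)
# are picked with higher probability.

def select_node(scores):
    # weight each node by (1 - score); scores assumed normalized to [0, 1]
    weights = {n: 1.0 - s for n, s in scores.items()}
    total = sum(weights.values())
    pick = random.uniform(0, total)
    acc = 0.0
    for node, w in weights.items():
        acc += w
        if pick <= acc:
            return node
    return node  # numeric edge case: fall back to the last node

scores = {"A": 0.95, "B": 0.40, "C": 0.90}  # "B" causes the most parsing errors
counts = {n: 0 for n in scores}
for _ in range(2000):
    counts[select_node(scores)] += 1
print(counts["B"] > counts["A"])  # True
```

Over many draws, the error-prone node "B" dominates the selection, which is exactly the bias the guided exploration needs.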
4. Experiments
In this section, I present the experiments and their analysis. The experiments focus on two aspects: first, evaluating whether the DE algorithm and the guided local exploration effectively accelerate the convergence of the proposed algorithm; second, evaluating the overall performance of the proposed algorithm (i.e., DEMiner). Next, the event logs used in the experiments are introduced.
4.1. Event Logs
In the experiments, 68 event logs were used to evaluate the proposed algorithm. The event logs can be classified into three groups. The first group contains 22 artificial event logs, which are taken from [8, 19] and can be downloaded from https://svn.win.tue.nl/repos/prom/DataSets/GeneticMinerLogs/. A description of these event logs is given in Table 2. The process models that generated these logs include different structures, such as sequences, choices, parallelism, loops, and invisible tasks. These process models, represented as Petri nets and heuristic nets, can be found in [19]. In the event logs, traces with the same event sequence are grouped together.
The second group, used to evaluate the noise resistance of DEMiner, contains 44 event logs. They were generated from the first group of event logs by adding 5% and 10% noise, respectively. Three types of noise-generating operations were used: randomly adding an event to a trace, randomly deleting an event from a trace, and randomly swapping two adjacent events in a trace. To incorporate noise, traces of the original noise-free logs were randomly selected, and one of the three noise types was applied, each with an equal probability of 1/3.
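The noise-generation procedure just described can be sketched in a few lines. The trace and alphabet below are illustrative, and the function name is an assumption; the three operations and their equal 1/3 probability follow the text.

```python
import random

# Sketch of the noise generation: with equal probability 1/3, a selected trace
# gets an event inserted, an event deleted, or two adjacent events swapped.

def add_noise(trace, alphabet):
    t = list(trace)
    op = random.choice(["insert", "delete", "swap"])
    if op == "insert":
        t.insert(random.randrange(len(t) + 1), random.choice(alphabet))
    elif op == "delete" and t:
        t.pop(random.randrange(len(t)))
    elif op == "swap" and len(t) >= 2:
        i = random.randrange(len(t) - 1)
        t[i], t[i + 1] = t[i + 1], t[i]
    return t

# e.g. noising one trace of Figure 1's model over its activity alphabet
noisy = add_noise(list("ABDEG"), alphabet=list("ABCDEFGH"))
assert len(noisy) in (4, 5, 6)  # delete, swap, or insert, respectively
```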
The third group includes two real event logs, both of which were downloaded from https://data.4tu.nl/repository/collection:event_logs. The first event log, named “BPI2013cp,” records the process information of the Volvo IT problem management system; it includes 1487 traces and 6660 events. The second event log, named “Sepsis,” records the events of sepsis cases from a hospital ERP system; it includes 1050 traces and 15214 events.
4.2. Convergence Speed and Running Time
This section evaluates whether the DE algorithm and the guided local exploration are efficient in accelerating the convergence speed of the proposed algorithm. Four strategies are compared in the experiment: the DE algorithm without local exploration (denoted as DE), the DE algorithm with random local exploration (denoted as DE + Random Search), the DE algorithm with guided local exploration (denoted as DE + Guided Search), and the GA. It should be explained that (1) the random search used in the second strategy is the genetic mutation of the GeneticMiner, (2) the third strategy is the DEMiner proposed in this work, and (3) the GA follows the framework of this work but uses the genetic operators of the GeneticMiner. Three metrics are employed in the experiments: completeness, precision, and generation (i.e., the number of iterations). To avoid inaccuracy caused by randomness, each algorithm was run 10 times, and the average value of each metric as well as its standard deviation was calculated.
The first group of event logs was used for the evaluation. The computer used for the experiments is equipped with a 2.5 GHz CPU and 8 GB of memory. The parameter settings are shown in Table 3. It should be explained that the population size is the number of activities multiplied by 1∼2. The parameter “MF” is set to 0.7 because the local exploration should not join the search too early. The parameter “DF,” which is used for the detection of premature convergence, is set to 0.2. In fact, a slight change of these parameters, e.g., setting the MF to 0.6∼0.8 and the DF to 0.1∼0.2, does not affect the performance of the algorithm, including the quality of the mining results and the convergence speed.
The experimental results are shown in Table 4. For completeness, it can be seen that the four algorithms (from left to right in Table 4) achieve a completeness of 1.00 on 1, 10, 20, and 12 event logs, respectively. The result demonstrates that the “pure” DE algorithm cannot reach the best model; i.e., it always falls into a local optimum. For precision, the “DE” algorithm achieves the best precision on just one event log, the “DE + Random Search” and the “GA” achieve the best precision on 2 event logs each, and the “DE + Guided Search” stably achieves the best precision on 14 event logs. From these two metrics, it can be seen that the “DE + Guided Search” performs much better than the other three strategies. For generation, the “DE” algorithm must be excluded because it always suffers from premature convergence. Among the remaining algorithms, the “DE + Guided Search” obviously has the fastest convergence speed, and the “GA” has the slowest.
To illustrate the time cost of the DEMiner, the running time of the “DE + Guided Search” was also recorded, as shown in Figure 4. From the figure, it can be seen that the minimum running time is about 3 seconds (“a6nfc”) and the maximum running time is about 80 seconds (“bn3”). This demonstrates the good time performance of the DEMiner.
(Figure 4, panels (a)–(d): running time of the “DE + Guided Search” on the first group of event logs.)
Based on the above results, some conclusions can be drawn. (1) The “DE” algorithm always falls into a local optimum. (2) The “DE + Random Search” and the “GA” discover process models of similar quality, but the former converges faster than the latter. Note that both algorithms use random search (i.e., the genetic mutation); the difference is that the “DE + Random Search” employs the DE algorithm. This proves that the DE algorithm can quickly approximate the optimal solution and accelerate the convergence. (3) Comparing the results of the “DE + Random Search” and the “DE + Guided Search,” the latter achieves much better results than the former. This shows that the guided local exploration efficiently helps the DE algorithm escape from local optima and improves the searching ability of the DEMiner.
4.3. Performance on Artificial Event Logs
4.3.1. Setup
This section compares the performance of the DEMiner with that of three popular process mining algorithms. Through the experiments, I want to evaluate the performance as well as the antinoise ability of the DEMiner. The algorithms selected for comparison are Heuristics Miner (HM) [20], ILP Miner [5], and ETM [12]. Among them, HM is a popular process mining tool which outputs a heuristic net as its mining result, while ILP Miner and ETM are two state-of-the-art algorithms in the field of process mining. ProM 6.9, the most popular process mining platform, was used in the experiments [21]. The parameters of the three algorithms were set to their default values. Because ILP Miner and ETM output a Petri net and a process tree, respectively, the obtained models must be converted to causal matrices based on Definition 4 so that the four algorithms can be evaluated in a unified way. In particular, if the output model contains invisible transitions (e.g., for ETM), it is converted to a causal matrix by hand.
Four metrics defined in [8, 19] were used to evaluate the algorithms: behavioral precision (Bp), behavioral recall (Br), structural precision (Sp), and structural recall (Sr). Bp and Br are based on the parsing of an event log by the mined model and the original model: the former measures how much behavior allowed by the mined model is not allowed by the original model, and the latter measures the opposite. The closer the values of Bp and Br are to 1.0, the higher the similarity between the original and the mined models. Sp and Sr are based on the causality relations of the mined and original models: the former measures how many causality relations of the mined model are not in the original model, and the latter measures the opposite. Different from Bp and Br, Sp and Sr measure the similarity from a structural point of view. When the original model has connections that do not appear in the mined model, Sr takes a value lower than 1.0; likewise, when the mined model has connections that do not appear in the original model, Sp takes a value lower than 1.0.
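Assuming the standard set-based definitions of [19], Sp and Sr reduce to the overlap of the two sets of causality relations (each relation represented here as an ordered activity pair):

```python
def structural_precision(orig, mined):
    """Sp: fraction of the mined model's causality relations that also
    appear in the original model; 1.0 means no extra arcs."""
    return len(orig & mined) / len(mined) if mined else 1.0

def structural_recall(orig, mined):
    """Sr: fraction of the original model's causality relations that were
    recovered by the mined model; 1.0 means no missing arcs."""
    return len(orig & mined) / len(orig) if orig else 1.0
```

For example, with orig = {(A, B), (B, C)} and mined = {(A, B), (A, C)}, both Sp and Sr equal 0.5: one extra arc and one missing arc.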
4.3.2. Noise-Free Event Logs
First of all, the experiments were performed on the 22 noise-free event logs. The results are listed in Table 5, in which the best results on each log are in italics. From the results, it is easy to find that the performance of the DEMiner is slightly better than that of the other three algorithms. The DEMiner achieves the optimal solutions (i.e., all four metrics are equal to 1) on 18 event logs, whereas HM, ILP Miner, and ETM achieve the optimal solutions on 16, 14, and 5 event logs, respectively. To compare the four algorithms more intuitively, a combined metric, called the average f-score, is designed as follows:
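Assuming the average f-score is the mean of the behavioral and structural f-scores, each being the harmonic mean of the corresponding precision/recall pair (this exact form is an assumption, as the formula is not reproduced here), it can be computed as:

```python
def f_score(p, r):
    """Harmonic mean of a precision/recall pair."""
    return 2 * p * r / (p + r) if p + r else 0.0

def average_f_score(bp, br, sp, sr):
    """Mean of the behavioral and structural f-scores (assumed form)."""
    return (f_score(bp, br) + f_score(sp, sr)) / 2
```

Under this definition, a model is optimal exactly when the average f-score equals 1.0, i.e., when all four metrics equal 1.0.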
The results are shown in Figure 5. From the figure, it is easy to find that the DEMiner loses to the other algorithms only on four event logs, namely, “a5,” “a7,” “h6p18,” and “h6p36,” and obtains the best results on the remaining 18 event logs. Moreover, the average f-score achieved by the DEMiner on these 4 event logs is over 0.9, which demonstrates that the DEMiner has good performance. A deeper analysis is given below.

There are four event logs on which the DEMiner could not achieve the best results: “a5,” “a7,” “h6p18,” and “h6p36.” The most frequently repeated mining results are shown in Figure 6, in which the incorrect parts are labeled in red: the dotted lines denote missing arcs (i.e., arcs in the original model that were not discovered by the DEMiner), and the solid lines denote incorrect connections (i.e., structural errors).
(Figure 6, panels (a)–(d): the mining results on the four event logs, with incorrect parts labeled in red.)
In Figure 6(a), it can be seen that the mined model lacks a cycle <E, E>; in the original model, there are two cycles on the node “E.” The reason behind this phenomenon is that the differential mutation operators proposed in this work (Definitions 5 and 6) remove such structures during the evolution. Assume S1 = {{E}, {E}, {B, C}} and S2 = {{E}, {B}}; then S1 − S2 = {{B, C}}; in other words, the two subsets {E} are removed from S1 by the minus operation. Furthermore, assume S1 = {{E}, {E}, {B, C}} and S2 = {{E}, {B, C}}; it is easy to find that the two sets obtain the same completeness value, but the precision value of S1 is lower than that of S2 (the former enables more activities). In other words, S1 is replaced by S2 during the evolution. This phenomenon also appears in the event log “h6p18.”
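The minus operation used above can be sketched as follows (a simplified reading of Definition 5, which treats an input/output set as a list of subsets and removes every copy of a subset that also occurs in the subtrahend):

```python
def set_minus(s1, s2):
    """Remove from s1 every subset that also occurs in s2 (all copies).

    Both arguments are collections of sets, e.g. the input set of a node
    in a causal matrix; duplicates in s1 are preserved unless removed.
    """
    s2 = {frozenset(x) for x in s2}
    return [frozenset(x) for x in s1 if frozenset(x) not in s2]
```

With S1 = {{E}, {E}, {B, C}} and S2 = {{E}, {B}}, the sketch yields {{B, C}}: both copies of {E} vanish, which is exactly how the second cycle on “E” gets lost.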
Next, in the mined model of “a7,” the input set of node “D” is {{2, 4}, {3, 7, 8}}, whereas it is {{2}, {3, 7}, {4, 7, 8}} in the original model. The reason for this incorrectness is also the differential mutation operations: removing the intersection of two sets is a high-probability event in the proposed algorithm. Assume two sets S1 = {3, 7} and S2 = {4, 7, 8}; then S1 + S2 = {{3, 7}, {8}} in terms of Definition 6, and it can be seen that the intersection of the two sets is removed. From the above analysis, it can be found that the proposed method cannot achieve the best results in some rare cases (e.g., two cycles on the same node and redundant structures). From another perspective, however, this shows that the proposed method prefers models with low structural complexity.
For “h6p36,” the four algorithms discovered the same model, shown in Figure 6(d). The mined model lacks two arcs, <KB, NB> and <KA, NA>, which exist in the original model (a heuristic net) [19]. Through analysis, I found that the original heuristic net is incorrect. From the CPN model of “h6p36,” it can be seen that the model has just two parallel paths starting with “KA” and “KB,” respectively. However, the given heuristic net has the two arcs <KA, NA> and <KB, NB>, which may lead to two nonexistent paths <Start, KA, NA, End> and <Start, KB, NB, End>. Therefore, the mining results of the four algorithms in fact fit the event log perfectly.
4.3.3. Noisy Event Logs
Next, the experiments were performed on the 22 event logs with 5% noise. The experimental results are shown in Table 6. The performance of HM degrades significantly. Noise also affects the other two algorithms, but their performance degradation is smaller than that of HM. Moreover, the ETM now also discovers the optimal process model on “paral5,” on which it did not achieve the best result before. Similarly, the average f-scores of the four algorithms were calculated (see Figure 7). It can be seen from the figure that the performance of the DEMiner degrades only slightly: it achieves the best results on 13 event logs, and its average f-score stays between 0.8 and 1.0.

Later, the experiments were performed on the event logs with 10% noise. The experimental results are listed in Table 7, and the average f-scores of the four algorithms are shown in Figure 8. From the table, we can see that the DEMiner achieves the best results on 14 event logs, while the other three algorithms (from left to right) achieve the best models on 2, 7, and 4 event logs, respectively. The ETM discovers the optimal models on “a7” and “a8” with 10% noise, although it does not find them on the same logs with 5% noise; this is because the two groups of logs are independent, i.e., the inserted noise may be totally different. Through a careful comparison of Figures 7 and 8, it can be found that the performance of the DEMiner does not degrade significantly and remains at a stable level. Based on the experiments on the two groups of noisy event logs, it can be concluded that the DEMiner has good antinoise ability. However, it should be noticed that the DEMiner cannot discover the optimal solutions on all noisy event logs, which demonstrates that the DEMiner cannot yet fully avoid noise interference.

4.4. Performance on Real Event Logs
This section shows the performance of the DEMiner on the two real event logs, i.e., “BPI2013cp” and “Sepsis.” In the experiments, a “start” event and an “end” event were added to each trace at runtime, which ensures that every path has the same “start” and “end” nodes. To eliminate the randomness of the evolution, the DEMiner was executed 10 times on each event log. The evolution processes on the two event logs are shown in Figures 9 and 10, respectively. The ordinate denotes the completeness metric, and the abscissa denotes the number of generations, which is scaled from 1 to 100. Only the completeness value is recorded here because it is monotonic; in contrast, the precision value may fluctuate greatly along with changes in the completeness value. From the two figures, it can be seen that the DEMiner converges quickly on both event logs.
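The trace padding described above can be sketched as follows (the event names “start” and “end” are illustrative):

```python
def pad_trace(trace, start="start", end="end"):
    """Wrap a trace with artificial start/end events so that every path
    in the mined model shares one source node and one sink node."""
    return [start] + list(trace) + [end]
```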


Furthermore, three metrics were employed to evaluate the efficiency of the DEMiner: alignment-based fitness [22], alignment-based precision [23], and the combined f-score [24]. The results, including the average values and the standard deviations, are listed in Table 8. Compared with the results given in [24], the results obtained by the DEMiner are slightly lower than those obtained by the ETM on “Sepsis” but better than those of the ETM on “BPI2013cp.” This proves that the DEMiner performs well on real event logs.
5. Conclusions
This paper proposes a new process mining algorithm, named DEMiner. The proposed algorithm is based on a hybrid evolutionary strategy, which consists of a set-based DE algorithm and a guided local exploration algorithm. Meanwhile, several techniques are employed to improve the efficiency of the DEMiner, such as the gene bank, the taboo list, and consistency repair. To evaluate its performance, 68 event logs were used in the experiments. Some conclusions can be drawn from the experimental results:
(1) Through the comparison of the four strategies (i.e., DE, DE + Random Search, DE + Guided Search, and GA), the “DE + Guided Search” outperforms the rest: it achieves the best solution quality as well as the fastest convergence speed. Moreover, the results prove that the DE algorithm can rapidly approximate the optimal solution but always suffers from premature convergence, while the guided local exploration helps the DE algorithm escape from local optima and improves the efficiency of the proposed algorithm.
(2) Through the comparison of the DEMiner with three popular process mining algorithms (i.e., HM, ILP Miner, and ETM) on 22 noise-free event logs and 44 noisy event logs, the DEMiner achieves the best results on most of the event logs. Furthermore, based on the experimental results on 2 real event logs, it can be concluded that the DEMiner works well on real-world event logs. This proves the effectiveness and the efficiency of the proposed algorithm.
However, the DEMiner did not discover an optimal process model from any of the 44 noisy event logs, which reveals a drawback of the DEMiner. Besides that, it is hard for the DEMiner to discover some rare structures, such as two cycles on the same node. These issues are left for future work.
Data Availability
The event logs used to support the findings of this study are included within the article and can be downloaded from https://svn.win.tue.nl/repos/prom/DataSets/GeneticMinerLogs/. The corresponding process models can be found in [19].
Conflicts of Interest
The author declares that there are no conflicts of interest.
Acknowledgments
This work was supported by the Project of Science and Technology Bureau of Leshan (Grant no. 18JZD117), the Scientific Research Fund of Leshan Normal University (Grant no. ZZ201822), the Key Projects of Sichuan Provincial Education Department of China (Grant no. 18ZA0239), and the Project of Key Lab of Internet Natural Language Processing of Sichuan Provincial Education Department (no. INLP201903).