Abstract
Many factors are involved in the assembly process of complex products, but only a few key influencing factors really affect the assembly performance of the product. To ensure the performance of complex products, it is necessary to model the deviation transfer flow of the assembly process, identify the key processes and characteristics, and determine their scope of influence. This paper establishes the set of assembly performance influencing factors by analyzing the complex product assembly process and combines complex networks with the entropy weight method to identify the key influencing factors of product performance. Based on the key factor identification results, the Apriori algorithm is used to mine the association rules between key factors and performance, providing guidance for the assembly of complex products and making performance transparent and controllable. The method is verified on the assembly process of a certain type of aeroengine low-pressure fan rotor. The results show that the proposed method can identify the key factors in the multistage assembly process and establish an association rule library to provide decision support for process adjustment.
1. Introduction
The assembly of complex products proceeds through multiple stages, and a certain amount of assembly deviation arises at each stage. The errors in each assembly process differ in size and type (for example, part manufacturing errors and positioning errors during assembly). These errors are continuously transmitted, accumulated, and evolved as assembly advances, so the final assembly performance may fail to meet the requirements. By modeling the deviation flow transfer of the assembly process of complex products, the key influencing factors can be identified and their influence range determined, which guides the adjustment of the influencing factors and helps ensure the product assembly performance.
The assembly process of complex products involves not only multiple types of components but also various types of equipment such as tools and fixtures. The assembly process is also complex, and the assembly process chain is long. These factors change constantly and are coupled with each other throughout the assembly process, which leads to the following difficulties in identifying and correlating the key factors of complex products in the assembly process.
(1) There are many influencing factors, the factors are coupled, and the influencing mechanism is complex and unclear. Product assembly performance is affected by the quality of parts processing, the assembly process, the assembly quality, and other factors. These influencing factors change dynamically and are coupled with each other, so it is difficult to determine their influence range by mechanism analysis under the synergistic effect of multiple factors.
(2) The accumulation of influencing factors has a cross-process effect. The assembly process of complex products is generally divided into multiple processes. The errors generated by the current process are passed to the next process, which produces a cumulative effect.
To address the above difficulties of key factor identification and association mining for complex products, this paper proposes a key factor identification and association mining method that combines complex networks, the entropy weight method, and the Apriori method. The set of assembly performance influencing factors is established by analyzing the complex product assembly process, and a key factor identification model is built by combining complex networks with the entropy weight method to obtain the key influencing factors in the complex product assembly process. Based on the key factor identification results, the Apriori algorithm is used to mine the association rules between key factors and performance, in order to achieve accurate regulation of the assembly process and provide decision support for subsequent assembly process adjustment.
This paper is divided into six parts. The first part is the introduction, which describes the complexity and difficulty of the research problem. The second part is a literature review of current research progress in key factor identification and association mining. The third part presents the key factor identification and association mining method for complex products and gives the details of the method proposed in this paper. The fourth part is a case analysis that validates the proposed method on the low-pressure rotor assembly process of an aeroengine. The fifth part is the conclusion. The sixth part is the references.
2. Literature Review
2.1. Key Factor Identification
In order to achieve efficient control of product performance, it is necessary to identify key influencing factors. At present, the commonly used methods of identifying key factors mainly include loss function method, risk analysis method, fuzzy theory, Bayesian network, complex network, and other methods.
Tang et al. [1] used the assembly directed graph to establish a candidate set of key characteristics, calculated the degree of influence of the candidate characteristics on the key characteristics of the upper layer based on the Taguchi quality loss method, and then defined the key characteristics of each layer according to the degree of influence. Zhao et al. [2] performed a qualitative preidentification of key characteristics based on the cumulative relationship of error transmission and then identified the key characteristics through risk analysis. Xu et al. [3] studied a quality factor identification method based on Bayesian networks, which uses Bayesian inference to identify the key quality factors from the product quality relationships encoded in the network. Whitney [4] analyzed the influencing factors of key characteristics through the combination of characteristic decomposition, tolerance analysis, and assembly sequence. Guo [5] used the Taguchi quality loss function to determine the quality loss caused by the coordination elements between assembly levels and then used fuzzy theory to calculate the degree of influence of each element to identify the product elements. Raman et al. [6] used hypergraphs and rough sets to reduce attributes, form an optimal attribute subset, and select important attributes. Yang [7] and Zheng et al. [8] discussed key quality characteristic identification methods based on risk analysis, quality loss function, principal component analysis, and historical data analysis. Zhong et al. [9] proposed a multiattribute fusion method that combines the topological attributes and diffusion attributes of nodes to adaptively obtain the ranking results, including two fusion methods based on attribute union (FU) and attribute-based ranking (FR). Based on principal component analysis (PCA), Jin et al. [10] proposed a new algorithm for calculating the importance of nodes in complex networks, combining attributes such as degree centrality, closeness centrality, and eigenvector centrality. Zhao et al. [11] used the TOPSIS method to integrate degree centrality, betweenness centrality, clustering coefficient, and closeness centrality to rank the importance of nodes and then deleted the most important nodes to obtain the ranking results of node importance.
The above methods have achieved some results in the identification of key characteristics, but their identification mainly depends on the decomposition and transmission of various factors. They require either that the relationships between the factors be relatively clear or that substantial a priori knowledge be available. It is difficult to identify key characteristics for problems with unclear action relationships and a small number of factors. On the basis of clarifying the nodes and node relationships, this paper identifies the key nodes in a complex network by constructing an association relationship network between the various characteristics. The identification process does not require experimental data, nor does it need to clarify the mechanism of action and the transfer relationship among the various factors.
2.2. Association Mining
An increase in the number of quality and process feature quantities of complex products increases the complexity of association rule algorithms. In practice, the performance of complex products may have a direct relationship with only certain feature quantities, while the relationship with other feature quantities is not obvious. Commonly used association rule analysis algorithms include the Apriori algorithm [12] and the FP-tree frequent itemset algorithm [13]. Guo et al. [14] introduced cluster analysis theory, modeled photovoltaic processing from the perspective of data mining, and used the model to evaluate the reliability of photovoltaic power generation systems. Li et al. [15] proposed a new network loss assessment method based on hybrid cluster analysis using data mining and typical scenario simulation ideas. Li et al. [16] developed an equipment fault information management and analysis system using data mining technology to provide a basis for the condition-based maintenance of relay protection devices and decision support for the analysis and processing of power grid faults. Zhang et al. [17] proposed an Apriori-based data mining and analysis method for the defect data of secondary equipment, which improved the operation, maintenance, and control level of secondary equipment in the power system. Chen and Cao [18] proposed a multilayer association rule data mining algorithm and applied it to the analysis of massive data in commercial banking systems. Jia et al. [19] applied the FP-Growth association rule mining algorithm to bird strikes in the aviation field and found association rules between the key inducing factors of civil aviation bird strikes.
The above studies have explored the mining of association relations, but their methods all target discrete text data and cannot be applied directly to the multidimensional continuous data that exist in the assembly process of complex products. Therefore, based on the identification of key factors, this paper uses a discretization algorithm to discretize the multidimensional continuous data, uses association rule analysis to mine the out-of-tolerance data of complex product performance, analyzes the credibility of the relationship between product performance and feature quantities, and reveals the correlation between product performance and features. A further advantage of discretization is that the distribution intervals of the features are obtained. Each resulting association rule is a combination of several feature value intervals, which provides clear data support for optimization and adjustment.
3. Key Factor Identification and Association Mining of Complex Products
First, this paper identifies all the factors affecting assembly performance in the assembly process of complex products, classifies the influencing factors, and establishes a product performance influencing factor index system. Then, the influencing factors are abstracted as nodes and the assembly relationships as edges to build a complex network model. The attributes of all factors are calculated from the constructed complex network, and the entropy weight-TOPSIS model is used to rank the influencing factors in order to identify the key factors. Finally, based on the key factor identification results, the relevant data of the key factors are collected and discretized. From the discretized data, the Apriori algorithm is used to mine the relationship between assembly influencing factors and assembly performance and to determine the range of influence of the key factors, so as to provide guidance for subsequent process adjustments, as shown in Figure 1.

3.1. Identification of Key Influencing Factors Based on Complex Network and Entropy Weight Method
Complex products have complex assembly relationships due to the complexity of the structure and the number of parts involved in the assembly. In the multistage assembly process of complex products, the machining quality of the parts themselves will affect the assembly performance of the products, and the assembly process and assembly quality will also affect the assembly performance of the products. In the assembly process, the assembly errors caused by the part processing quality, assembly process, and assembly quality are continuously transferred and accumulated among the parts, and there are also coupling relationships among the influencing factors. Therefore, the coupling relationship of complex product assembly process can be modeled and analyzed by means of complex network directed graph. The quality and process factors in the assembly process are abstracted as the nodes of a complex network, and the interactions between the influencing factors are abstracted as the edges of the complex network, and the connection relationships are determined by the assembly sequence.
The set of influencing factors is the basis for building complex network models and analyzing key influencing factors. Therefore, it is necessary to clarify the factors that affect assembly performance in the assembly process of complex products and to form a set of influencing factors that provides the data basis for building the complex network. The assembly performance of complex products is influenced by the quality of parts processing, the assembly process, and the assembly quality, so the set of assembly performance influencing factors is formed from these three aspects.
For each process in the assembly, the set therefore contains the processing quality influencing factors, the assembly process influencing factors, and the assembly quality influencing factors of that process.
The network relationship mapping between the factors is established according to the assembly order, as shown in Figure 2.

Complex networks, as a branch of complex system theory, are topological abstractions of real complex systems, and complex network theory and methods can describe the topological characteristics, functional properties, and interrelationships of such systems well. Based on this theory, we construct an association relationship network model that reflects the characteristics of the system and build a node importance evaluation index system from the node attributes of the complex network.
Because nodes in a complex network have different connection relationships, their importance differs; important nodes are the special nodes that affect the structure and function of the network [20]. The connection relationships of the nodes reflect the inherent attribute information of the nodes. This paper selects seven network characteristics of each influencing factor to construct an evaluation index system for the importance of the influencing factors of each process. These network characteristics are degree centrality, clustering coefficient, betweenness centrality, closeness centrality, eccentricity (centrifugal centrality), eigenvector centrality, and average neighbor degree. Table 1 shows the commonly used calculation methods for characteristic analysis in complex networks [21].
The complex network model of the factors influencing the assembly performance of complex products is constructed in the following steps.
(Step 1) Define the nodes. According to the analysis of the assembly performance influencing factors, the influencing factors of each process are used as the nodes of the complex network model of that process, so that each node corresponds to one influencing factor of one process.
(Step 2) Define the edges. The edges of the complex network model are based on the interaction relationships among the influencing factors. An edge between two nodes of the same process represents the existence of an interaction between the corresponding influencing factors.
(Step 3) Draw the complex network diagram. Form a connection matrix based on the above two steps, and draw the complex network diagram from the connection matrix.
(Step 4) Calculate the inherent properties of each node. Calculate the local and global information of each node as described in Table 1.
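As a minimal sketch of Steps 1 to 4, the following plain-Python example builds a connection matrix for one process and derives three of the seven node attributes. The factor names and edges are hypothetical, the edges are treated as undirected, and the graph is assumed to be connected. The case study itself uses NetworkX, whose functions `nx.degree_centrality`, `nx.closeness_centrality`, and `nx.eccentricity` compute the same quantities for such a graph.

```python
from collections import deque

# Hypothetical influencing factors (nodes) and interactions (edges) of one process.
nodes = ["flatness", "runout", "bolt_torque", "fit_clearance", "unbalance"]
edges = [("flatness", "fit_clearance"), ("runout", "fit_clearance"),
         ("bolt_torque", "fit_clearance"), ("fit_clearance", "unbalance")]


def connection_matrix(nodes, edges):
    """Step 3: symmetric 0/1 connection matrix (edges treated as undirected)."""
    index = {name: k for k, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return matrix


def shortest_path_lengths(matrix, source):
    """Breadth-first search distances from one node to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v, linked in enumerate(matrix[u]):
            if linked and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def node_attributes(nodes, matrix):
    """Step 4: degree centrality, closeness centrality, and eccentricity."""
    n = len(nodes)
    attrs = {}
    for k, name in enumerate(nodes):
        dist = shortest_path_lengths(matrix, k)
        others = [d for node, d in dist.items() if node != k]
        attrs[name] = {
            "degree_centrality": sum(matrix[k]) / (n - 1),
            # Equals (n - 1) / (sum of distances) when the graph is connected.
            "closeness_centrality": len(others) / sum(others) if others else 0.0,
            "eccentricity": max(others) if others else 0,
        }
    return attrs


matrix = connection_matrix(nodes, edges)
attributes = node_attributes(nodes, matrix)
```

In this toy network the hub node `fit_clearance` touches every other factor, so it has the highest degree and closeness centrality and the lowest eccentricity, which is the pattern the ranking step is meant to pick up.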
After the complex network model is built, the key nodes in it need to be identified. Considerable research has been conducted on the identification of important nodes in complex networks, mainly including degree centrality, eigenvector centrality, betweenness centrality, and the PageRank method [22]. Each of these algorithms evaluates node importance from only one aspect and therefore has limitations. This paper adopts the entropy weight-TOPSIS model, which evaluates node importance by considering both the local and the global attribute information of the network, to identify the important nodes of the complex network.
The entropy weight-TOPSIS model is a combination of the entropy weight and TOPSIS methods [23]. TOPSIS is a multi-indicator evaluation algorithm that ranks targets by their distance from an ideal target. Since TOPSIS involves multiple indicators whose weights are not equal, leaving the indicator weights undetermined affects the accuracy of the ranking results. As an objective weighting method, the entropy weight method does not depend on human experience, so it is combined with TOPSIS to identify the important nodes in complex networks. The specific steps for identifying the key influencing factors of product performance based on the entropy weight-TOPSIS model are as follows:
(Step 1) Construct the original data matrix. The inherent properties of each influencing factor are calculated from the established complex network. Each node is an evaluation object and each node attribute is an evaluation indicator, so the original data matrix has one row per evaluation object and one column per evaluation indicator.
(Step 2) Standardize the data. Because the indicators have different meanings and therefore different scales, which would affect the evaluation results, the original data matrix needs to be standardized. Indicators are either positive or negative. For a positive indicator, a larger value means a more important node; for a negative indicator, a lower value means a more important node, so different standardization methods are used for the two kinds of indicators. Degree centrality, clustering coefficient, betweenness centrality, closeness centrality, eigenvector centrality, and average neighbor degree are all positive indicators and are normalized according to equation (6). Eccentricity (centrifugal centrality) is a negative indicator and is normalized according to equation (7). The standardized matrix obtained from all indicators is given by equation (8).
(Step 3) Calculate the information entropy of each indicator. The information entropy of each indicator of each process is calculated according to the information entropy formula. When the proportion of a node within an indicator is zero, the product of that proportion and its logarithm is defined as zero.
(Step 4) Calculate the weight of each indicator. The weight of each indicator for each process is calculated according to the indicator weight formula, equation (10).
(Step 5) Determine the positive ideal solution and the negative ideal solution. After standardization of the node attributes, the maximum value of each indicator is the positive ideal solution, and the minimum value of each indicator is the negative ideal solution.
(Step 6) Calculate the distance of each node from the ideal solutions, that is, the distance of the evaluation indicators of each node from the positive ideal solution and from the negative ideal solution.
(Step 7) Calculate the comprehensive evaluation index of each evaluation object from its distances to the positive and negative ideal solutions.
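The seven steps can be illustrated with a short plain-Python sketch. It uses the common textbook form of the method (min-max standardization, entropy weights proportional to one minus the entropy, Euclidean distances on the weighted matrix, and a closeness coefficient between 0 and 1); the paper's own equations and the scale of its comprehensive evaluation index may differ. The data matrix is hypothetical, and the sketch assumes at least two nodes and at least one non-constant indicator.

```python
import math


def normalize(matrix, negative_columns=()):
    """Step 2: min-max standardization; negative columns are 'smaller is better'."""
    out_cols = []
    for j, col in enumerate(zip(*matrix)):
        low, high = min(col), max(col)
        span = high - low
        if span == 0:
            out_cols.append([0.0] * len(col))  # constant indicator: no ranking information
        elif j in negative_columns:
            out_cols.append([(high - x) / span for x in col])
        else:
            out_cols.append([(x - low) / span for x in col])
    return [list(row) for row in zip(*out_cols)]


def entropy_weights(norm):
    """Steps 3-4: information entropy per indicator, then normalized weights."""
    m = len(norm)
    redundancy = []
    for col in zip(*norm):
        total = sum(col)
        entropy = 1.0  # a column of zeros is treated as carrying no information
        if total > 0:
            # Convention: a zero proportion contributes 0 * ln(0) = 0.
            entropy = -sum((x / total) * math.log(x / total) for x in col if x > 0)
            entropy /= math.log(m)
        redundancy.append(1.0 - entropy)
    total_redundancy = sum(redundancy)
    return [d / total_redundancy for d in redundancy]


def topsis_scores(norm, weights):
    """Steps 5-7: ideal solutions, distances, and closeness coefficient."""
    weighted = [[w * x for w, x in zip(weights, row)] for row in norm]
    best = [max(col) for col in zip(*weighted)]
    worst = [min(col) for col in zip(*weighted)]
    scores = []
    for row in weighted:
        d_pos = math.sqrt(sum((x - b) ** 2 for x, b in zip(row, best)))
        d_neg = math.sqrt(sum((x - v) ** 2 for x, v in zip(row, worst)))
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical data: rows are nodes; columns are degree centrality,
# closeness centrality, and eccentricity (the negative indicator).
raw = [
    [1.00, 1.00, 1],
    [0.25, 0.57, 2],
    [0.50, 0.67, 2],
    [0.25, 0.50, 3],
]
norm = normalize(raw, negative_columns={2})
weights = entropy_weights(norm)
scores = topsis_scores(norm, weights)
```

The first node is best on every indicator and the last node is worst on every indicator, so they receive the extreme scores; the two middle nodes are ordered by how far they sit from the two ideal solutions.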
According to the comprehensive evaluation index (weight) of each influencing factor, the curve of the importance of the influencing factors is drawn and the key factors are selected. A significant decrease in the weight indicates that the factors after the decrease are markedly less important than the factors before it. The weight at which this significant decrease occurs is therefore used as the threshold for selecting the key influencing factors, and the factors whose weight is greater than this threshold are the key influencing factors for performance.
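One way to make the "significant decrease" criterion concrete is to cut the sorted curve at its largest drop between consecutive factors. This rule is an assumption of the sketch below, since the paper reads the threshold from the plotted curve; the factor names and weights are hypothetical.

```python
def select_key_factors(weights):
    """Cut the descending weight curve at its largest drop (assumed rule).

    weights: dict mapping factor name -> comprehensive evaluation index,
    with at least two entries. Returns (threshold, key_factor_names).
    """
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    drops = [ranked[k][1] - ranked[k + 1][1] for k in range(len(ranked) - 1)]
    cut = max(range(len(drops)), key=drops.__getitem__)
    threshold = ranked[cut + 1][1]  # first weight after the largest drop
    key_factors = [name for name, value in ranked if value > threshold]
    return threshold, key_factors


# Hypothetical weights for five influencing factors.
example = {"f1": 2.4, "f2": 2.3, "f3": 2.1, "f4": 1.2, "f5": 1.1}
threshold, key_factors = select_key_factors(example)
```

Here the largest drop lies between `f3` and `f4`, so the weight of `f4` becomes the threshold and the three factors above it are kept.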
3.2. Association Relationship Mining Based on Apriori Algorithm
The previous section identified the factors affecting the performance of complex products and extracted the key factors. This section uses the Apriori method to analyze the credibility of the relationship between the assembly performance of complex products and the key influencing factors, reveal the degree of correlation between them, determine the influence range of the key factors, and form an association rule base to guide subsequent process optimization.
The association rule mining problem can be formally described as follows. Let $I = \{i_1, i_2, \ldots, i_m\}$ be the set of all influencing factors (items), and let $D$ be the set of all combinations of influencing factors. Each combination of influencing factors $T$ is a set of value ranges of influencing factors, with $T \subseteq I$, and is identified by a unique identifier TID. Let $X$ be a set of certain influencing factors. A combination $T$ is said to contain $X$ if $X \subseteq T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X \subset I$, $Y \subset I$, and $X \cap Y = \emptyset$. The rule $X \Rightarrow Y$ in $D$ is characterized by its degree of support $s$ and its degree of confidence $c$. The confidence indicates the strength of the rule, and the support indicates how frequently the pattern in the rule occurs. The support of an itemset $X$ is the ratio of the number of combinations in $D$ that contain $X$ to the total number of combinations in $D$. The support of the rule $X \Rightarrow Y$ is the proportion of combinations in $D$ that contain $X \cup Y$, that is, both $X$ and $Y$. The confidence of the rule $X \Rightarrow Y$ is the proportion of the combinations in $D$ containing $X$ that also contain $Y$, that is, $\mathrm{support}(X \cup Y) / \mathrm{support}(X)$; it indicates how likely a combination that contains $X$ is to contain $Y$ as well.
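The two measures can be computed directly from their definitions. In the sketch below, each combination of influencing factors is a `frozenset` of discretized items; the item labels such as `x1=2` and `y=out` are hypothetical.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X union Y) / support(X) for the rule X => Y."""
    joint = frozenset(antecedent) | frozenset(consequent)
    return support(joint, transactions) / support(antecedent, transactions)


# Hypothetical database: two discretized factors and one performance outcome.
transactions = [
    frozenset({"x1=2", "x2=1", "y=out"}),
    frozenset({"x1=2", "x2=3", "y=out"}),
    frozenset({"x1=1", "x2=1", "y=ok"}),
    frozenset({"x1=2", "x2=1", "y=ok"}),
]
rule_support = support({"x1=2", "y=out"}, transactions)
rule_confidence = confidence({"x1=2"}, {"y=out"}, transactions)
```

Two of the four records contain both `x1=2` and `y=out`, and three contain `x1=2`, so the rule `x1=2 => y=out` has support 0.5 and confidence 2/3.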
Association rule mining finds the association rules in the database $D$ of influencing factor combinations that satisfy the user-given minimum support minsup and minimum confidence minconf. The problem can be decomposed into two steps.
(1) Find all the itemsets in the database whose support is greater than or equal to the user-specified minimum support. Itemsets that satisfy the minimum support are called frequent itemsets.
(2) Generate the required association rules from the frequent itemsets. For each frequent itemset $A$, find all nonempty proper subsets $a$ of $A$. If the ratio $\mathrm{support}(A) / \mathrm{support}(a) \geq$ minconf, generate the association rule $a \Rightarrow (A - a)$. The ratio $\mathrm{support}(A) / \mathrm{support}(a)$ is the confidence of the association rule $a \Rightarrow (A - a)$.
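The second step can be sketched as follows, assuming the frequent itemsets and their supports are already available as a dictionary. The function name, the dictionary layout, and the item labels are illustrative choices, not part of the paper.

```python
from itertools import combinations


def generate_rules(frequent, minconf):
    """Generate rules a => (A - a) from frequent itemsets.

    frequent: dict mapping frozenset -> support. It must contain every
    frequent itemset together with all of its nonempty subsets, which
    holds for the output of a frequent itemset search.
    Returns a list of (antecedent, consequent, support, confidence).
    """
    rules = []
    for itemset, itemset_support in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a nonempty antecedent and consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                conf = itemset_support / frequent[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent,
                                  itemset_support, conf))
    return rules


# Hypothetical frequent itemsets with their supports.
frequent = {
    frozenset({"x1=2"}): 0.75,
    frozenset({"y=out"}): 0.5,
    frozenset({"x1=2", "y=out"}): 0.5,
}
rules = generate_rules(frequent, minconf=0.8)
```

With these numbers only `y=out => x1=2` reaches the confidence threshold (0.5 / 0.5 = 1.0), while `x1=2 => y=out` is rejected (0.5 / 0.75). For process guidance one would typically keep only the rules whose consequent is a performance item, which is a simple filter on the returned list.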
The Apriori algorithm is a classical algorithm for mining the frequent itemsets of association rules. The algorithm uses an iterative layer-by-layer search that scans the database of influencing factor combinations multiple times. The first scan yields the frequent 1-itemset $L_1$. Before the $k$-th scan, the result of the previous scan (i.e., the frequent itemset $L_{k-1}$) is used to generate the candidate $k$-itemset $C_k$. The support of the elements in $C_k$ is then determined during the scan. At the end of each scan, the frequent $k$-itemset $L_k$ is obtained. The algorithm ends when the candidate $k$-itemset $C_k$ is empty. In this paper, the influencing factors are combined with the variables corresponding to the performance of complex products to form the data itemsets for association rule analysis.
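The layer-by-layer search can be sketched in a few lines of plain Python. This is a compact illustration of the classical algorithm rather than an optimized implementation; the single-letter items and the four-record database are hypothetical.

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(transactions)

    def frequent_only(candidates):
        """Scan the database and keep candidates whose support reaches minsup."""
        kept = {}
        for candidate in candidates:
            s = sum(1 for t in transactions if candidate <= t) / n
            if s >= minsup:
                kept[candidate] = s
        return kept

    # First scan: frequent 1-itemsets.
    level = frequent_only({frozenset([item]) for t in transactions for item in t})
    frequent = {}
    k = 2
    while level:  # stops once no itemset of the current size is frequent
        frequent.update(level)
        # Join: unite pairs of frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = frequent_only(candidates)
        k += 1
    return frequent


# Hypothetical database of four records.
database = [
    frozenset("ACD"),
    frozenset("BCE"),
    frozenset("ABCE"),
    frozenset("BE"),
]
frequent_itemsets = apriori(database, minsup=0.5)
```

For this database the search finds four frequent 1-itemsets, four frequent 2-itemsets, and the single frequent 3-itemset {B, C, E}; no candidate 4-itemset can be formed, so the loop ends.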
The following association analysis of complex product performance and influencing factors examines the relationship between the set of influencing factors and the set of performance indicators. The confidence of the rules that link combinations of influencing factors to performance is calculated.
However, association rule analysis algorithms all operate on discrete data, and many of the influencing factors take continuous values. Therefore, the data of each influencing factor must be preprocessed. Before association rules are mined, all data are discretized, and continuous values are mapped to a set of discrete values.
There are many discretization methods for continuous values. Typical discretization algorithms are as follows: equal width or equal frequency method [13], C4.5 method [24], entropy method [25], and Chi-Merge algorithm [26]. This paper uses the Chi-Merge algorithm to discretize the data.
The Chi-Merge algorithm is a supervised discretization method based on the chi-square distribution (represented by the symbol $\chi^2$). Using a bottom-up strategy, it repeatedly finds the best pair of neighboring intervals and merges them into a larger interval. The process is as follows:
(1) Sort the data in ascending order.
(2) Define the initial intervals so that each data value is in a separate interval.
(3) Calculate the $\chi^2$ value of each pair of adjacent intervals and merge the pair with the smallest $\chi^2$ value, repeating until the $\chi^2$ value of every pair of adjacent intervals is not less than the threshold determined by the specified confidence level.
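A minimal sketch of these three steps is given below. It assumes each value carries a class label (for example, in or out of tolerance), starts with one interval per distinct value, and takes the threshold as a parameter; 3.841 is the chi-square critical value for one degree of freedom at the 0.95 confidence level, which applies when there are two classes. The sample data are hypothetical.

```python
from collections import Counter


def chi2(counts_a, counts_b):
    """Chi-square statistic of two adjacent intervals from their class counts."""
    classes = set(counts_a) | set(counts_b)
    total = sum(counts_a.values()) + sum(counts_b.values())
    stat = 0.0
    for counts in (counts_a, counts_b):
        row_total = sum(counts.values())
        for label in classes:
            column_total = counts_a.get(label, 0) + counts_b.get(label, 0)
            expected = row_total * column_total / total
            stat += (counts.get(label, 0) - expected) ** 2 / expected
    return stat


def chimerge(values, labels, threshold):
    """Return the lower bounds of the intervals left after merging."""
    # Steps 1-2: sort the data and open one interval per distinct value.
    grouped = {}
    for value, label in sorted(zip(values, labels)):
        grouped.setdefault(value, Counter())[label] += 1
    intervals = list(grouped.items())  # (lower bound, class counts)
    # Step 3: merge the most similar adjacent pair until all pairs differ enough.
    while len(intervals) > 1:
        stats = [chi2(intervals[k][1], intervals[k + 1][1])
                 for k in range(len(intervals) - 1)]
        k = min(range(len(stats)), key=stats.__getitem__)
        if stats[k] >= threshold:
            break
        lower, counts = intervals[k]
        intervals[k:k + 2] = [(lower, counts + intervals[k + 1][1])]
    return [lower for lower, _ in intervals]


# Hypothetical measurements of one factor with a two-class performance label.
sample_values = [1, 2, 3, 7, 8, 9]
sample_labels = ["ok", "ok", "ok", "out", "out", "out"]
lower_bounds = chimerge(sample_values, sample_labels, threshold=3.841)
```

In this example the low values are all in tolerance and the high values are all out of tolerance, so the merging stops with two intervals that start at 1 and 7.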
After Chi-Merge discretization preprocessing, the influence factor database with continuous values is transformed into an influence factor database of Boolean type, in which each record consists of the interval mapping values of its features.
Table 2 shows the mapping value assigned to each Chi-Merge discretized interval. Each feature is discretized into a number of intervals, and each interval of each feature is represented by its own mapping value, so every continuous feature value is replaced by the mapping value of the interval into which it falls.
After Chi-Merge discretization, all continuous attributes are discretized into discrete values and participate in the subsequent association rule mining as an item set.
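The mapping from continuous values to items can be sketched as follows. The item naming scheme (`feature_interval`), the feature names, and the cut points are hypothetical; the cut points stand for the interior interval boundaries produced by the discretization step.

```python
from bisect import bisect_right


def to_items(record, cut_points):
    """Map one record of continuous values to a set of interval items.

    record: dict mapping feature name -> continuous value.
    cut_points: dict mapping feature name -> sorted interior boundaries;
    a value equal to a boundary falls into the interval above it.
    """
    items = set()
    for feature, value in record.items():
        interval = bisect_right(cut_points[feature], value) + 1  # 1-based index
        items.add(f"{feature}_{interval}")
    return frozenset(items)


# Hypothetical boundaries: x1 has three intervals, x2 has two.
cut_points = {"x1": [0.02, 0.05], "x2": [7]}
items = to_items({"x1": 0.03, "x2": 9}, cut_points)
```

The resulting item sets, one per assembly record, are what the association rule mining step consumes.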
The trial calculation process of the Apriori association rule mining algorithm is as follows. Suppose there is a database with 5 transaction records, as shown in Table 3, containing 5 feature quantities, each discretized into 3 intervals, and a minimum support minsup is set.
(i) Start from the database in Table 3.
(ii) Scan the database to obtain the candidate 1-itemsets, as shown in Table 4.
(iii) Remove the itemsets whose support is smaller than the minimum support minsup to obtain the frequent 1-itemsets, as shown in Table 5.
(iv) Scan the frequent 1-itemsets to obtain the candidate 2-itemsets, as shown in Table 6.
(v) Remove the itemsets whose support is smaller than the minimum support minsup to obtain the frequent 2-itemsets, as shown in Table 7.
(vi) Scan the frequent 2-itemsets to obtain the candidate 3-itemsets, as shown in Table 8.
(vii) Remove the itemsets whose support is smaller than the minimum support minsup to obtain the frequent 3-itemsets, as shown in Table 9.
(viii) Scan the frequent 3-itemsets to obtain the candidate 4-itemsets, as shown in Table 10.
(ix) Remove the itemsets whose support is smaller than the minimum support minsup to obtain the frequent 4-itemsets, as shown in Table 11.
When the last frequent itemset is scanned again by the Apriori algorithm, the resulting candidate itemset is empty and the trial calculation ends. The association rules and their support and confidence levels are obtained through this trial calculation. By setting the minimum support and confidence levels and retaining the association rules that meet them, a library of association rules is obtained for guiding the plant in making process adjustments.
4. Case Analysis
This paper takes an assembly process of a certain type of aeroengine low-pressure fan rotor as an example. The candidate influencing factors cover processing quality, assembly process, and assembly quality, 49 items in total, and the performance index is the unbalance of the fan rotor.
This article uses Python’s open source software library NetworkX to model complex networks. The library contains visualization and analysis algorithms for complex networks, which can visualize complex networks and analyze data. Taking the assembly of the primary and secondary discs of the low pressure fan rotor as an example, the set of influencing factors is shown in Table 12.
The complex network visualization of the correlation model of the low-pressure rotor unbalance influencing factors is shown in Figure 3.

For the correlation model of the unbalance influencing factors of each process established above, the attribute information of each node is calculated, and the results are shown in Table 13.
According to the calculation formulas (10), (11), and (13), the calculation results are shown in Tables 14 and 15, and the weight of each influencing factor is shown in Table 16.
It can be seen from the curve in Figure 4 that the threshold of the disc assembly process is 1.760, and the key influencing factor identification results are shown in Table 17.

As shown in Table 18, this paper extracts 30 key influencing factors related to out-of-tolerance unbalance according to the key factor identification in the previous part, each with a corresponding variable. In terms of the definition of association rules, they are all important parameters that characterize the performance of the low-pressure rotor, and they are all related to each other.
A total of 150 sets of data with out-of-tolerance imbalances have been collected in this paper. The results of using the Chi-Merge discretization algorithm to discretize the continuous data are shown in Table 19.
Taking the discrete data as a new item set, using Apriori association rule analysis, some more detailed association rules can be obtained, and these rules can also be used for process adjustment in the assembly process. Table 20 is a part of the association rules with higher credibility after association rule mining.
Obviously, the obtained association rules with higher credibility can be used to make a preliminary judgment on whether the imbalance is out of tolerance and to provide decision support for process adjustment based on this, so as to improve the success rate of one-time assembly of the low-pressure rotor.
5. Conclusion
In this paper, we propose a key influencing factor identification method that combines complex networks with the entropy weight method to identify the key influencing factors of complex product assembly performance and their scope of influence. Based on the key factor identification results, the Apriori algorithm is used to mine the association rules between the key influencing factors and the assembly performance of complex products. Several association rules have been calculated and analyzed through examples, and a library of association rules has been formed, which can provide decision support for process adjustment and has been applied with good results in aeroengine assembly plants. The method in this paper is not limited to the adjustment and optimization of the assembly performance of complex products; in subsequent research it can also be combined with MEMS [27], mobile robots [28], and quadrotors [29] to better optimize their control by identifying key factors and mining association rules. However, when the Apriori algorithm is used for association rule mining, the database needs to be scanned every time a set of candidate items is generated, and the database used for association rule mining is usually relatively large, which makes the algorithm inefficient. Subsequent research will improve the Apriori algorithm to raise its efficiency.
Data Availability
Some or all data, models, or code generated or used during the study are proprietary or confidential in nature and may only be provided with restrictions.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was funded by the National Key R&D Program of China (No. 2019YFB1703802).