Abstract
Urban rail transit projects are typical operating public projects. Adopting the PPP model can alleviate local fiscal pressure and improve the efficiency of capital use. However, many existing urban rail transit PPP projects have fallen into cost overruns, schedule delays, and poor product quality, and the lack of reasonable project performance evaluation is considered an important cause. This research first clarifies the meaning and characteristics of project performance evaluation by comparing and analyzing several basic concepts and their relationships. Second, from a multistakeholder perspective, an operation performance evaluation system for urban rail transit PPP projects is constructed around three parties. Third, based on a comparison of multistakeholder evaluation scenarios and evaluation methods, the best worst method and large-scale group evaluation technology are adopted. A quantitative model is constructed to evaluate the operation performance of urban rail transit PPP projects and is tested and explained through a case study. Whereas most current studies focus on the earlier stages of project performance, this article focuses on the operation performance of PPP projects. Finally, suggestions are provided for operation performance evaluation theory and for urban rail transit PPP project practice.
1. Introduction
With China’s economic development and accelerating urbanization, urban populations and the scale of cities continue to expand. Citizens’ demand for urban transportation keeps increasing, and the number of private cars in urban areas has risen significantly. However, travel dominated by private cars has caused many urban problems: traffic congestion, energy consumption, greenhouse gas emissions, and air pollution from vehicle exhaust. These problems seriously threaten citizens’ living standards and the sustainable development of cities [1]. To address them, the government aims to reduce private car ownership and replace private car travel by increasing the supply of public transport. Urban rail transit has many advantages: it is green, efficient, convenient, high-capacity, and highly accessible, and it helps optimize urban spatial structure. It is therefore widely favored by the government and society and is regarded as an essential measure for solving urban traffic problems [2].
As a typical type of public transportation infrastructure, urban rail transit projects require an enormous initial investment, which creates a massive debt burden that local governments find difficult to bear within a short time. In terms of economic attributes, urban rail transit projects are quasi-public goods. Applying the Public-Private Partnership (PPP) model to the construction and operation of urban rail transit projects can effectively alleviate the shortage of local government infrastructure funds. It can also improve project management philosophy, technical capability, and operational efficiency [3]. However, many ongoing infrastructure PPP projects worldwide have encountered difficulties such as construction delays, cost overruns, corruption, poor operating performance, and low public satisfaction, and their performance falls far behind expectations [4, 5]. Studies have shown that, among the many factors affecting the success of infrastructure PPP projects [6], insufficient project performance evaluation is one of the most important [7]. Existing performance evaluation tools for PPP projects generally have shortcomings, such as a limited range of evaluation dimensions and neglect of performance during the operation and maintenance period. These problems lead to low efficiency in public product supply and public resource utilization in PPP infrastructure projects [8].
Urban rail transit PPP projects involve a considerable investment scale and complex stakeholder relationships, and their success or failure has a profound impact on both society and the economy. Strengthening the performance management of these projects is therefore of great significance: it helps achieve project goals, ensures the rational allocation and full utilization of resources, stimulates the vitality of social capital, and promotes social and economic development. The traditional project performance evaluation method based on “schedule, cost, and quality” has shown several shortcomings when measuring the performance of PPP projects: (1) In terms of evaluation content, the evaluation dimension is narrow and cannot reflect the many key success factors of infrastructure PPP projects. For instance, the VfM (Value for Money) method is mainly based on financial evaluation and ignores operating performance from the perspectives of stakeholders, and the external influence of PPP projects is difficult to quantify within VfM. (2) In terms of evaluation timing, existing research cannot fully reflect dynamic changes in the project’s external environment or the performance requirements of different periods (especially the operation period). (3) In terms of evaluation function, the results only show that project output is insufficient; they cannot correctly and comprehensively reveal the project’s specific problems, so they do not support targeted and operable improvement measures. Hence, this research first summarizes the relevant theories of PPP project performance evaluation. Then, from the perspective of stakeholders, it adopts the best worst method and large-scale group evaluation technology to construct an operation performance evaluation system for urban rail transit PPP projects. Finally, a quantitative large-scale group evaluation model is proposed and tested through a practical case.
From a performance perspective, this article provides a theoretical basis for achieving a win-win situation among multiple stakeholders and for the sustainable development of rail transit PPP projects, and it provides a quantitative, scientific basis for government supervision of urban rail transit PPP projects.
The remainder of this article is structured as follows. Section 2 reviews the relevant literature. Section 3 constructs the operation performance evaluation system of urban rail transit PPP projects. Section 4 introduces the best worst method and large-scale group evaluation technology. Section 5 presents an empirical study of Beijing Metro Line 4. Section 6 gives the conclusions.
2. Literature Review
2.1. Performance Evaluation of PPP Projects
Performance management is one of the important topics in PPP project management. With the widespread application of the PPP model, PPP project performance management has attracted significant attention in both academia and industry. Research on PPP project performance initially built on the performance management of governments, enterprises, and organizations, and an independent theoretical system has gradually formed from it. Several methods have been widely applied to PPP project performance, such as the Balanced Scorecard (BSC), the performance prism, benchmarking, Critical Success Factors (CSFs), project success degree models, quality function deployment, and Key Performance Indicators (KPIs) [9–11]. In addition, mathematical methods such as tomographic analysis, structural equation modeling, fuzzy comprehensive evaluation, and data envelopment analysis have been applied in performance evaluation research [12, 13]. Initially, performance evaluation was only a simple post-evaluation of the PPP project in terms of time, cost, and quality. With the proposal and development of the Value for Money (VfM) concept, project performance evaluation extended to time, input, output, production efficiency, economic benefits, and social and ecological environmental impacts across the project’s entire life cycle, and the evaluation of PPP infrastructure projects began to focus on the project’s continuity and influence [14]. Liu et al. proposed a stage-oriented evaluation process that evaluates the KPIs of each stage [15]. Liu et al. summarized three evaluation stages: (1) initiation and planning, (2) procurement, and (3) partnership [16, 17]. Large-scale group decision making (LGDM) involves participants from different subgroups, which makes the evaluation more comprehensive.
Because PPP projects involve multiple stakeholders in both the construction and operation periods, LGDM is suitable for evaluating the operation performance of PPP projects [18].
Unlike the “temporary contract organization” perspective of traditional project governance, PPP projects involve long construction and operation cycles. The long-term, dynamic partnership between the public sector and social capital is vital for PPP projects, so it is necessary to evaluate both the interest requirements of multiple stakeholders and operation performance. However, existing performance evaluations of PPP projects in the operation and maintenance period are insufficient.
2.2. Performance Evaluation Indicators of PPP Projects
A scientific, reasonable, and quantifiable performance evaluation indicator system is the essential foundation of performance evaluation. PPP project performance evaluation compares the actual values of key performance indicators with the performance standards that multiple stakeholders expect during project execution. The PPP project performance evaluation indicator system should reflect the project’s implementation and objectives comprehensively and objectively, in order to support value control throughout the PPP project life cycle.
Currently, PPP project performance evaluation indicators are designed around principles such as effectiveness, efficiency, economy, and fairness, combined with the type and characteristics of the project. Kagioglou et al. first introduced the balanced scorecard, a tool for organizational performance management, into the construction industry [19]. Yuan et al. analyzed the critical success factors (CSFs) affecting PPP project performance and designed a performance system for PPP infrastructure projects [20]. From the perspectives of different stakeholders, Goran et al. developed a two-level, CSF-based performance evaluation indicator system for PPP transportation projects [21]. Yuan et al. combined the CSF and KPI methods to extract key performance indicators and established a process-based indicator system for PPP projects [22]. Through case analyses, Villalba-Romero and Liyanage compared Performance Measurement Systems (PMS) and KPIs as measures of the success of PPP highway projects and then selected strong indicators on that basis [13]. To reflect PPP project performance comprehensively, existing studies have classified these indicators from different perspectives, including a four-quadrant classification based on the balanced scorecard; a three-stage classification based on the initial period, the design and construction period, and the operation and maintenance period; and a classification by technical, economic, operation, and maintenance aspects [16, 17]. Huang et al. evaluated the operation performance of urban rail transit projects with an entropy-TOPSIS model [23]. Liu et al. developed a performance evaluation system for urban rail transit PPP projects based on the balanced scorecard and the characteristics of PPP projects [24].
These classification methods reflect the characteristics of project performance indicators from different angles, but they are rarely applied in an integrated way. An integrated key performance indicator scheme and performance management system, which incorporates the views of different stakeholders, extends to the operation and maintenance period, and includes internal learning, improvement, and innovation within the project organization, has become a trend in PPP project research.
3. Operation Performance Evaluation System of Urban Rail Transit PPP Projects
3.1. Identification for the Performance Evaluation Subject of Urban Rail Transit PPP Projects
Urban rail transit PPP projects are large-scale public infrastructure projects. From the perspective of government functions, the public sector, as the provider of public products and services, should represent the public’s interests and be supervised by the public. Introducing public participation into project performance evaluation shows that the public sector respects public opinion and adheres to a user-demand orientation, which improves the objectivity, scientific rigor, and legitimacy of the evaluation results [25]. Public participation is a planned and organized institutional arrangement for communication between the government, development departments, and the public through specific forms and channels. Through it, the public can take part in public affairs management and in decision-making processes involving the public interest, so that conflicts are resolved and social welfare is maximized [26]. However, research shows that the content and priority of the interest requirements of the public sector and the public are not entirely consistent [22]. This leads to different definitions of project success, which is the fundamental source of the complexity of PPP project performance management, as shown in Table 1.
In summary, the stakeholders of urban rail transit PPP projects have clearly different expectations of project performance, which makes the criteria for project success and project value ambiguous. Clarifying each evaluation subject’s performance expectations and evaluation standards is therefore the basis for establishing a project performance evaluation system. Following Mitchell’s stakeholder identification theory, which considers power, urgency, and legitimacy, this article divides the key stakeholders in PPP projects into three categories: the public sector, social capital, and the public [27]. The public sector expects the project to deliver value for money (VfM), that is, investment efficiency and positive externalities. For social capital, the most critical project goals are profitability, sustainability, and the satisfaction of owners and end users. The public expects the project process to be open and transparent, and expects the project to provide high-quality public products and services and to achieve other positive externalities. Although the performance goals of the three parties seem different, the nature of the PPP partnership means that they share a relatively broad consensus on the project’s success goals, such as establishing good public-private relationships and ensuring end-user satisfaction. However, the priorities of these goals vary among stakeholders [28]. Effective project performance management should coordinate the interests of all parties, reduce differences, and achieve a win-win situation for the multiple stakeholders [29].
3.2. Operation Performance Evaluation Indicator System of Urban Rail Transit PPP Projects
Currently, the mainstream method for evaluating PPP project operation performance is based on the balanced scorecard and key success factors. However, the balanced scorecard only distinguishes between financial and nonfinancial perspectives and between internal and external evaluation views. It cannot explain the performance indicators or the relationships among them from the perspective of project operation logic. Moreover, the key success factors, which include various uncontrollable environmental factors such as the political and economic environment, lack an explanation of how they relate to one another. Compared with the balanced scorecard and key success factors, the performance prism method emphasizes the status of stakeholders in performance evaluation, reflecting both the logic of project operation and the interaction between the project and its stakeholders [30, 31]. The performance prism is a three-dimensional framework whose structure can be pictured as a triangular prism, as shown in Figure 1. It includes five interrelated aspects: (1) Stakeholder satisfaction: to maintain relationships with stakeholders, it is necessary to clearly understand their different interest needs, such as the project company’s technology, management experience, profitability, and end-user satisfaction. (2) Stakeholder contribution: the stakeholders and the project company have a mutually beneficial relationship. Stakeholders obtain certain benefits from the project company and are expected to contribute to it in return. For example, the project company hopes to receive supervision and support from the public sector and trust and cooperation from other stakeholders. (3) Strategy: for the project company, having no strategy or an unreasonable strategy leads to problems such as blind expansion, weak competitiveness, poor execution, and unsustainable growth.
Therefore, considering factors both internal and external to the project company, it is particularly important to conduct a value-for-money analysis over the project’s whole life cycle. (4) Processes: for the project company, the strategy implementation process is the most important driver of performance improvement, because it determines the quality, efficiency, cycle time, and cost of products and services. (5) Capabilities: the capabilities needed to operate these processes. Strategy, processes, and capabilities constitute the three side faces of the performance prism, while stakeholder satisfaction and stakeholder contribution constitute its top and bottom faces. The five aspects permeate and interact with each other to form a complete system. Strategy formulation, implementation processes, and capabilities are interdependent, mutually conditioning, and dynamically cyclic: strategy sets the direction and the implementation process provides the means, giving full play to the four basic functions of performance evaluation, namely judgment, prediction, selection, and orientation. The performance prism starts from stakeholders’ needs; determines the project’s strategic planning, process design, and capability building; and finally returns to stakeholders’ contributions, forming a closed loop. This method fits the PPP project performance evaluation concept proposed in this article, which is based on process management and stakeholder orientation [16].

Based on the performance prism, and starting from the definitions of project success and project operation performance goals, this research constructs an operation performance evaluation indicator system for urban rail transit PPP projects that incorporates the idea of continuous improvement, as shown in Table 2. Compared with traditional performance evaluation methods, the proposed system (1) includes process and efficiency indicators and is therefore more comprehensive, (2) introduces public participation to include a stakeholder group previously missing from evaluation, and (3) emphasizes project governance and continuous improvement to ensure the timeliness and usability of the evaluation results.
4. The Method
4.1. Determination of Evaluation Indicator Weight
Multicriteria Decision Making (MCDM) evaluates multiple candidate options against a series of different criteria to find the optimal solution [32]. Weighting the evaluation criteria is the most important step. After decades of research, several relatively mature and systematic methods have been developed to determine the weight vector of evaluation criteria. Conventional weighting methods fall into two main categories: (1) subjective weighting methods based on expert opinion, such as the Delphi method, the analytic hierarchy process (AHP), fuzzy comprehensive evaluation, and gray clustering; and (2) objective weighting methods based on the characteristics of the evaluation data, such as the entropy weight method, the OWA operator, and principal component analysis/factor analysis [33]. However, the performance evaluation of urban rail transit PPP projects involves numerous indicators and large-scale evaluation data. Moreover, stakeholders are classified in advance rather than identified from the evaluation information. The methods above struggle to process large-scale evaluation data, and most objective weighting methods based on data-feature classification perform poorly in this setting. In response to these two problems, Rezaei proposed a multicriteria decision-making method, the best worst method (BWM) [34]. The decision maker first identifies the best (most important) and worst (least important) criteria with respect to the overall goal. Pairwise comparisons are then made between the best criterion and each of the other criteria, and between each of the other criteria and the worst criterion. A min-max nonlinear programming problem is constructed from these comparison vectors, and solving it yields the weight of each criterion with respect to the final goal.
Statistical results show that BWM performs significantly better than AHP in terms of evaluation difficulty, consistency ratio, minimum violation, maximum deviation, and related measures [35]. Compared with existing MCDM methods, BWM has two advantages: (1) it requires less comparison data: matrix-based MCDM methods such as AHP require n(n − 1)/2 comparisons, whereas BWM is vector-based and requires only 2n − 3 comparisons, which reduces calculation; (2) the comparison vectors are more consistent, which makes the results more reliable [36]. Therefore, this research chooses BWM to weight the performance evaluation indicators of urban rail transit PPP projects.
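As a quick check of the comparison counts, the arithmetic below works through an illustrative case of n = 10 indicators (n is chosen only for illustration). BWM needs n − 1 best-to-others comparisons and n − 1 others-to-worst comparisons, but the best-to-worst comparison appears in both vectors and is counted once:

```latex
\begin{aligned}
\text{AHP (full matrix):}\quad & \frac{n(n-1)}{2} = \frac{10 \times 9}{2} = 45,\\
\text{BWM (two vectors):}\quad & (n-1) + (n-1) - 1 = 2n - 3 = 17.
\end{aligned}
```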
Based on BWM, the weight vector of the performance evaluation indicators of urban rail transit PPP projects is obtained as follows:
Step 1: Invite evaluation experts from multiple fields related to urban rail transit, including authoritative scholars, consulting experts, senior practitioners, deputies to community people’s congresses, media representatives, and Nongovernmental Organization (NGO) representatives, who have the professional knowledge and scientific literacy required for evaluation.
Step 2: Determine the project performance evaluation indicator system $\{c_1, c_2, \ldots, c_n\}$ according to the project stage.
Step 3: Ask the experts to select the most important indicator $c_B$ and the least important indicator $c_W$. No comparison values are required in this step.
Step 4: Determine the priority of the most important indicator over each of the other indicators. Converting the linguistic judgments into numerical values gives the best-to-others comparison vector $A_B = (a_{B1}, a_{B2}, \ldots, a_{Bn})$, where $a_{Bj}$ represents the priority of the most important indicator $c_B$ over indicator $c_j$, and $a_{BB} = 1$.
Step 5: Determine the priority of each of the other indicators over the least important indicator, giving the others-to-worst comparison vector $A_W = (a_{1W}, a_{2W}, \ldots, a_{nW})^T$, where $a_{jW}$ represents the priority of indicator $c_j$ over the least important indicator $c_W$, and $a_{WW} = 1$.
Step 6: Calculate the weight vector. To determine the optimal indicator weights $(w_1^*, w_2^*, \ldots, w_n^*)$, the maximum absolute deviation over all indicators should be minimized, which gives the following nonlinear programming problem:
$$
\min_{w}\ \max_{j}\left\{\left|\frac{w_B}{w_j}-a_{Bj}\right|,\ \left|\frac{w_j}{w_W}-a_{jW}\right|\right\}
\quad \text{s.t.}\quad \sum_{j=1}^{n} w_j = 1,\quad w_j \ge 0,\quad j = 1, 2, \ldots, n. \tag{1}
$$
For convenience of solution, the nonlinear programming problem can be transformed into the following form:
$$
\begin{aligned}
&\min\ \xi\\
&\text{s.t.}\ \left|\frac{w_B}{w_j}-a_{Bj}\right| \le \xi,\quad \left|\frac{w_j}{w_W}-a_{jW}\right| \le \xi,\quad j = 1, 2, \ldots, n,\\
&\phantom{\text{s.t.}}\ \sum_{j=1}^{n} w_j = 1,\quad w_j \ge 0,\quad j = 1, 2, \ldots, n.
\end{aligned} \tag{2}
$$
Solving this problem gives the optimal weight vector $(w_1^*, w_2^*, \ldots, w_n^*)$ and the minimum consistency error $\xi^*$. It can be proved that the problem has a unique solution [36]. A smaller $\xi^*$ indicates a smaller consistency error in the comparison information provided by the evaluator, that is, stronger consistency. Therefore, $\xi^*$ can be used to construct a consistency ratio CR that checks evaluation quality: the ratio of $\xi^*$ to the consistency index CI, which is the maximum value $\xi$ can take for a given $a_{BW}$ (Table 3):
$$
CR = \frac{\xi^*}{CI}.
$$
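Model (2) is usually handed to a nonlinear or linear programming solver. As a dependency-free sketch, the Python below instead solves it by bisection on ξ: every constraint involves only weight ratios, so $w_W$ can be fixed to 1, and for a fixed ξ feasibility then reduces to checking a single interval for $w_B$. The function names `bwm_weights` and `consistency_ratio`, the iteration count, and the closed-form CI (the smaller root of ξ² − (1 + 2a_BW)ξ + (a_BW² − a_BW) = 0, which underlies Rezaei's consistency-index table) are assumptions of this sketch, not details taken from this article.

```python
import math

EPS = 1e-12  # keeps every weight strictly positive


def bwm_weights(a_best, a_worst, best, worst, iters=100):
    """Solve model (2) by bisection on xi.

    a_best[j] = a_Bj (best-to-others), a_worst[j] = a_jW (others-to-worst).
    All constraints are ratios, so w_W is fixed to 1 and weights are
    normalized at the end.
    """
    others = [j for j in range(len(a_best)) if j not in (best, worst)]
    a_bw = a_best[worst]

    def best_range(xi):
        """Feasible interval for w_B (with w_W = 1), or None if empty."""
        lo, hi = max(a_bw - xi, EPS), a_bw + xi  # |w_B / w_W - a_BW| <= xi
        for j in others:
            l_j, u_j = max(a_worst[j] - xi, EPS), a_worst[j] + xi  # |w_j - a_jW| <= xi
            hi = min(hi, u_j * (a_best[j] + xi))  # needs w_B / w_j <= a_Bj + xi
            if a_best[j] - xi > 0:
                lo = max(lo, l_j * (a_best[j] - xi))  # needs w_B / w_j >= a_Bj - xi
        return (lo, hi) if lo <= hi else None

    # Feasibility only grows with xi; xi = largest comparison value is always feasible.
    lo_xi, hi_xi = 0.0, float(max(max(a_best), max(a_worst)))
    for _ in range(iters):
        mid = (lo_xi + hi_xi) / 2
        if best_range(mid) is None:
            lo_xi = mid
        else:
            hi_xi = mid
    xi = hi_xi

    # Build one weight vector that satisfies all constraints at this xi.
    w = [0.0] * len(a_best)
    lo_b, hi_b = best_range(xi)
    w[worst], w[best] = 1.0, (lo_b + hi_b) / 2
    for j in others:
        lo_j = max(a_worst[j] - xi, EPS, w[best] / (a_best[j] + xi))
        hi_j = a_worst[j] + xi
        if a_best[j] - xi > 0:
            hi_j = min(hi_j, w[best] / (a_best[j] - xi))
        w[j] = (lo_j + hi_j) / 2
    total = sum(w)
    return [x / total for x in w], xi


def consistency_ratio(xi, a_bw):
    """CR = xi* / CI, with CI the smaller root of
    xi^2 - (1 + 2 a_BW) xi + (a_BW^2 - a_BW) = 0."""
    ci = (1 + 2 * a_bw - math.sqrt(1 + 8 * a_bw)) / 2
    return 0.0 if ci == 0 else xi / ci


weights, xi = bwm_weights([1, 3, 8], [8, 2, 1], best=0, worst=2)
print([round(x, 3) for x in weights], round(xi, 3), round(consistency_ratio(xi, 8), 3))
```

For fully consistent vectors such as $A_B = (1, 2, 4)$ and $A_W = (4, 2, 1)^T$ (best = indicator 0, worst = indicator 2), the sketch returns ξ* ≈ 0 and weights of about (4/7, 2/7, 1/7). When the optimal feasible interval for $w_B$ is wider than a point, the sketch simply picks its midpoint.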
4.2. Construction of Evaluation Model
4.2.1. Problem Description and Variable Definition
To establish the large-scale group evaluation model for the operation performance of urban rail transit PPP projects, this research first defines the relevant variables:
$G = \{G_1, G_2, \ldots, G_m\}$: G represents the total set of evaluators, and $G_k$ represents the kth subset participating in the evaluation, $k = 1, 2, \ldots, m$.
$Q = \{q_1, q_2, \ldots, q_m\}$: Q represents the numbers of evaluators in the subsets of G, and $q_k$ represents the number of evaluators in subset $G_k$, $k = 1, 2, \ldots, m$.
$C = \{c_1, c_2, \ldots, c_n\}$: C represents the evaluation indicator system, and $c_j$ represents the jth indicator, $j = 1, 2, \ldots, n$.
$W = (w_1, w_2, \ldots, w_n)$: W represents the indicator weight vector, and $w_j$ represents the weight of the jth indicator, with $w_j \ge 0$ and $\sum_{j=1}^{n} w_j = 1$.
$S = \{s_1, s_2, \ldots, s_T\}$: S represents the set of evaluation scales, and $s_t$ represents the tth evaluation level, $t = 1, 2, \ldots, T$. Generally, S is positively ordered: when $t > t'$, $s_t \succ s_{t'}$. S can be either a linguistic scale, such as {very bad, bad, fair, good, very good}, or a set of numerical evaluation values, such as {1, 2, 3, 4, 5}.
$X_{kj} = (x_{kj}^1, x_{kj}^2, \ldots, x_{kj}^{q_k})$: $X_{kj}$ represents the evaluation vector of subset $G_k$ under indicator $c_j$, and $x_{kj}^l \in S$ represents the evaluation given by the lth evaluator in subset $G_k$, $l = 1, 2, \ldots, q_k$, $k = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$. For convenience, if S is a linguistic scale, it is converted into the corresponding numerical values; for example, {very bad, bad, fair, good, very good} can be converted into {1, 2, 3, 4, 5}.
4.2.2. Steps of Evaluation
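(1) Determine the Frequency Distribution of Project Performance Evaluation Information for Each Subset under a Given Indicator. From the evaluation vector $X_{kj} = (x_{kj}^1, x_{kj}^2, \ldots, x_{kj}^{q_k})$ provided by the evaluators in subset $G_k$ on indicator $c_j$, the count matrix $B_{kj} = [b_{kj}^{lt}]_{q_k \times T}$ can be obtained. Each $b_{kj}^{lt}$ is a 0-1 variable: $b_{kj}^{lt} = 1$ if $x_{kj}^l = s_t$, and $b_{kj}^{lt} = 0$ otherwise, $l = 1, 2, \ldots, q_k$, $t = 1, 2, \ldots, T$. Each row of $B_{kj}$ therefore contains exactly one 1.
Define $n_{kj}^t$ as the number of evaluators in subset $G_k$ who choose scale $s_t$ when evaluating the project under indicator $c_j$. Then
$$
n_{kj}^t = \sum_{l=1}^{q_k} b_{kj}^{lt}. \tag{3}
$$
Define $f_{kj}^t$ as the frequency with which evaluators in subset $G_k$ choose scale $s_t$ under indicator $c_j$. Then
$$
f_{kj}^t = \frac{n_{kj}^t}{q_k}. \tag{4}
$$
Obviously, $0 \le f_{kj}^t \le 1$ and $\sum_{t=1}^{T} f_{kj}^t = 1$.
Therefore, the frequency distribution of the evaluation information of subset $G_k$ under indicator $c_j$ is
$$
F_{kj} = \left\{ (s_t, f_{kj}^t) \mid t = 1, 2, \ldots, T \right\}.
$$
To facilitate calculation, define $V = \{v_1, v_2, \ldots, v_T\}$ as the set of evaluation values transformed from the evaluation scale S. If S is a numerical set, then $v_t = s_t$, $t = 1, 2, \ldots, T$. If S is a linguistic scale, then $v_t$ represents the value corresponding to scale $s_t$, with $v_1 < v_2 < \cdots < v_T$. With this transformation, the frequency distribution based on the linguistic scale becomes a frequency distribution over evaluation values.
For convenience, it is expressed in vector form as $F_{kj} = (f_{kj}^1, f_{kj}^2, \ldots, f_{kj}^T)$, where the tth component is the frequency of evaluation value $v_t$.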
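The counting in (3) and (4) can be sketched in a few lines of Python. The five-level mapping `SCALE` and the function name `frequency_distribution` are illustrative assumptions, not part of the original method description.

```python
# Hypothetical mapping from the five-level linguistic scale to values 1..5.
SCALE = {"very bad": 1, "bad": 2, "fair": 3, "good": 4, "very good": 5}


def frequency_distribution(ratings, levels):
    """Return the 0-1 count matrix B, the counts n^t, and the frequencies f^t
    for one subset G_k under one indicator c_j (equations (3) and (4))."""
    # Row l has a single 1 in the column of the level evaluator l chose.
    b = [[1 if r == s else 0 for s in levels] for r in ratings]
    counts = [sum(row[t] for row in b) for t in range(len(levels))]
    q = len(ratings)                     # q_k, number of evaluators in the subset
    freqs = [n_t / q for n_t in counts]  # f^t = n^t / q_k
    return b, counts, freqs


ratings = [SCALE[x] for x in ["good", "very good", "good", "fair"]]
b, counts, freqs = frequency_distribution(ratings, levels=[1, 2, 3, 4, 5])
print(counts)  # [0, 0, 1, 2, 1]
print(freqs)   # [0.0, 0.0, 0.25, 0.5, 0.25]
```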
When a subset contains a large number of evaluators or extreme opinions appear, a few extreme opinions can be identified as noise and eliminated. This increases the consistency of the evaluation results and helps ensure the authenticity of the evaluation information. Define $\mu_{kj}$ and $\sigma_{kj}^2$ as the mean and variance of the frequency distribution $F_{kj}$:
$$
\mu_{kj} = \sum_{t=1}^{T} v_t f_{kj}^t, \qquad \sigma_{kj}^2 = \sum_{t=1}^{T} \left(v_t - \mu_{kj}\right)^2 f_{kj}^t. \tag{5}
$$
According to the Pauta criterion (3σ rule) for identifying abnormal data, when the number of measurements is sufficiently large (more than 10 samples) and the sample data are normally or approximately normally distributed, data outside the interval $[\mu - 3\sigma, \mu + 3\sigma]$ should be regarded as gross errors and eliminated, where σ is the standard deviation calculated from the data. Evaluators within a stakeholder subset share similar backgrounds and interest requirements, so their evaluation information can be regarded as normally distributed. Denoising the evaluation information therefore gives
$$
\tilde{f}_{kj}^t =
\begin{cases}
f_{kj}^t, & \left|v_t - \mu_{kj}\right| \le 3\sigma_{kj},\\
0, & \text{otherwise}.
\end{cases} \tag{6}
$$
After the abnormal data are eliminated, the frequencies must be renormalized:
$$
\hat{f}_{kj}^t = \frac{\tilde{f}_{kj}^t}{\sum_{\tau=1}^{T} \tilde{f}_{kj}^\tau}. \tag{7}
$$
This gives the denoised frequency distribution $\hat{F}_{kj} = (\hat{f}_{kj}^1, \hat{f}_{kj}^2, \ldots, \hat{f}_{kj}^T)$. Note that denoising is optional: if the number of evaluators is relatively small, or the evaluation organizer considers that valuable information would be lost, denoising should not be performed. In the following steps, $f_{kj}^t$ denotes the frequency actually used, whether or not it has been denoised, and $\mu_{kj}$ and $\sigma_{kj}^2$ are computed from it by (5).
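The denoising in (5) to (7) can be sketched as follows. The function names and the `min_samples` threshold (taken from the "more than 10 samples" condition in the text) are assumptions of this sketch; the example data of 19 "good" ratings and one "very bad" rating are hypothetical.

```python
import math


def mean_var(values, freqs):
    """Mean and variance of a frequency distribution, as in equation (5)."""
    mu = sum(v * f for v, f in zip(values, freqs))
    var = sum((v - mu) ** 2 * f for v, f in zip(values, freqs))
    return mu, var


def denoise(values, freqs, n_evaluators, min_samples=10):
    """Apply the 3-sigma rule (6) and renormalize (7).

    Denoising is skipped when the subset has min_samples evaluators or fewer.
    """
    if n_evaluators <= min_samples:
        return list(freqs)
    mu, var = mean_var(values, freqs)
    sigma = math.sqrt(var)
    kept = [f if abs(v - mu) <= 3 * sigma else 0.0 for v, f in zip(values, freqs)]
    total = sum(kept)  # Chebyshev: at least 8/9 of the mass lies within 3 sigma
    return [f / total for f in kept]


# 20 evaluators: 19 rate "good" (4) and one rates "very bad" (1).
print(denoise([1, 2, 3, 4, 5], [0.05, 0.0, 0.0, 0.95, 0.0], n_evaluators=20))
# [0.0, 0.0, 0.0, 1.0, 0.0]
```

Here μ = 3.85 and 3σ ≈ 1.96, so the single "very bad" rating lies outside the interval and is dropped.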
(2) Determine Each Subset’s Evaluation Weight under a Given Evaluation Indicator. When aggregating evaluation information, each subset must be assigned an appropriate weight. Two factors should be considered. First, the evaluators represent different stakeholders, with different knowledge, experience, backgrounds, and interest requirements, so the importance and discourse power of each subset should be set according to the actual situation; this gives the subjective weights of the subsets. Second, when many evaluators participate, the consistency of the evaluation information given by the evaluators in each subset reflects its quality [37]. Higher consistency means that evaluators in the same subset share highly consistent knowledge and interest needs, so the information they provide deserves more weight; this gives the objective weights of the subsets. Considering both subjective and objective factors when weighting subsets helps ensure that the weighting process is scientific and reliable.
Define $\omega_j^S = (\omega_{1j}^S, \omega_{2j}^S, \ldots, \omega_{mj}^S)$ as the subjective weight vector of the subsets under indicator $c_j$, where $\omega_{kj}^S$ is the subjective weight of subset $G_k$, with $0 \le \omega_{kj}^S \le 1$ and $\sum_{k=1}^{m} \omega_{kj}^S = 1$.
The objective weight of subset $G_k$ is obtained from the consistency of the evaluation information it provides. Define $\sigma_{\max}^2$ as the theoretical maximum variance over all frequency distributions on V. It is attained when half of the evaluators choose $v_1$ and the other half choose $v_T$, so
$$
\sigma_{\max}^2 = \frac{\left(v_T - v_1\right)^2}{4}. \tag{8}
$$
Define the consistency coefficient $h_{kj}$ to measure the consistency of the evaluation information given by subset $G_k$ under indicator $c_j$ [37]:
$$
h_{kj} = 1 - \frac{\sigma_{kj}^2}{\sigma_{\max}^2}. \tag{9}
$$
According to (5) and (9): (a) $0 \le h_{kj} \le 1$; (b) $h_{kj} = 1$ if and only if $f_{kj}^t = 1$ for some t and $f_{kj}^{t'} = 0$ for all $t' \ne t$, that is, all evaluators in the subset agree; (c) $h_{kj} = 0$ if and only if $\sigma_{kj}^2 = \sigma_{\max}^2$. Generally, the larger the consistency coefficient $h_{kj}$, the better the information quality and the higher the objective weight. Define $\omega_{kj}^O$ as the objective weight of subset $G_k$ under indicator $c_j$:
$$
\omega_{kj}^O = \frac{h_{kj}}{\sum_{k'=1}^{m} h_{k'j}}. \tag{10}
$$
Obviously, $0 \le \omega_{kj}^O \le 1$ and $\sum_{k=1}^{m} \omega_{kj}^O = 1$.
Define $\omega_{kj}$ as the evaluation weight of subset $G_k$ under indicator $c_j$:
$$
\omega_{kj} = \alpha\, \omega_{kj}^S + \beta\, \omega_{kj}^O. \tag{11}
$$
In this equation, α and β are weight combination coefficients, with $\alpha, \beta \ge 0$ and $\alpha + \beta = 1$. Evidently, $0 \le \omega_{kj} \le 1$ and $\sum_{k=1}^{m} \omega_{kj} = 1$, $j = 1, 2, \ldots, n$. The values of α and β are set by weighing how strongly subjective and objective factors should shape each subset’s importance and discourse power.
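The subset weighting in (8) to (11) can be sketched as follows. The function names, the default α = 0.5, the subjective weights, and the three example distributions (labelled with the three stakeholder groups only for readability) are hypothetical.

```python
def consistency_coefficient(values, freqs):
    """h_kj = 1 - sigma^2 / sigma_max^2, equations (8) and (9)."""
    mu = sum(v * f for v, f in zip(values, freqs))
    var = sum((v - mu) ** 2 * f for v, f in zip(values, freqs))
    var_max = (max(values) - min(values)) ** 2 / 4  # half the mass at each end
    return 1 - var / var_max


def subset_weights(values, freqs_by_subset, subjective, alpha=0.5):
    """Combine subjective and objective subset weights, equations (10) and (11)."""
    h = [consistency_coefficient(values, f) for f in freqs_by_subset]
    objective = [x / sum(h) for x in h]  # assumes not every subset has h = 0
    beta = 1 - alpha
    return [alpha * s + beta * o for s, o in zip(subjective, objective)]


values = [1, 2, 3, 4, 5]
freqs = [
    [0, 0, 0, 1, 0],      # public sector: unanimous "good"
    [0.5, 0, 0, 0, 0.5],  # social capital: maximally split
    [0, 0, 0.5, 0.5, 0],  # the public: split between "fair" and "good"
]
print(subset_weights(values, freqs, subjective=[0.4, 0.3, 0.3]))
```

In this example the unanimous subset gains weight, while the maximally split subset has $h = 0$ and keeps only its subjective share, 0.5 × 0.3 = 0.15.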
(3) According to the Frequency Distribution of the Evaluation Information and the Subset’s Evaluation Weight, Evaluation Information and Fairness Are Considered Comprehensively, and the Evaluation Results Are Aggregated under a Particular Evaluation Indicator. When aggregating, differences of opinion between subsets should be considered alongside the evaluation information itself to ensure a fair evaluation. When different subsets perceive project performance under a particular indicator differently, the stakeholders have not reached consensus on performance under that indicator, and it is therefore difficult to regard the project as successful. For urban rail transit PPP projects, fairness and good partnership are particularly important to project performance. A high-performance urban rail transit PPP project should not only perform outstandingly on the performance indicators but also be highly regarded by its different stakeholders. Defining a fairness indicator between the opinions of two subsets makes it possible to measure how far those opinions differ. Integrating this fairness evaluation into the performance evaluation results helps ensure that the results are authentic and complete.
Based on the evaluation information, the evaluation score under the evaluation indicator is
Then, , , .
A cumulative frequency distribution is constructed for each subset so that differences between the frequency distributions of different subsets can be compared.
Define the diversity factor between two subsets from the cumulative distributions of the evaluation information given by the two groups of evaluators under the indicator. Then
The maximum difference between two subsets is defined as the theoretical maximum of the pairwise difference between their cumulative distributions of the evaluation information under the indicator. Then
Based on the evaluation information, the fairness coefficient between two subsets under the evaluation indicator is defined as
According to (13) and (14), (a) ; (b) if and only if ; (c) if and only if ; (d) . Generally, the larger the fairness coefficient, the smaller the disagreement between the two subsets and the fairer the evaluation information.
The fairness coefficient between every pair of subsets in G under the evaluation indicator is calculated, and the fairness coefficient matrix of the subsets under this indicator is obtained as
Based on this matrix, and excluding the influence of self-comparison, the fairness score of the subsets under the evaluation indicator is defined as
The larger the fairness score, the more consistent the opinions of the subsets.
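The display formulas for the diversity factor, maximum difference, fairness coefficient, and fairness score are not reproduced here. The sketch below is one plausible form consistent with the description above, under these assumptions: the diversity factor is the sum of absolute differences between two cumulative distributions over L ordered levels; its theoretical maximum is L - 1, because both cumulative distributions equal 1 at the top level; the fairness coefficient is 1 minus their ratio; and the fairness score is the mean of the off-diagonal matrix entries, which removes self-comparison. All function names and frequency distributions are hypothetical.

```python
from itertools import accumulate


def cumulative(freq: list[float]) -> list[float]:
    """Cumulative frequency distribution over ordered levels (lowest first)."""
    return list(accumulate(freq))


def diversity(freq_a: list[float], freq_b: list[float]) -> float:
    """Assumed diversity factor: sum of absolute gaps between cumulative distributions."""
    return sum(abs(x - y) for x, y in zip(cumulative(freq_a), cumulative(freq_b)))


def fairness_coefficient(freq_a: list[float], freq_b: list[float]) -> float:
    """Assumed form: 1 - diversity / (L - 1), where L - 1 is the theoretical maximum."""
    if len(freq_a) != len(freq_b) or len(freq_a) < 2:
        raise ValueError("need two distributions over the same L >= 2 levels")
    max_diff = len(freq_a) - 1
    return 1.0 - diversity(freq_a, freq_b) / max_diff


def fairness_matrix(freqs: list[list[float]]) -> list[list[float]]:
    """Pairwise fairness coefficients between all subsets (diagonal entries are 1)."""
    return [[fairness_coefficient(a, b) for b in freqs] for a in freqs]


def fairness_score(matrix: list[list[float]]) -> float:
    """Mean of the off-diagonal entries, so self-comparisons are excluded."""
    n = len(matrix)
    off_diagonal = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return sum(off_diagonal) / len(off_diagonal)


# Hypothetical frequency distributions over levels {1, 2, 3, 4, 5} for G1, G2, G3.
freqs = [
    [0.0, 0.0, 0.2, 0.6, 0.2],
    [0.0, 0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.2, 0.5, 0.2],
]
matrix = fairness_matrix(freqs)
print(round(fairness_score(matrix), 4))
```

Under these assumptions the coefficient lies in [0, 1], equals 1 only for identical distributions, equals 0 when one subset rates everything at the lowest level and the other at the highest, and is symmetric in the two subsets.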
Combining the evaluation score and the fairness score, the evaluation result under the evaluation indicator is aggregated as
In this equation, a and b are weight combination coefficients, which are nonnegative and sum to 1. Their values are set according to how strongly the evaluators’ evaluation score and fairness score should influence the evaluation result. Note that if the subsets consistently give the project a negative evaluation, the resulting high fairness score can still raise the evaluation result. Hence, b should not be set too high.
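A minimal sketch of this aggregation, assuming the result is the linear combination a * score + b * fairness with both inputs on [0, 1] (the scale of the score ranges used in Section 5.2.2). The default a = 0.8 and b = 0.2 follow the case study; the function name and the input values are hypothetical. The example illustrates the caveat about b: when all subsets agree that the project is poor, a larger b lifts the result.

```python
def indicator_result(score: float, fairness: float, a: float = 0.8, b: float = 0.2) -> float:
    """Assumed aggregation r = a * score + b * fairness, with both inputs in [0, 1]."""
    if not (0.0 <= score <= 1.0 and 0.0 <= fairness <= 1.0):
        raise ValueError("score and fairness must lie in [0, 1]")
    if a < 0 or b < 0 or abs(a + b - 1.0) > 1e-9:
        raise ValueError("a and b must be nonnegative and sum to 1")
    return a * score + b * fairness


# Hypothetical indicator on which all subsets agree the project is poor:
# low evaluation score, perfect fairness.
moderate_b = indicator_result(0.40, 1.00)              # a = 0.8, b = 0.2 -> about 0.52
large_b = indicator_result(0.40, 1.00, a=0.5, b=0.5)   # about 0.70
```

Raising b from 0.2 to 0.5 adds about 0.18 to the result even though the underlying evaluation is unchanged.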
(4) Based on the Indicator Weight and the Evaluation Results under Each Indicator, the Final Comprehensive Evaluation Conclusion Is Obtained. Using the indicator weight vector W obtained by the best worst method, the final evaluation result U is aggregated from the evaluation results under each evaluation indicator
The resulting evaluation scores can be classified into performance levels according to practical needs, so that both the performance under each indicator and the overall performance can be evaluated.
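The final aggregation can be sketched as follows, assuming U is the weighted sum of the indicator-level results r_j with the BWM indicator weights W_j. The function name and the three-indicator weights and results are hypothetical; the weight-sum check uses a loose tolerance because reported weights are rounded.

```python
def overall_result(weights: list[float], results: list[float]) -> float:
    """Assumed final aggregation U = sum_j W_j * r_j."""
    if len(weights) != len(results):
        raise ValueError("one weight per indicator result is required")
    if abs(sum(weights) - 1.0) > 1e-3:
        raise ValueError("indicator weights should sum to 1")
    return sum(w * r for w, r in zip(weights, results))


# Hypothetical three-indicator example.
U = overall_result([0.5, 0.3, 0.2], [0.9, 0.8, 0.7])   # about 0.83
```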
5. Case Study
5.1. Overview of Beijing Metro Line 4 Project
Beijing Metro Line 4 is one of the main north-south lines in Beijing’s urban rail transit network, with a total length of 28.2 kilometers. It starts from Anheqiao Bei(N) Station in the north and passes through major urban areas such as Haidian, Xicheng, and Fengtai Districts before terminating at Gongyi Xiqiao Station in the south. Its 24 stations connect essential facilities and transportation hubs such as well-known universities, Zhongguancun Science and Technology Park, Xidan Commercial District, places of interest, and Beijing South Railway Station. The total planned investment for the project is about 15 billion yuan. Construction officially started in August 2004 with a construction period of 5 years. The line was completed and opened for trial operation in September 2009; by the end of 2019, it had been in operation for more than ten years.
Beijing Metro Line 4 is the first PPP project in the field of urban rail transit in China. Beijing Infrastructure Investment Co., Ltd. led the project and brought in the foreign-funded MTR Corporation Limited and the state-owned Beijing Capital Group as joint shareholders. Under the PPP arrangement, the three companies set up Beijing MTR Corporation Limited (BJMTR) as a joint venture with a shareholding ratio of 2 : 49 : 49. The company signed a franchise agreement and several other project agreements with the Beijing municipal government and Beijing Rail Transit Construction Management Co., Ltd., covering project investment, financing, construction, operation, maintenance, and other related matters.
The project construction investment is divided into two parts: Part A, the main civil works, and Part B, the vehicles and ancillary equipment. Part A is handled by a subsidiary wholly owned by Beijing Urban Investment Platform, and Part B is the responsibility of BJMTR. After the project was completed and accepted, BJMTR obtained the right to use Part A through a lease agreement. Under the franchise agreement, BJMTR is responsible for the operation, maintenance, and renewal of all project assets. The franchise period is 30 years, after which Parts A and B will be transferred free of charge to the municipal government and the urban investment company.
5.2. Performance Evaluation of Beijing Metro Line 4 Project in Operation and Maintenance Period
5.2.1. The Determination of Evaluation Indicator Weight
This research uses the best worst method to determine the weights of the evaluation indicators. To ensure that the indicator weights were assigned rigorously and fairly, five experienced experts in related fields, such as urban rail transit investment, construction, and operation, were invited. None of them has a direct interest in the project. These five experts formed the evaluation team that assigned the values (Table 4).
Substituting the relative importance vectors into formula (2) yields a nonlinear programming model, which was solved with LINDO to obtain the weight vector and the minimum consistency error ξ*. Since a_BW = 9, the consistency index is CI = 5.23; substituting these values into (3) gives the consistency ratio CR (Table 5).
The resulting indicator weight vector is W = (0.0476, 0.1385, 0.0733, 0.1813, 0.1125, 0.0537, 0.0756, 0.0949, 0.0362, 0.0286, 0.0302, 0.0271, 0.0339, 0.0275, 0.0391). The consistency ratios give an average error of 0.81%, with a maximum error not exceeding 1%, indicating very good consistency.
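The consistency check can be sketched as follows, assuming formula (3) is the standard consistency ratio of the best worst method, CR = ξ*/CI, where CI is the consistency index tabulated for each value of a_BW in the original BWM formulation (5.23 for a_BW = 9, as above). The function name is hypothetical, the LINDO optimization itself is not reproduced, and the reported W is checked only for having 15 entries that sum to 1.

```python
# Consistency index (CI) of the best worst method, indexed by a_BW (1..9).
CONSISTENCY_INDEX = {1: 0.00, 2: 0.44, 3: 1.00, 4: 1.63, 5: 2.30,
                     6: 3.00, 7: 3.73, 8: 4.47, 9: 5.23}


def consistency_ratio(xi_star: float, a_bw: int) -> float:
    """CR = xi* / CI for the given best-to-worst preference a_BW."""
    ci = CONSISTENCY_INDEX[a_bw]
    if ci == 0:
        # a_BW = 1 means all criteria are equally important, so xi* is 0 as well.
        return 0.0
    return xi_star / ci


# Indicator weight vector W reported above (15 indicators).
W = [0.0476, 0.1385, 0.0733, 0.1813, 0.1125, 0.0537, 0.0756, 0.0949,
     0.0362, 0.0286, 0.0302, 0.0271, 0.0339, 0.0275, 0.0391]
print(len(W), round(sum(W), 4))
```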
5.2.2. The Collection and Processing of Evaluation Information
The total set of evaluators G for urban rail PPP projects comprises three subsets: the public sector G1, social capital G2, and the social public G3. The evaluators are experts and representatives invited from these three sectors. To integrate performance evaluation closely with project management and make better use of its functions, the evaluators from the public sector and social capital include direct participants in the project. This study invited 20 experts from the three parties to evaluate the operation and maintenance performance of the Beijing Metro Line 4 project, mainly on the basis of the project’s performance in 2018-2019. Given the nature of the original information, the qualitative indicators were divided into performance evaluation levels in consultation with scholars, experts, and senior practitioners. For convenience, this study set five performance levels, namely, {very poor, poor, qualified, good, very good}, and gave a specific performance description for each level to help evaluators judge. The original qualitative data were obtained through case study, field investigation, symposia, and questionnaire surveys. For quantitative indicators, levels were likewise determined from the corresponding ratios and levels using professional knowledge. The original quantitative data were obtained from the project company’s database, daily reports, statements, field measurements, questionnaire surveys, and similar sources. The original qualitative information and quantitative data were first transformed into a linguistic evaluation scale through measurement and comparison and then converted into the corresponding evaluation values {1, 2, 3, 4, 5}.
(1) Determine the Frequency Distribution of Project Performance Evaluation Information from Each Subset under a Particular Indicator. For indicator KPI1, “Introduce advanced equipment, technology, and management experience,” the evaluation information and frequency distributions of the three parties, obtained from formulas (4) and (5), are shown in Table 6.
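A minimal sketch of this step, assuming formulas (4) and (5) give, for each subset, the share of its evaluators who choose each level of the five-level scale, and that the levels map to the values 1 to 5 in order. Table 6’s actual data are not reproduced; the ratings and names below are hypothetical.

```python
from collections import Counter

# Five-level linguistic scale used in this study, mapped to values 1..5.
LEVELS = ["very poor", "poor", "qualified", "good", "very good"]
LEVEL_VALUE = {name: value for value, name in enumerate(LEVELS, start=1)}


def frequency_distribution(ratings: list[str]) -> list[float]:
    """Share of one subset's evaluators choosing each level (lowest first)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return [counts[level] / len(ratings) for level in LEVELS]


# Hypothetical ratings from five evaluators in one subset for a single indicator.
ratings = ["good", "very good", "good", "qualified", "good"]
print([LEVEL_VALUE[r] for r in ratings], frequency_distribution(ratings))
```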
(2) Determine Each Subset’s Evaluation Weight for the Project Evaluation under a Particular Evaluation Indicator. Considering the importance and discourse power of each stakeholder in the operation and maintenance stages of urban rail PPP projects, the experts assigned subjective weights of (0.3, 0.35, 0.35) to the three subsets. According to (9), the consistency coefficients of the subsets are
According to (10), the objective weights of the subsets are (0.3510, 0.3559, 0.3131). Taking both subjective and objective factors into account, the experts set both weight combination coefficients to 0.5. According to (11), the evaluation weights of the subsets under this indicator are
(3) According to the Frequency Distribution of the Evaluation Information and the Subset’s Evaluation Weight, Evaluation Information and Fairness Were Considered Comprehensively, and the Evaluation Results Were Aggregated under a Particular Evaluation Indicator. From (17), the evaluation score under this evaluation indicator is
The cumulative frequency distributions of the three subsets are then constructed.
From (13), the diversity factors between each pair of the three subsets are calculated. According to (14), the fairness coefficient matrix of the subsets under this evaluation indicator is
From (24), the fairness score of project performance under this evaluation indicator is f1 = 0.9583, which indicates very good fairness of the evaluation.
Based on the effects of the evaluation information and fairness on the evaluation results, the experts set the weight combination coefficients to a = 0.8 and b = 0.2. From (25), the comprehensive score of the evaluation indicator “Introduce advanced equipment, technology and management experience” for Beijing Metro Line 4 during the operation and maintenance period can be calculated as
Similarly, the comprehensive scores of the other indicators for Beijing Metro Line 4 during the operation and maintenance period are
Using the indicator weight vector W obtained by the best worst method, the final evaluation result U is obtained from (26) by aggregating the evaluation results under each evaluation indicator
Project performance was divided into five categories according to the score, {very poor, poor, qualified, good, very good}, corresponding to the score ranges [0, 0.6), [0.6, 0.7), [0.7, 0.8), [0.8, 0.9), and [0.9, 1]. In the project’s overall evaluation, eight indicators were rated “good” and seven “very good.”
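The classification rule can be sketched with the score ranges given above; the category labels follow the five-level scale of Section 5.2.2. The function name and the scores in the example are hypothetical, not the study’s results.

```python
from bisect import bisect_right
from collections import Counter

# Score ranges from the text: [0, 0.6), [0.6, 0.7), [0.7, 0.8), [0.8, 0.9), [0.9, 1].
UPPER_BOUNDS = [0.6, 0.7, 0.8, 0.9]
CATEGORIES = ["very poor", "poor", "qualified", "good", "very good"]


def classify(score: float) -> str:
    """Map a score in [0, 1] to its performance category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    # bisect_right places a score equal to a bound in the higher category,
    # matching the half-open ranges; 1.0 falls into the closed top range.
    return CATEGORIES[bisect_right(UPPER_BOUNDS, score)]


# Hypothetical indicator scores, not the study's results.
scores = [0.93, 0.86, 0.81, 0.95, 0.88]
print(Counter(classify(s) for s in scores))
```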
5.3. Results Analysis and Suggestions Based on Operational Performance Evaluation of Beijing Metro Line 4 Project
The overall model calculation rates the performance of the Beijing Metro Line 4 project during the operation and maintenance period as good. Most existing PPP projects rely on financial subsidies as an important source of income. Under the “Administrative Measures for Infrastructure and Public Utilities Franchising” and the “Government and Social Capital Cooperation Project Performance Management Operational Guidelines,” project performance evaluation is closely linked to financial subsidies. The findings of this study therefore provide a quantitative basis for PPP projects funded through government payments and feasibility gap subsidies to better implement the “pay for performance” incentive mechanism.
At the indicator level, the project scores highly on indicators including “Project’s profitability, solvency, and financial sustainability,” “The user’s appropriate rate, price adjustment, and compensation mechanism ensured the sustainability of the project and the reasonable profit of the social capital,” and “Social, environmental, and economic externalities.” These results indicate that the project has a good overall financial situation and strong sustainability, that its pricing and compensation mechanism is relatively complete and balances the interests of all parties, and that it is expected to have a positive impact on the social economy. However, the project performs less well on indicators such as “End-user satisfaction,” “Operational management and service quality,” and “Effective public participation.” Line 4 has consistently ranked among the top three urban rail transit lines in Beijing and the top ten in China by passenger volume. As passenger volume has grown further in recent years, severe crowding and trains running at full load have become critical problems. As a result, current transport services cannot adequately meet passengers’ needs, and unexpected breakdowns and delays have also reduced passenger satisfaction. This shows that the project needs further improvement in meeting users’ requirements, improving operational efficiency and service quality, and helping the public participate in performance evaluation. Integrating the indicator-level evaluation results more closely with actual project management can promote performance improvement and provide benchmark data and experience. The evaluation results can also help maintain transparency toward the public and stakeholders, improve service quality, identify and solve problems, and continuously support project governance from a process management perspective.
To address these problems, two suggestions are proposed. (1) Attach importance to public participation in project performance. User opinions should be collected and satisfaction surveys conducted during the operation and maintenance period to reveal the problems affecting transportation service quality and end-user satisfaction. These problems should then be solved promptly, and practical feedback should be provided. Public participation in project performance evaluation should be encouraged and acceptance of the evaluation results enhanced, so that a virtuous circle of performance improvement forms between the project and its users. (2) Improve operational capabilities and strengthen equipment maintenance. To address insufficient transport capacity and overcrowding during the morning and evening peak hours, carrying capacity during peak periods should be increased through institutional arrangements, such as running more trains and shortening departure intervals, and through technical means, such as introducing more efficient vehicles, equipment, and management systems. In routine operation, the overhaul of project equipment and signal systems should be strengthened, and dedicated personnel should be assigned to specific tasks to avoid delays caused by vehicle and equipment failures.
6. Conclusion
In the context of the standardized and refined development of the PPP model, this research focuses on the operation performance evaluation of PPP projects and aims to improve the related theory. The following work was carried out. (1) Research on the evaluation system of urban rail transit PPP projects. Based on the economic and technical characteristics of urban rail transit PPP projects and PPP-related research, a logical model and an indicator system for project performance evaluation based on the performance prism were constructed. (2) Construction of a project performance evaluation model based on the best worst method and large-scale group evaluation technology, illustrated with a case study. After the advantages and disadvantages of existing project performance evaluation methods were compared and their application scenarios analyzed, a project performance evaluation model was constructed using BWM and large-scale group evaluation technology and verified with a practical example.
Through the research and case study, the main conclusions of this research are as follows. (1) The multistakeholder-oriented performance logic model for urban rail transit PPP projects and the three-party evaluation indicator system can comprehensively cover project performance during the operation and maintenance period and support project success. (2) The urban rail transit PPP project performance evaluation model based on the best worst method and large-scale group evaluation technology meets the requirements of the performance evaluation setting and helps ensure the rigor and authenticity of the evaluation results.
Although this study has advanced the performance evaluation of urban rail PPP projects, some limitations remain and need further work. (1) The interrelationships among the multidimensional indicators were not considered when the operation performance evaluation indicators for urban rail PPP projects were constructed; there may be duplication, correlation, or causation between indicators. Further empirical research is needed to analyze how the indicators influence one another. (2) Several indicators constructed in this study, such as satisfaction and external impact, are comprehensive and general. They need to be refined into measurable constructs that evaluators can assess. (3) Some of the data in the proposed quantitative evaluation model were obtained through questionnaire surveys, and collecting and processing such data inevitably introduces subjective errors. The relevant evaluation index system therefore needs further improvement in future research.
Beyond these limitations, there are three directions for further research. (1) Empirical research, based on process management, on the relationships between indicators at the various stages of a project. Research on the key success factors of PPP projects is relatively mature, but the relationships between indicators across project stages still need investigation. How to achieve project success through project management is a question that project managers need to answer. (2) Encouraging and improving public participation in PPP project evaluation. For current infrastructure PPP projects in China, no public participation system for performance evaluation has yet been established, which weakens the credibility of project performance evaluation to some extent. How to design a system that overcomes this dilemma and achieves effective public participation still needs further research. (3) Building a BIM-based project performance information management system. A complete project performance evaluation system relies on comprehensive and detailed measurement and accumulation of daily data throughout the project cycle. As a comprehensive platform and visualization tool for project information management, BIM can promote the organic combination of project management and project performance evaluation. It can help project managers and performance appraisers reduce subjective errors and improve performance evaluation.
Data Availability
The numerical application data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by the Youth Program of Humanities and Social Sciences Foundation of Ministry of Education of China (21YJCZH169).