Abstract

Environmental pollution has become a major obstacle to the construction of an ecological civilization, and controlling it is urgent. By establishing an evolutionary game model, this thesis analyzes how paper-making enterprises choose their emission reduction strategies under the reward and punishment mechanism. It further analyzes how social welfare changes under that mechanism and, through simulation, traces the evolutionary paths of paper-making enterprises’ pollution emission strategies. The results are as follows. Under the static reward and punishment mechanism, the game system oscillates repeatedly around a point and has no stable equilibrium point. Under the dynamic reward and punishment mechanism, however, the game system tends to a stable equilibrium point. The social welfare analysis shows that high-intensity rewards reduce the amount of pollution discharged by paper-making enterprises, thereby maximizing social welfare. Conversely, when paper-making enterprises discharge a large amount of pollution, they are subject to high-intensity penalties; facing such penalties, they tend not to discharge, so social welfare is also maximized. The simulation results show that reasonable punishment strategies are more effective than reward strategies. Based on this, the author proposes countermeasures such as establishing a reasonable reward and punishment mechanism and reasonably determining the reward and punishment intensity for polluting enterprises. The emission reduction strategies of paper-making enterprises are affected by the government’s reward and punishment mechanism, and a deep study of the internal mechanism is of great significance both for pollution control and for the development of a green economy.

1. Introduction

Pollutant emissions from paper-making enterprises have always been a major source of environmental pollution because they are volatile and difficult to eradicate, and they still occupy an important position in ecological environmental pollution. However, China has never given up its governance efforts. According to the “China Paper Yearbook,” industry wastewater discharge from paper-making enterprises showed a downward trend from 2011 to 2015. The share of paper industry wastewater in the country’s total wastewater discharge decreased from 5.8% in 2011 to 3.22% in 2015, a drop of 2.58 percentage points. In addition, wastewater discharge from paper-making enterprises declined by an average of 11.29% per year over these five years. On the one hand, the reduction in pollutant emissions shows that China’s goal of eliminating outdated production capacity has been well achieved: only enterprises with development potential and a sound development foundation remain. On the other hand, it reflects the Chinese government’s determination and perseverance in controlling environmental pollution, since the government regulates or restricts polluting enterprises by increasing rewards or penalties for them. Although these measures have achieved certain results, the internal mechanism and logic by which paper-making enterprises choose their pollution emission strategies under the government’s reward and punishment mechanism still need to be studied.

Given the conflict of interest between the two parties in the game, the government’s reward and punishment mechanism and its implementation strength must be fully considered when studying how paper-making enterprises choose their pollution emission strategies. However, fully understanding human cooperation still poses huge challenges [1]. As noted above, China’s pollution has been brought under better control as the country has increased its pollution control efforts. It is nevertheless necessary to analyze further how the pollution emission strategies of polluting enterprises are influenced by the reward and punishment mechanism. Although some scholars have analyzed enterprises’ choices of pollution emission strategies under the reward and punishment mechanism, they have ignored the changes in social welfare under that mechanism [2–4]. Generally speaking, rewards and punishments may have different impacts on polluting enterprises, but their fundamental purpose is to maximize social welfare. By establishing an evolutionary game model, this thesis analyzes how paper-making enterprises choose their emission reduction strategies in the absence of a reward and punishment mechanism. It further analyzes how to maximize social welfare under the punishment mechanism. The analysis of the pollution discharge strategies of paper-making enterprises under the reward and punishment mechanism can provide a reference for the government in formulating appropriate environmental governance strategies.

The structure of this article is as follows. Section 2 reviews the relevant literature on corporate pollution control and analyzes the mechanism by which different types of reward and punishment mechanisms affect the pollution emission strategies of paper-making enterprises. Section 3 analyzes the strategy choices of the government and paper-making enterprises under the static reward and punishment mechanism. Section 4 analyzes the two parties’ strategy choices under the dynamic reward and punishment mechanism. Section 5 analyzes how social welfare changes under the reward and punishment mechanism. Section 6 uses MATLAB to analyze the evolution paths of the government and paper-making enterprises under different forms of the reward and punishment mechanism. Section 7 presents the research conclusions and recommendations and discusses the conclusions further.

2. Literature Review and Mechanism Analysis

2.1. Literature Review

The causes of enterprises’ pollution emissions are complicated, and the range of the resulting pollution is relatively large. To reduce the harm caused by enterprises’ pollution discharge, scholars have carried out a great deal of detailed research. A review of the existing research shows that work on pollution reduction and governance can be roughly divided into the following categories.

The first category studies the reduction and governance of enterprises’ pollution from the perspective of environmental regulations. Most scholars believed that environmental regulations can effectively reduce environmental pollution [5–7]. Some scholars also believed that environmental regulations have different impacts on enterprises with different pollution levels: the impacts on pollution-intensive enterprises were much greater than those on clean enterprises [8]. Environmental regulations not only reduced the export profits of pollution-intensive enterprises but also shortened their export duration [9–11]. Moreover, different types of environmental regulations had different effects on enterprises’ environmental pollution. Charge-based environmental regulations had an inverted U-shaped relationship with environmental pollution, whereas investment-based environmental regulations had a U-shaped relationship with it [12]. In addition, environmental regulations did not always work. In areas with a high concentration of polluting enterprises, the partiality of local governments would weaken the role of environmental regulations, thereby increasing pollution [13, 14]. Some scholars believed that increasing environmental tax rates and implementing mandatory environmental regulations could be important means of enhancing the effectiveness of environmental regulations [15].

The second category studies the cross-regional pollution of enterprises and pays more attention to cooperative governance between regions. When two governments belonged to different political alliances, coordinating them to face an issue together was an arduous challenge [16]. An important measure was for different governments to formulate public policies that coordinate the relationships between governments at multiple levels, including interlevel and cross-level departmental communication and collaboration [17]. From a horizontal perspective, there was a peer spillover effect between governments. If a company had closer exchanges with neighboring companies, its choice of emission reduction strategies was more likely to be influenced by its peers [18]. This peer spillover effect would undoubtedly greatly promote the governance of China’s environmental pollution. From a vertical perspective, the supervision and punishment of higher-level governments were necessary measures to form a stable and coordinated governance model among heterogeneous local governments [19]. However, if local governments faced more and more stringent political tasks, they would struggle to cope and would therefore take the actions and choices most beneficial to themselves. From the perspective of organizational form, although the internal logic of different organizational forms differed, both the logic of authority and the logic of meaning could often promote different organizational arrangements [20]. In the cross-regional treatment of air pollution, regions tended not to treat the pollution if there were no constraints, whereas they tended to cooperate in governance if there were constraints [21].

The third category uses reward and punishment mechanisms to study the multiagent governance of enterprises’ pollution. In terms of the impact of punishment on the two parties’ cooperation and pollution control work in the game, increasing penalties and government subsidies could promote the evolution of the game to an equilibrium point [22–24]. Under the static punishment mechanism, the strategies of the government and the enterprises, as well as the evolution trajectories of both parties, were uncertain. However, under the dynamic punishment mechanism, the evolution paths of the government and the enterprise tended to converge to a stable value [3]. Without strong supervision by the higher-level government, it would be difficult for both parties to spontaneously cooperate to implement ecological compensation. Increasing the penalty for noncooperating parties would move the game toward an equilibrium point [25]. The evolutionary game could converge to the ideal state through methods such as intensifying supervision while reducing the supervision cost of the higher-level government, strengthening penalties for local governments and enterprises for noncooperation and illegal pollution discharge, and improving the incentive mechanisms [26]. In terms of the impact of rewards on the two parties’ cooperation and pollution control work in the game, rewards have the same effect as punishment. Enterprises’ enthusiasm for cooperating in pollution control would be promoted by appropriately increasing the enterprises’ multiplication factor, that is, by increasing the benefits of environmental improvement after pollution control [27]. In addition, raising a company’s expectations of the benefits of participating in pollution control could play the same role [28]. However, the degree of the reward depended on the synergy between the groups, and the two were positively correlated [29]. Increasing investment in special transfer payments for environmental protection could encourage companies to reduce pollutant emissions, but merely increasing the punishment for local governments’ negligence of duty could not force them to perform their duties [30].

In addition, the emotions of the two parties had a more direct impact on their cooperation and pollution control. Optimistic emotions would push the cooperation between the two parties to a stable point, while pessimistic emotions would lead to the outbreak of cross-border conflicts [31]. Government supervision and public participation had a positive effect on environmental governance. Raising the reputational loss borne by the government and increasing public participation could effectively improve enterprises’ pollution control behaviors [32]. Because the efficiency of a company was inversely proportional to its innovation investment [33], more efficient polluting enterprises may emit more pollutants.

It is undeniable that the abovementioned literature has made a great contribution to research on polluting enterprises’ emission reduction strategies and has laid a solid theoretical foundation for this thesis. However, there are still some limitations. One is that the government’s reward and punishment mechanism is not fully considered. The other is that changes in social welfare under the reward and punishment mechanism are rarely addressed. Based on the above discussion and drawing on previous research, this thesis establishes an evolutionary game model without a reward and punishment mechanism as well as one with a reward and punishment mechanism. It further analyzes the changes in social welfare under the punishment mechanism. Finally, it uses simulation to study the emission reduction strategies of paper-making enterprises under the reward and punishment mechanism.

Compared with the existing research, the main marginal contributions of this research are as follows:

(1) This study considers not only the influence of the government’s static reward and punishment mechanism on paper-making enterprises’ choices of pollution emission strategies but also the influence of the government’s dynamic reward and punishment mechanism on those choices. On this basis, it also analyzes the changes in social welfare under the reward and punishment mechanism.

(2) This study uses MATLAB to numerically simulate the evolution paths of the government’s reward and punishment mechanism and of paper-making enterprises’ choices of pollution emission strategies. It further analyzes the influence of different intensities of the reward and punishment mechanism on paper-making enterprises’ pollution emission strategies.

(3) Through evolutionary game analysis and numerical simulation, the key factors that influence the players’ strategy choices are obtained. The dynamic reward and punishment mechanism is more effective than the static one. Increasing the rewards and punishments for paper-making enterprises will make them more inclined to choose nondischarge strategies. However, increasing only the rewards for not discharging pollutants will not necessarily have a good effect.

2.2. Mechanism Analysis

The emission reduction strategies of paper-making enterprises will be affected not only by their emission reduction costs and their own benefits but also by the government’s incentives and penalties. However, the government’s rewards and punishments for paper-making enterprises will eventually translate into costs and benefits for the enterprises. We need to clarify how the government’s rewards and punishments for paper-making enterprises affect their choices of emission reduction strategies. The specific content is shown in Figure 1.

It can be seen from Figure 1 that in the game between the government’s environmental supervision department and the paper-making enterprises, the starting point and end point of each party’s strategy choice is always that its own benefits exceed its own costs. Under the static reward and punishment mechanism, the rewards and punishments are already at their highest standard, yet they cannot have any substantial impact on the paper-making enterprises. Therefore, the paper-making enterprises are more inclined to choose pollution discharge strategies. Under the dynamic reward and punishment mechanism, the rewards and punishments implemented by the government environmental supervision department are in direct proportion to the probability that the paper-making enterprise chooses to discharge pollution. That is to say, the greater the probability that the paper-making enterprise discharges pollution, the heavier the punishment will be, and the greater the probability that it does not discharge pollution, the more rewards it will receive. Moreover, there is no upper limit on this kind of reward and punishment, which forces paper-making enterprises to choose nondischarge strategies. Rewarding the compliant party and penalizing the noncompliant party is not the real purpose. The ultimate goal is to achieve environmental pollution control through rewards and punishments.

3. Evolutionary Game Model of Static Reward and Punishment Mechanism

3.1. Basic Assumptions
Hypothesis 1: assume that there are two major players in the paper-making enterprise pollution emission game: the paper-making enterprise (E) and the government environmental supervision department (G). Both parties are boundedly rational; that is, in the interactive game process, each party continuously adjusts its strategic behavior according to the continuously enriching information and the other party’s strategies and finally realizes its own optimal strategy.

Hypothesis 2: when the paper-making enterprise (E) and the government environmental supervision department (G) are in the game, suppose the strategy space of the paper-making enterprise (E) is (no pollution, pollution), and the probabilities of choosing the nondischarge and pollution-discharge strategies are y and 1 − y, respectively. Suppose the strategy space of the government environmental supervision department (G) is (supervised, nonsupervised), and the probabilities that the government environmental supervision department (G) chooses the supervision and nonsupervision strategies are x and 1 − x, respectively. Both x and y are probabilities, and both are functions of time t.

Hypothesis 3: when the paper-making enterprises (E) choose a nonemission strategy, a certain cost will be incurred. If it is , the government environmental supervision department (G) will obtain additional benefits at this time. If the paper-making enterprises (E) choose to discharge pollution, the cost is , and they will obtain additional benefits . At this time, the government environmental supervision department (G) will obtain income, which is supposed to be . At the same time, when the government environmental supervision department (G) chooses the supervision strategy, a certain cost will be incurred. Suppose it is . If the paper-making enterprises (E) choose the nondischarge strategy, the government environmental supervision department (G) will reward them. The reward is . Conversely, if the paper-making enterprises (E) choose the pollution strategy, the government environmental supervision department (G) will impose penalties on them, assuming the penalties are . Both parties obtain basic benefits regardless of whether they play the game: the basic benefits of paper-making enterprises (E) are and the benefits of the government environmental supervision department (G) are .

According to the above assumptions, the game payment matrix of paper-making enterprises (E) and government environmental supervision departments (G) can be obtained. The specific content is shown in Table 1.

3.2. Model Establishment and Solution

According to the game payment matrix of the paper-making enterprise (E) and the government environmental supervision department (G) in Table 1, this thesis assumes that the expected benefits of the paper-making enterprise (E) choosing no pollution are and the benefits of pollution discharge strategies are , respectively. When the paper-making enterprise (E) chooses no pollution strategies, the expected benefits are as follows:

When the paper-making enterprise (E) chooses the pollution emission strategies, its expected benefits are as follows:

At this time, the average expected income of the paper-making enterprise (E) choosing no pollution and pollution discharge strategies is , and the specific expansion formula is as follows:

In the same way, this thesis assumes that the expected benefits of the government environmental supervision department (G) choosing supervision are and nonsupervision strategies are , respectively. When the government environmental supervision department (G) chooses the supervision strategies, its expected return is as follows:

When the government environmental supervision department (G) chooses the nonsupervising strategies, its expected benefits are as follows:

At this time, the government environmental supervision department (G) can get the average expected return by choosing the supervision and nonsupervision strategies, . The specific expansion formula is as follows:

According to the principle of the Malthusian dynamic equation, the replication dynamic equations that describe how the paper-making enterprises (E) and the government environmental supervision department (G) adjust their strategies during the game can be derived. The replication dynamic equation of the paper-making enterprise (E) is as follows:

The specific form of the replication dynamic equation of the government environmental supervision department (G) is as follows:

If and , solving the replication dynamic equations yields five equilibrium points, namely, (0, 0), (0, 1), (1, 0), (1, 1), and .
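For orientation, replication dynamic equations of this two-population type share the generic structure sketched below. The symbols $U_{E1}$, $U_{E2}$, $\bar{U}_E$, $U_{G1}$, $U_{G2}$, and $\bar{U}_G$ are placeholder names assumed for this sketch (expected benefits of nondischarge, discharge, and their average for E; of supervision, nonsupervision, and their average for G); they are not confirmed as the notation of this thesis.

```latex
\begin{aligned}
\bar{U}_E &= y\,U_{E1} + (1-y)\,U_{E2}, &
\frac{dy}{dt} &= y\,\bigl(U_{E1}-\bar{U}_E\bigr) = y(1-y)\bigl(U_{E1}-U_{E2}\bigr),\\
\bar{U}_G &= x\,U_{G1} + (1-x)\,U_{G2}, &
\frac{dx}{dt} &= x\,\bigl(U_{G1}-\bar{U}_G\bigr) = x(1-x)\bigl(U_{G1}-U_{G2}\bigr).
\end{aligned}
```

Because each right-hand side carries the factor $x(1-x)$ or $y(1-y)$, the four corner points are equilibria for any payoff values. A fifth, interior equilibrium exists only where both payoff differences vanish at some point inside the unit square, which is why parameter conditions are needed for it.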

3.3. Stability Analysis

According to the method proposed by Friedman, the evolutionarily stable strategy of the game system can be judged from the local stability of the Jacobian matrix. The Jacobian matrix of the game system in this article is as follows:

According to the Jacobian matrix above, the determinant () and trace () can be obtained. The specific expansion is as follows:

After solving the above model, the value of each parameter in the partial equilibrium of the system can be obtained. The specific situation is shown in Table 2.
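The determinant-and-trace test can be sketched in a few lines of Python. This is a minimal illustration, not the model of this thesis: the payoff differences inside `static_system` and the values of `P`, `R`, `C`, and `B` are hypothetical stand-ins, since Table 1 is not reproduced here. The classification rule is the usual one for two-dimensional systems: a point is an evolutionarily stable strategy when the determinant is positive and the trace is negative, a saddle when the determinant is negative, and a center when the determinant is positive and the trace is zero.

```python
# Hypothetical payoff parameters, NOT taken from the paper's Table 1:
# P = penalty, R = reward, C = supervision cost,
# B = enterprise's net gain from discharging.
P, R, C, B = 4.0, 2.0, 1.0, 3.0


def static_system(x, y):
    """Assumed replicator dynamics; x = Pr(supervise), y = Pr(no discharge)."""
    dx = x * (1 - x) * ((1 - y) * P - y * R - C)
    dy = y * (1 - y) * (x * (R + P) - B)
    return dx, dy


def jacobian(f, x, y, h=1e-6):
    """Central-difference Jacobian of f(x, y) -> (dx/dt, dy/dt)."""
    fxp, gxp = f(x + h, y)
    fxm, gxm = f(x - h, y)
    fyp, gyp = f(x, y + h)
    fym, gym = f(x, y - h)
    return [[(fxp - fxm) / (2 * h), (fyp - fym) / (2 * h)],
            [(gxp - gxm) / (2 * h), (gyp - gym) / (2 * h)]]


def classify(jac, tol=1e-6):
    """Classify an equilibrium from the determinant and trace of its Jacobian."""
    det = jac[0][0] * jac[1][1] - jac[0][1] * jac[1][0]
    tr = jac[0][0] + jac[1][1]
    if det < -tol:
        return "saddle"
    if det > tol and tr < -tol:
        return "ESS"
    if det > tol and tr > tol:
        return "unstable"
    if det > tol:
        return "center"  # positive determinant, trace numerically zero
    return "indeterminate"


# Four corners plus the interior equilibrium of the assumed system.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (B / (R + P), (P - C) / (P + R))]
for px, py in points:
    print((px, py), classify(jacobian(static_system, px, py)))
```

The Jacobian is taken numerically so that the same helper can be reused when the payoff expressions change; with closed-form payoffs, the analytic derivatives would serve equally well.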

Because this part analyzes the game between the government environmental supervision department and the paper-making enterprises under the static reward and punishment mechanism, the rewards and punishments mentioned above are already at their highest level. If and , there are five equilibrium points, namely, (0, 0), (0, 1), (1, 0), (1, 1), and (, ). The local stability of the five equilibrium points is analyzed in Table 3.

It can be seen from Table 3 that if the income of paper-making enterprises (E) from illegal discharge is not significantly less than the sum of the rewards and punishments received, the paper-making enterprises will repeatedly switch between the pollution discharge and nondischarge strategies. Under this circumstance, they are neither willing to take the risk of illegal emissions nor willing to commit to nonemission strategies. At this time, the benefits of the government’s environmental supervision department do not exceed its supervision cost, so the department is also in a wait-and-see state: it is neither willing to pay more supervision costs nor willing to be criticized for nonsupervision. It can be seen that there is no evolutionarily stable equilibrium point under the static reward and punishment mechanism. Therefore, the situation in Figure 2 will appear.
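The oscillation described here can be illustrated numerically. The sketch below is not the model of this thesis: the payoff differences in `static_system` and the values of `P`, `R`, `C`, and `B` are hypothetical and were chosen only so that an interior equilibrium exists. It integrates the assumed replicator system with a classical fourth-order Runge-Kutta scheme and compares the peak of y early and late in the run; an undamped cycle shows up as peaks that do not shrink.

```python
# Hypothetical payoff parameters, NOT taken from the paper's Table 1:
# P = penalty, R = reward, C = supervision cost,
# B = enterprise's net gain from discharging.
P, R, C, B = 4.0, 2.0, 1.0, 3.0


def static_system(x, y):
    """Assumed replicator dynamics; x = Pr(supervise), y = Pr(no discharge)."""
    dx = x * (1 - x) * ((1 - y) * P - y * R - C)
    dy = y * (1 - y) * (x * (R + P) - B)
    return dx, dy


def rk4_path(f, x, y, dt=0.01, steps=4000):
    """Integrate with classical RK4 and return the list of visited states."""
    path = [(x, y)]
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        path.append((x, y))
    return path


path = rk4_path(static_system, 0.3, 0.3)
ys = [state[1] for state in path]

# Peak of y over the first and the last 10 time units of the run.
early_peak = max(ys[:1000])
late_peak = max(ys[-1000:])
print("early peak of y:", early_peak)
print("late peak of y: ", late_peak)
```

Under these assumed payoffs the interior equilibrium is (0.5, 0.5), and the trajectory keeps circling it with the same amplitude instead of settling, which matches the qualitative behavior the text attributes to the static mechanism.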

4. Evolutionary Game Model of Dynamic Reward and Punishment Mechanism

4.1. Model Assumptions and Solutions

In this section, it is assumed that the government environmental supervision department’s reward for nondischarge by paper-making enterprises changes dynamically. The reward is proportional to the probability that paper-making enterprises do not discharge pollution, which turns the original fixed reward into a linear function . Similarly, the penalties imposed by the government environmental supervision department on the paper-making enterprises’ pollution emission are assumed to follow a proportional relationship, which is expressed as a linear function . The remaining assumptions are the same as those under the static reward and punishment mechanism, so they are not repeated here.

Replacing and in equations (7) and (8) with and gives the following expressions:

Similarly, by solving the abovementioned replication dynamic equation, five equilibrium points can be obtained: (0, 0), (0, 1), (1, 0), (1, 1), and (, ).

4.2. Stability Analysis

According to the method proposed by Friedman, the Jacobian matrix of the above game system can be obtained. The specific content is as follows:

According to the Jacobian matrix above, the determinant () and trace () can be obtained. The specific expansion is detailed below:

After solving the above model, the value of each parameter in the partial equilibrium of the system can be obtained. The specific situation is shown in Table 4.

According to the parameter values in Table 4, the stability of the equilibrium points is discussed for several different situations.

(1) When and , the stability of the equilibrium points of the game system can be judged from the signs of the determinant and the trace of the Jacobian matrix at each equilibrium point. The specific content is shown in Table 5.

It can be seen from Table 5 that if the income of the paper-making enterprises (E) from illegal discharge is significantly greater than the sum of the rewards and punishments received, the paper-making enterprises’ willingness to cooperate is very low. In this case, paper-making enterprises are more willing to take risks and choose illegal emissions, so they will not choose cooperative strategies. However, the government’s environmental supervision department obtains a positive benefit because the penalty income it receives is greater than its supervision cost. In the end, the two parties choose the strategy (supervision, pollution discharge). The phase diagram of the game process is shown in Figure 3. When and , the result of the game is the same as in case (1), and the evolution phase diagram is similar to Figure 3, so it is not detailed here.

(2) When and , the stability of the equilibrium points of the game system can be judged from the signs of the determinant and the trace of the Jacobian matrix at each equilibrium point. The specific content is shown in Table 6.

It can be seen from Table 6 that if the fine income of the government environmental supervision department is less than its supervision cost, the department leans toward the nonsupervision strategy. This is undoubtedly when pollution is most serious, because the government environmental supervision department has lost the motivation to supervise. At this time, the income of paper-making enterprises from pollutant emission is far greater than the fines and rewards, so they may discharge unscrupulously. The government is unwilling to see this situation. The phase diagram of the game process is shown in Figure 4.

(3) When and , the stability of the equilibrium points of the game system can be judged from the signs of the determinant and the trace of the Jacobian matrix at each equilibrium point. The specific content is shown in Table 7.

It can be seen from Table 7 that if the income of the paper-making enterprises (E) from illegal discharge is less than the sum of fines and rewards, they are more inclined to choose the nondischarge strategy. At this time, the income of the government environmental supervision department is greater than the cost of supervision, so the department also chooses the supervision strategy. In this case, the two sides of the game can form a joint force for cooperative pollution control. The phase diagram of the game process is shown in Figure 5. When and , the result of the game is the same as in case (3), and the phase diagram of the game is also similar, so it is not repeated here.

(4) When and , the stability of the equilibrium points of the game system can be judged from the signs of the determinant and the trace of the Jacobian matrix at each equilibrium point. The specific content is shown in Table 8.

It can be seen from Table 8 that under the same conditions, the dynamic reward and punishment mechanism is more stable than the static reward and punishment mechanism. According to the above analysis, the evolutionary game system oscillates around the central point under the static reward and punishment mechanism, and there is no stable equilibrium point. The possible reasons are as follows. First, the incentives and penalties of the government’s environmental supervision department have not reached the level that can effectively prevent the paper-making enterprises from discharging. Second, as far as the paper-making enterprises are concerned, there is no obvious difference between the cost of nondischarge and its benefits, so their probability of choosing a nondischarge strategy will be very small. Third, under the static reward and punishment mechanism, the rewards and punishments received by paper-making enterprises are already at the highest standard. Whether paper-making enterprises choose not to discharge or discharge more pollutants, they receive no additional rewards or punishments. Under the dynamic reward and punishment mechanism, however, the government’s environmental supervision department and the paper-making enterprises tend to a stable point in the game, because the rewards and punishments received by the paper-making enterprises are proportional to the probability with which they choose to discharge or not to discharge pollution. In other words, there is no upper limit on the rewards and punishments. When paper-making enterprises face huge penalties for pollutant discharge, their motivation to discharge is significantly reduced. The game phase diagram at this time is shown in Figure 6.

(5) When and , the stability of the equilibrium points of the game system can be judged from the signs of the determinant and the trace of the Jacobian matrix at each equilibrium point. For details, see Table 9.

It can be seen from Table 9 that when the illegal income obtained by the paper-making enterprises is less than the reward or punishment they receive, they are more inclined to choose the nondischarge strategy. At this time, the government environmental supervision department is more inclined to choose the nonsupervision strategy because the benefits it obtains are lower than the supervision cost it pays. In short, because both players are boundedly rational, they are not independent of each other in terms of market and interests. Therefore, both the government environmental supervision department and the paper-making enterprises make strategic choices based on the size of their costs and benefits. The game phase diagram is shown in Figure 7.
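The contrast between the static and dynamic mechanisms can be illustrated with a small simulation. The sketch below is not the model of this thesis. It assumes a reward of the form R·y and a penalty of the form P·(1 − y), following the verbal description of proportional rewards and punishments, and it uses hypothetical values of `P`, `R`, `C`, and `B`. Under these assumptions the trace of the Jacobian at the interior equilibrium is negative whenever P > R, so the trajectory spirals inward instead of cycling; that condition is a property of the assumed payoffs only.

```python
import math

# Hypothetical parameters, NOT from the paper: maximum penalty P, maximum
# reward R, supervision cost C, enterprise's net gain from discharging B.
P, R, C, B = 4.0, 2.0, 1.0, 3.0


def dynamic_system(x, y):
    """Assumed dynamics; x = Pr(supervise), y = Pr(no discharge)."""
    reward = R * y          # assumed linear reward
    penalty = P * (1 - y)   # assumed linear penalty
    dx = x * (1 - x) * ((1 - y) * penalty - y * reward - C)
    dy = y * (1 - y) * (x * (reward + penalty) - B)
    return dx, dy


def rk4_final(f, x, y, dt=0.01, steps=20000):
    """Integrate with classical RK4 and return the final state."""
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = f(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = f(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y


# Interior equilibrium of the assumed system, from
# (P - R) y^2 - 2 P y + (P - C) = 0 and x = B / (R y + P (1 - y)).
y_star = (P - math.sqrt(P * P - (P - R) * (P - C))) / (P - R)
x_star = B / (R * y_star + P * (1 - y_star))

# Trace of the Jacobian at the interior point; only d(dy/dt)/dy is nonzero.
trace = x_star * y_star * (1 - y_star) * (R - P)

x_end, y_end = rk4_final(dynamic_system, 0.3, 0.3)
print("interior equilibrium:", (x_star, y_star))
print("state at t = 200:    ", (x_end, y_end))
print("trace at equilibrium:", trace)
```

Starting from (0.3, 0.3), the assumed system ends up at its interior equilibrium, which mirrors the qualitative claim that the dynamic mechanism removes the persistent oscillation of the static one.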

5. Analysis of Social Welfare under the Reward and Punishment Mechanism

The above content has analyzed the selection of pollutant emission reduction strategies for paper-making enterprises under the reward and punishment mechanism. This section continues to analyze how to maximize social welfare under the reward and punishment mechanism.

5.1. Basic Assumptions

Based on the above analysis, this thesis assumes that two instruments are available in the game of environmental pollution control: reward and punishment. Once these two instruments are implemented, social welfare may change in the same direction or in opposite directions. This section analyzes how social welfare changes under each of them. The specific assumptions are as follows.

Assumption 1: during the game, paper-making enterprises can be in different discharge states. By the amount of discharge, the states can be divided into large discharge and small discharge. By compliance, they can be divided into compliant discharge and noncompliant discharge. By whether pollution is discharged at all, they can be divided into discharge and no discharge, and discharge can be further divided into sewage that is being discharged, has been discharged, is about to be discharged, and has not been discharged. A paper-making enterprise may choose any one of these discharge states during the game, but until the outcome is observed, its state is unknown. Therefore, the author introduces the disordered set A to indicate the discharge state of the paper-making enterprise. The disordered set A can be expressed , which indicates the state of discharge of the paper-making enterprises, or it can be expressed as . Because the discharge state of a paper-making enterprise is uncertain, a probability must be assigned to each state. That is, the probability of a paper-making enterprise being in any discharge state is . Whether it will actually discharge pollution is also uncertain. The author assigns a probability to this uncertainty as well, that is, the probability that a paper-making enterprise discharges pollution in any discharge state is . The probability of pollution discharge varies with the form and intensity of supervision by the government environmental supervision department. The author assumes that the discharge of paper-making enterprises is a continuous random variable. The loss to society and ecology caused by the discharge of pollution in any state is , and is the loss probability density function, which is used to measure the social welfare losses caused by paper-making enterprises’ emission.

Assumption 2: the social welfare loss function is formulated as . is used to indicate the reward that a paper-making enterprise receives for not discharging in any state. is used to indicate the areas that are not contaminated. indicates the total income and welfare that the area receives. Although rewards will not prevent paper-making enterprises from discharging pollutants, they can reduce pollution levels. Let the degree of reduction be . The paper-making enterprises’ pollution function is used to express the degree of pollution. According to the above assumption, the paper-making enterprises’ pollution is a function of total regional revenue, pollutant emissions, and rewards. It is expressed by the formula as . is used to represent the punishment that the paper-making enterprise receives for pollutant discharge in any state. Similarly, stands for the contaminated area. So means the damage to social welfare caused by the paper-making enterprises’ pollution. Although punishment will not completely prevent the paper-making enterprises from discharging pollutants, it will also reduce the pollution level to a certain extent. Let the reduction degree be . The paper-making enterprises’ pollution function is again used to express the degree of pollution. According to the above assumption, the paper-making enterprises’ pollution is a function of the total regional damage, the amount of pollutants discharged, and the penalty. It is expressed by the formula as .

Assumption 3: if a paper-making enterprise chooses a nondischarge strategy, it pays a certain cost. The author assumes that this cost is . If a paper-making enterprise chooses a pollution discharge strategy, it pays no such cost. The government environmental supervision department also incurs a certain cost in the supervision process. Assume its cost is .

5.2. Establishment and Solution of Social Welfare Maximization Model
5.2.1. Social Welfare Maximization Model and Solution under the Reward Mechanism

Based on the above assumptions, this section introduces an infinite horizon. The so-called infinite horizon places the pollutant discharge of the paper-making enterprises within a longer historical period. Under the reward mechanism, maximizing social welfare is constrained by factors such as the reward level of the previous period, the degree of reward, and the level of pollution emitted by the paper-making enterprise in the previous period. The optimization model is as follows:

To simplify the analysis, the author sets the upper limit of the discharge state of the paper-making enterprise to the initial value b = 1, which means that the paper-making enterprise has only one discharge state, and the probability of a paper-making enterprise being in any discharge state is  = 1/3. The probability that a paper-making enterprise discharges pollution in any discharge state is  = 1/2. When paper-making enterprises are rewarded, the degree of pollution reduction is 1/3 and . This shows that the social welfare damage caused by the paper-making enterprises’ pollution will be great. The pollution function of paper-making enterprises is and . This shows that in any state, if the paper-making enterprises choose not to discharge pollutants, they will pay a high cost. The government supervision cost varies with the discharge behavior of the paper-making enterprises. The cost is . According to the above assumptions, equation (14) can be simplified, and the specific form is as follows:

According to the above formula, the Bellman equation can be obtained, and the equation is as follows:

According to the above assumption, we know and the specific form can be known as . Then, the specific form of is . So the Bellman equation can be written in the following form.

The first-order condition of the Bellman equation with respect to the reward can then be solved, which gives the relationship between , , and . The expression of the three relations is . From this, it can be seen that the pollutant emission of paper-making enterprises is inversely related to the reward intensity. Specifically, the smaller the pollutant emissions of paper-making enterprises, the greater the reward they receive. Conversely, the greater the pollutant emissions of paper-making enterprises, the smaller the reward they receive. In addition, the greater the pollutant emissions of paper-making enterprises, the smaller the unpolluted area, and hence the smaller the social welfare of the uncontaminated area.
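For readers less familiar with the step from an infinite-horizon model to a Bellman equation and its first-order condition, the generic structure can be sketched as follows. The symbols here ($V$ for the value function, $u$ for per-period welfare, $\beta$ for the discount factor, $s_t$ for the state, $r_t$ for the reward chosen in period $t$, and $f$ for the state transition) are generic placeholders introduced only for this illustration. They are not the notation or the functional forms of the model above.

```latex
% Generic infinite-horizon problem (placeholder notation, not the thesis's model)
\max_{\{r_t\}_{t=0}^{\infty}} \sum_{t=0}^{\infty} \beta^{t}\, u(s_t, r_t)
\quad \text{s.t.} \quad s_{t+1} = f(s_t, r_t), \qquad 0 < \beta < 1 .

% Corresponding Bellman equation
V(s_t) = \max_{r_t} \bigl\{ u(s_t, r_t) + \beta\, V\bigl(f(s_t, r_t)\bigr) \bigr\} .

% First-order condition with respect to the control r_t
\frac{\partial u(s_t, r_t)}{\partial r_t}
  + \beta\, V'\bigl(s_{t+1}\bigr)\, \frac{\partial f(s_t, r_t)}{\partial r_t} = 0 .
```

The first-order condition balances the current-period effect of the control against its discounted effect on future welfare through the next-period state. A relation between the control and the state of the kind described above is what results from solving such a condition.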

5.2.2. Social Welfare Maximization Model and Solution under the Punishment Mechanism

Based on the above assumptions, this section introduces intertemporal substitution. The so-called intertemporal substitution means that a penalty mechanism is effective not only in the current period but may also have a certain effect in the next period. Under the penalty mechanism, maximizing social welfare is likewise constrained by the punishment of the previous period and the paper-making enterprises’ pollutant emission level in the previous period. The optimization model is as follows:

The assumptions under the penalty mechanism are roughly the same as those under the reward mechanism. The upper limit of the discharge state of a paper-making enterprise is given the initial value b = 1, the probability of a paper-making enterprise being in any discharge state is  = 1/3, and the probability that the paper-making enterprise discharges pollution in any discharge state is  = 1/2. When paper-making enterprises are punished, the degree of pollution reduction is  = 1/3 and . So the pollution function of paper-making enterprises is and the cost of government supervision is . Based on the above assumptions, equation (19) can be simplified, and the specific form is as follows:

According to the above formula, the Bellman equation can be obtained, and the equation is as follows:

According to the above assumption, the specific form of can be known as . Then, the specific form of is . So the Bellman equation can be written in the following form:

The first-order condition of the Bellman equation with respect to the penalty can then be solved, which gives the relationship between , , and . The expression of the three relations is . From this, it can be seen that the pollutant emission of paper-making enterprises and the punishment are positively related. Specifically, the greater the pollutant emissions of paper-making enterprises, the greater the penalty. Conversely, the smaller the pollutant emissions of paper-making enterprises, the smaller the penalty. In addition, the greater the emissions of paper-making enterprises, the larger the polluted area, and hence the greater the damage to social welfare.

6. Simulation Research

The preceding sections analyzed the evolutionary game between the government environmental supervision department and the paper-making enterprises’ emission strategies under the static and the dynamic reward and punishment mechanisms. In this section, the results of that evolutionary game are simulated in MATLAB in order to analyze which strategic choices paper-making enterprises make under the dynamic reward and punishment mechanism when they face rewards and punishments of different intensities. The details are as follows.

6.1. Analysis of the Evolution Path of the Two Sides of the Game under the Static and Dynamic Reward and Punishment Mechanisms
6.1.1. Analysis of the Evolution Path of the Two Sides of the Game under the Static Reward and Punishment Mechanism

This section first assigns values to the above parameters. As far as paper-making enterprises are concerned, assume that the cost of paper-making enterprises is  = 4 when they do not discharge pollution. When they choose to discharge pollution, the cost is  = 3 and the illegal revenue obtained is  = 6. As far as the government environmental supervision department is concerned, the cost of choosing a supervision strategy is  = 3. The penalty imposed on paper-making enterprises for pollutant discharge is  = 6 and the reward given to paper-making enterprises for not discharging pollutants is  = 4 (see Table 10).

The parameter values in Table 10 satisfy the required conditions, that is, and . According to the above parameter values, the probability that the government environmental supervision department implements supervision and the probability that the paper-making enterprise chooses not to discharge pollution can be calculated, namely and . These probabilities are taken as the initial probabilities of the evolutionary simulation study. When the probability of the government environmental supervision department choosing the supervision strategy is held fixed, the probability that the paper-making enterprises choose not to discharge pollution is set to  = 0.2 and 0.8, respectively. This yields a simulation diagram of the evolution of the paper-making enterprises’ choice of the nondischarge strategy, as shown in Figure 8. It can be seen from Figure 8 that in this case, although the government environmental supervision department has a high probability of choosing a supervisory strategy, the probability of paper-making enterprises choosing not to discharge pollutants fluctuates up and down and does not tend to a stable point. In the same way, when the probability that the paper-making enterprises choose not to discharge pollution is held fixed, the probability of the government environmental supervision department choosing the supervision strategy is set to  = 0.2 and 0.8, respectively. This yields the evolution simulation diagram of the government environmental supervision department choosing the supervision strategy, as shown in Figure 9. It can also be seen from Figure 9 that the probability of the government environmental supervision department choosing supervision is wave-shaped as well. If both parties to the game have the same initial probability, that is, the probability that the government environmental supervision department chooses supervision equals the probability that the paper-making enterprises choose not to discharge pollution, the game system evolves along a closed loop. Taking this common value to be 0.2, the evolution path is shown in Figure 10. The above analysis shows that there is no stable point in the game between paper-making enterprises and the government environmental supervision department under the static reward and punishment mechanism. The fundamental reason is that both parties are boundedly rational: the oscillation reflects the choices each party makes after weighing benefits and costs.
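The closed-loop behaviour described above can be reproduced with a small two-population replicator model. The following is a minimal sketch and not the MATLAB code from the Supplementary Materials. The numerical values are those quoted in the text, but the way they enter the payoffs (the `advantages` function) is an assumption made for illustration, as is the choice of a zero payoff for an unsupervising government. The function names and the conserved quantity `first_integral` are likewise introduced only for this sketch.

```python
import math

# Parameter values quoted in the text. How they enter the payoffs below
# is an ASSUMPTION of this sketch, not the thesis's payoff matrix.
COST_NO_DISCHARGE = 4.0   # enterprise cost when it does not discharge
COST_DISCHARGE = 3.0      # enterprise cost when it discharges
ILLEGAL_REVENUE = 6.0     # enterprise revenue from discharging
COST_SUPERVISE = 3.0      # government cost of supervising
PENALTY = 6.0             # static penalty, paid only under supervision
REWARD = 4.0              # static reward, paid only under supervision


def advantages(x, y):
    """Payoff advantages; x = P(supervise), y = P(not discharge)."""
    # Government: supervising versus not supervising (assumed payoff 0).
    gov = y * (-COST_SUPERVISE - REWARD) + (1 - y) * (PENALTY - COST_SUPERVISE)
    # Enterprise: not discharging versus discharging.
    not_discharge = x * REWARD - COST_NO_DISCHARGE
    discharge = ILLEGAL_REVENUE - COST_DISCHARGE - x * PENALTY
    return gov, not_discharge - discharge


def velocity(x, y):
    """Right-hand side of the two-population replicator dynamics."""
    gov, ent = advantages(x, y)
    return x * (1 - x) * gov, y * (1 - y) * ent


def rk4_step(x, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = velocity(x, y)
    k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = velocity(x + dt * k3[0], y + dt * k3[1])
    x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y


def simulate(x0, y0, t_end=10.0, dt=0.001):
    """Return the list of (x, y) points visited from the initial state."""
    path = [(x0, y0)]
    for _ in range(int(round(t_end / dt))):
        path.append(rk4_step(*path[-1], dt))
    return path


def first_integral(x, y):
    """Quantity that stays constant along orbits of this assumed system."""
    a = PENALTY - COST_SUPERVISE   # gov advantage = a - b * y
    b = PENALTY + REWARD
    c = PENALTY + REWARD           # enterprise advantage = c * x - d
    d = COST_NO_DISCHARGE + ILLEGAL_REVENUE - COST_DISCHARGE
    return (a * math.log(y) + (b - a) * math.log(1 - y)
            + d * math.log(x) + (c - d) * math.log(1 - x))


if __name__ == "__main__":
    path = simulate(0.2, 0.2)
    print("start:", path[0], "H =", first_integral(*path[0]))
    print("end:  ", path[-1], "H =", first_integral(*path[-1]))
```

Under these assumed payoffs the interior rest point is a center: a conserved quantity exists, so every interior trajectory stays on a closed curve around the rest point and never converges. This is the same qualitative behaviour as the wave-shaped paths and the closed loop reported for the static mechanism.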

6.1.2. Analysis of the Evolution Path of the Two Sides of the Game under the Dynamic Reward and Punishment Mechanism

According to the above analysis, many different situations can arise under the dynamic reward and punishment mechanism. To keep the discussion brief and to allow comparison with the static reward and punishment mechanism, this section still chooses and to analyze the situation. The parameters used in this section are shown in Table 11.

According to the above parameter values, the probability that the government environmental supervision department implements supervision and the probability that the paper-making enterprise chooses not to discharge pollution can be calculated, namely and . These probabilities are again used as the initial probabilities of the evolutionary simulation study. When the probability that the government environmental supervision department chooses the supervision strategy is held fixed and the probability that the paper-making enterprise chooses not to discharge pollution is set to  = 0.2 and 0.8, respectively, a simulation diagram of the evolution of the paper-making enterprises’ choice of the nondischarge strategy can be obtained, as shown in Figure 11. When the probability that the paper-making enterprise chooses not to discharge pollution is held fixed and the probability of the government environmental supervision department choosing the supervision strategy is set to  = 0.2 and 0.8, respectively, the evolution simulation diagram of the government environmental supervision department choosing the supervision strategy can be obtained, as shown in Figure 12. If the two sides of the game have the same initial probability, the evolutionary game path is shown in Figure 13.
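To illustrate why a state-dependent penalty can turn oscillation into convergence, the sketch below makes the penalty shrink as the share of nondischarging enterprises grows. This is a minimal sketch, not the author's MATLAB code. The penalty rule `penalty(y) = PENALTY_MAX * (1 - y)`, the value `PENALTY_MAX = 12`, the payoff structure, and all function names are assumptions chosen for illustration. They are not the values of Table 11, which are not reproduced here.

```python
# Values below follow the static example in the text, except PENALTY_MAX,
# which is a HYPOTHETICAL cap chosen only for this illustration.
COST_NO_DISCHARGE = 4.0
COST_DISCHARGE = 3.0
ILLEGAL_REVENUE = 6.0
COST_SUPERVISE = 3.0
REWARD = 4.0
PENALTY_MAX = 12.0


def penalty(y):
    """Assumed dynamic rule: the penalty falls as nondischarge share y rises."""
    return PENALTY_MAX * (1 - y)


def advantages(x, y):
    """Payoff advantages; x = P(supervise), y = P(not discharge)."""
    # Government: supervising versus not supervising (assumed payoff 0).
    gov = y * (-COST_SUPERVISE - REWARD) + (1 - y) * (penalty(y) - COST_SUPERVISE)
    # Enterprise: not discharging versus discharging.
    not_discharge = x * REWARD - COST_NO_DISCHARGE
    discharge = ILLEGAL_REVENUE - COST_DISCHARGE - x * penalty(y)
    return gov, not_discharge - discharge


def velocity(x, y):
    """Right-hand side of the two-population replicator dynamics."""
    gov, ent = advantages(x, y)
    return x * (1 - x) * gov, y * (1 - y) * ent


def rk4_step(x, y, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = velocity(x, y)
    k2 = velocity(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
    k3 = velocity(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
    k4 = velocity(x + dt * k3[0], y + dt * k3[1])
    x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return x, y


def simulate(x0, y0, t_end=100.0, dt=0.01):
    """Integrate from (x0, y0) and return the final state."""
    x, y = x0, y0
    for _ in range(int(round(t_end / dt))):
        x, y = rk4_step(x, y, dt)
    return x, y


if __name__ == "__main__":
    for start in [(0.2, 0.2), (0.8, 0.8), (0.2, 0.8)]:
        x_end, y_end = simulate(*start)
        print(start, "->", (round(x_end, 4), round(y_end, 4)))
```

In this assumed system the enterprise's advantage depends on its own population share through the penalty, which adds damping, so the interior rest point becomes a stable focus and trajectories spiral inward instead of circling. The sketch illustrates the mechanism only and makes no claim about the specific equilibrium values reported in Figures 11 to 13.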

6.2. The Impact of Different Intensities of Rewards and Penalties on the Paper-Making Enterprises’ Pollution Emission Strategies
6.2.1. The Impact of Different Levels of Rewards on the Paper-Making Enterprises’ Pollution Emission Strategies

This section still follows the above assumptions. When other conditions remain unchanged, increasing the degree of reward for nonemission reduces the probability that paper-making enterprises choose a nonemission strategy, as shown in Figure 14. The possible reasons are as follows. First, increasing the rewards to paper-making enterprises for nondischarge increases the cost borne by the government’s environmental supervision department. As government costs increase, its willingness to cooperate decreases. Second, excessive rewards easily lead paper-making enterprises into the “ineffective reward” trap: after receiving the reward, the paper-making enterprises do not take any effective action to reduce emissions but use the reward for other purposes. Therefore, it is necessary to keep the rewards within a reasonable range, because appropriate rewards help to reduce the pollution of paper-making enterprises.

6.2.2. The Impact of Different Degrees of Punishment on the Paper-Making Enterprises’ Pollution Emission Strategies

It can be seen from Figure 15 that as the government’s environmental supervision department increases its penalties for paper-making enterprises, the probability of paper-making enterprises choosing not to discharge pollution shows an increasing trend. The possible explanation is that increasing the punishment of paper-making enterprises increases the cost of their violations. When faced with costs that are significantly higher than benefits, paper-making enterprises, as boundedly rational agents, are more inclined to choose nonemission strategies. In summary, punishments are more effective than rewards, because punishments increase the costs of paper-making enterprises whereas rewards increase the burden on the government’s environmental supervision department.
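The opposite effects of rewards and penalties can also be seen in a simple comparative-statics calculation. The sketch below is a minimal illustration under an assumed static payoff structure: the government pays the supervision cost and the reward when it supervises a nondischarging enterprise, and it collects the penalty net of the supervision cost when it supervises a discharging one. The function `interior_equilibrium` and its closed-form expressions follow from that assumption only. They are not formulas from the thesis, and the values printed are not those behind Figures 14 and 15.

```python
def interior_equilibrium(penalty, reward, cost_supervise=3.0,
                         cost_no_discharge=4.0, cost_discharge=3.0,
                         illegal_revenue=6.0):
    """Mixed equilibrium (x*, y*) of the ASSUMED static game.

    x* = probability of supervision, y* = probability of nondischarge.
    The government is indifferent when
        y * (-cost_supervise - reward)
            + (1 - y) * (penalty - cost_supervise) = 0,
    and the enterprise is indifferent when
        x * (reward + penalty)
            = cost_no_discharge + illegal_revenue - cost_discharge.
    """
    y_star = (penalty - cost_supervise) / (penalty + reward)
    x_star = ((cost_no_discharge + illegal_revenue - cost_discharge)
              / (penalty + reward))
    return x_star, y_star


if __name__ == "__main__":
    print("Varying the reward (penalty fixed at 6):")
    for reward in [2.0, 4.0, 6.0, 8.0]:
        _, y_star = interior_equilibrium(6.0, reward)
        print(f"  reward={reward:4.1f}  nondischarge probability={y_star:.3f}")

    print("Varying the penalty (reward fixed at 4):")
    for pen in [6.0, 8.0, 10.0, 12.0]:
        _, y_star = interior_equilibrium(pen, 4.0)
        print(f"  penalty={pen:4.1f}  nondischarge probability={y_star:.3f}")
```

In this assumed game the equilibrium nondischarge probability is pinned down by the government's indifference condition. A larger reward makes supervision more expensive, so the government stays indifferent only at a lower nondischarge probability, whereas a larger penalty makes supervision more attractive and raises it. This matches the direction of the effects described above.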

7. Conclusions and Suggestion

7.1. Conclusion

According to the above analysis, the dynamic reward and punishment mechanism is more effective than the static one. Under the dynamic reward and punishment mechanism, the evolution path of the government environmental supervision department and the paper-making enterprises gradually converges to a point, whereas under the static reward and punishment mechanism the evolution path of the game system oscillates in a wave-like pattern. Under the static reward and punishment mechanism, although the probability of paper-making enterprises’ pollutant emission changes, the enterprises are essentially inclined to discharge pollutants. Under the dynamic reward and punishment mechanism, paper-making enterprises have a stronger motivation not to discharge pollutants, because once pollutants are discharged, they may face a very large penalty. The analysis of social welfare maximization shows that the pollutant emissions of paper-making enterprises are inversely related to the rewards they receive. Specifically, the smaller the pollutant emissions of paper-making enterprises, the greater the rewards, and the greater the emissions, the smaller the rewards. The greater the pollutant emissions of paper-making enterprises, the smaller the unpolluted area, which in turn leads to lower social welfare. The penalties for paper-making enterprises are positively related to their pollutant emissions. Specifically, the greater the amount of pollutants emitted by paper-making enterprises, the greater the penalty. Conversely, the smaller the pollutant emissions of paper-making enterprises, the smaller the penalty. The larger the pollutant discharge of the paper-making enterprises, the larger the polluted area, which in turn causes greater damage to social welfare.
The simulation research shows that the probability of paper-making enterprises’ pollution discharge fluctuates under the static reward and punishment mechanism, whereas it converges under the dynamic reward and punishment mechanism. Moreover, it is better to increase the penalties for paper-making enterprises’ pollution discharge than to increase the rewards for nondischarge: as the punishment for illegal discharge increases, paper-making enterprises are more inclined to choose nondischarge strategies, whereas as the incentives for nondischarge increase, the probability of choosing nondischarge decreases.

7.2. Suggestion

Based on the above conclusions, the author proposes three suggestions: establish a dynamic reward and punishment mechanism, reasonably control the rewards and punishments for paper-making enterprises, and control the cost of cooperation between the two parties in the game.

The first is that a dynamic reward and punishment mechanism should be formulated when managing the environment. The reward and punishment mechanism should not be a rigid arrangement consisting only of the two instruments of reward and punishment. It should be a dynamic and progressive system that includes identification, tracing, governance, and other traceable measures. The most basic element of this system is identification: it is necessary to effectively determine the responsible subject and actively trace the source. The most important element is governance: after the responsible subject has been traced, the reward and punishment mechanism is used to urge it to actively control environmental pollution.

The second is to reasonably control the rewards and punishments for paper-making enterprises. Changes in rewards and punishments not only affect the paper-making enterprises’ pollution emission strategies but also cause changes in social welfare. Increasing the penalties for paper-making enterprises’ pollution emission will push them to choose nondischarge strategies. Similarly, appropriately increasing the incentives for paper-making enterprises not to discharge pollutants will also reduce pollutant discharge. However, an excessively high reward will, on the one hand, reduce the probability of paper-making enterprises choosing a nondischarge strategy and, on the other hand, increase the burden on the government. Therefore, the penalties for pollutant discharge by paper-making enterprises can be increased, but the incentives for paper-making enterprises must be kept within a reasonable range.

The third is to control the cost of cooperation between the two parties in the game. Whether the two sides of the game choose a cooperation strategy fundamentally depends on the level of cooperation costs between them. The purpose of establishing a dynamic reward and punishment mechanism is also to increase the cost of pollutant discharge for paper-making enterprises. As the cost of pollutant discharge increases, the paper-making enterprises, as boundedly rational players in the game, will inevitably consider their strategic choices carefully, which serves the purpose of reducing pollutant emissions. In addition, paper-making enterprises can increase the research and development of new technologies and use new technologies to treat sewage. In this way, they can reduce their own sewage treatment costs. At the same time, the supervision cost of the government’s environmental supervision department can be reduced, for example by establishing an Internet-based supervision system through the application of information technology, which reduces the use of human capital and the cost of field research.

7.3. Discussion

The spirit of contract is not only a rule of action that all parties should abide by but also an inevitable product of social cooperation. It is a constraint and regulation that steers the cooperation between the two parties in a positive direction. If such constraints and regulations are lost, not only will the comprehensive interests of all parties not be guaranteed, but the overall interests of society will also be harmed. The reward and punishment mechanism is designed to restrict the behaviors of all parties in the game. Under this constraint, it is expected that the cooperation of all parties can be more effective. This thesis analyzes the changes in social welfare while considering different types of government reward and punishment mechanisms. However, the choice of pollutant emission strategies by paper-making enterprises depends on far more than these visible factors. There are also other factors, such as the success probability of the government’s reward and punishment mechanism and the enthusiasm of third parties to participate. These factors can be taken into account in the evolutionary game model in follow-up research.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

Acknowledgments

The authors thank the National Natural Science Foundation of China for supporting this research. This research was funded by National Natural Science Foundation Youth Project, grant number 71804013.

Supplementary Materials

The supporting material contains the MATLAB code and programs used in the article.