Research Article

Use of BP Neural Networks to Determine China’s Regional CO2 Emission Quota

Table 4

Advantages and disadvantages of activation functions.

Sigmoid

Advantages:
(1) Smoothly maps the real line to (0, 1)
(2) Monotonically increasing and continuously differentiable, with a very simple derivative
(3) Suitable for binary classification problems

Disadvantages:
(1) Vanishing gradient: during backpropagation the derivative tends toward 0, so the parameters cannot be updated and the neural network cannot be optimized
(2) Not zero-centered: the output is always greater than 0, which slows the convergence of model training
(3) The exponential is relatively expensive to compute

Softmax

Advantages:
(1) Maps the output values to (0, 1) such that they sum to 1
(2) Partitions the entire hyperspace according to the number of classes
(3) Suitable for multiclass classification problems

Disadvantages:
(1) Computing Softmax involves the exponential function, which can cause numerical overflow on computers
(2) Not well suited to face recognition tasks

tanh

Advantages:
(1) Smoothly maps the real line to (−1, 1)
(2) Zero-centered output, resolving sigmoid's nonzero-centered problem
(3) Suitable for binary classification problems

Disadvantages:
(1) Vanishing gradient: during backpropagation the derivative tends toward 0, so the parameters cannot be updated and the neural network cannot be optimized

ReLU

Advantages:
(1) Avoids the vanishing gradient on the positive interval
(2) Computationally simple: no exponential is required, and a single comparison yields the activation value
(3) Converges much faster than sigmoid and tanh

Disadvantages:
(1) Not zero-centered
(2) Dead ReLU problem: the unit is "vulnerable" during training; when x < 0 the gradient is 0, so such a node (and those downstream of it) keeps a zero gradient, stops responding to any data, and its parameters are never updated
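To make the table's numerical caveats concrete, the following is a minimal NumPy sketch of the four activation functions; the function names and demo values are illustrative, not from the article. The Softmax variant subtracts the maximum input before exponentiating, a standard remedy for the overflow problem noted above, and the comments indicate where the vanishing-gradient and dead-ReLU behaviors arise.

import numpy as np

def sigmoid(x):
    # Maps R to (0, 1); the derivative s * (1 - s) peaks at 0.25,
    # so backpropagating through many sigmoid layers shrinks the
    # gradient toward 0 (the vanishing-gradient problem in Table 4).
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Maps R to (-1, 1); zero-centered, unlike sigmoid, but its
    # derivative 1 - tanh(x)**2 also vanishes for large |x|.
    return np.tanh(x)

def relu(x):
    # max(0, x): one comparison, no exponential, and the gradient is 1
    # on the positive interval; a unit stuck at x < 0 receives zero
    # gradient and stops updating (the "dead ReLU" problem).
    return np.maximum(0.0, x)

def softmax(z):
    # Subtracting the maximum before exponentiating leaves the result
    # unchanged but keeps np.exp from overflowing for large inputs,
    # the "overflow problem" mentioned in Table 4.
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

if __name__ == "__main__":
    # A naive softmax on these inputs would compute np.exp(1000),
    # which overflows to inf; the shifted version returns a valid
    # probability distribution summing to 1.
    z = np.array([1000.0, 1001.0, 1002.0])
    print(softmax(z))   # approx. [0.0900, 0.2447, 0.6652]
    print(sigmoid(np.array([-5.0, 0.0, 5.0])))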