| Name | Advantages | Disadvantages |
| --- | --- | --- |
| Sigmoid | (1) It can smoothly map the real-number field to [0, 1]. (2) Monotonically increasing, continuously differentiable, and its derivative has a very simple form. (3) Suitable for binary classification problems | (1) Gradient vanishing: during backpropagation the derivative gradually approaches 0, so the parameters cannot be updated and the neural network cannot be optimized. (2) Non-zero-centered: the output of the function is always greater than 0, which slows down the convergence of model training. (3) Exponentiation is relatively time-consuming |
| Softmax | (1) It maps the output values to (0, 1), and the mapped values sum to 1. (2) It partitions the entire hyperspace according to the number of classes. (3) Suitable for multiclass classification problems | (1) Softmax involves computing exponential functions, which can cause an overflow problem on computers. (2) Not suitable for face recognition tasks |
| tanh | (1) It can smoothly map the real-number field to [−1, 1]. (2) Solves the non-zero-centered problem. (3) Suitable for binary classification problems | (1) Gradient vanishing: during backpropagation the derivative gradually approaches 0, so the parameters cannot be updated and the neural network cannot be optimized |
| ReLU | (1) Solves gradient vanishing in the positive interval. (2) The computation is simple: no exponentiation is required, and the activation is obtained with a single threshold comparison. (3) Converges much faster than sigmoid and tanh | (1) Non-zero-centered. (2) Dead ReLU problem: the unit is "vulnerable" during training; when x < 0 the gradient is 0, so the gradients of such nodes and all subsequent nodes remain 0, the unit no longer responds to any input, and the corresponding parameters are never updated |
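
To make the trade-offs in the table concrete, the following is a minimal NumPy sketch of these activation functions (the function names and the max-subtraction trick used to avoid the softmax overflow problem are illustrative assumptions, not part of the original table):

```python
import numpy as np

def sigmoid(x):
    # Smoothly maps real values into (0, 1); the exponentiation is what
    # makes it comparatively time-consuming.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Smoothly maps real values into (-1, 1); zero-centered, unlike sigmoid.
    return np.tanh(x)

def relu(x):
    # Identity for x >= 0, zero for x < 0; the zero region is the source
    # of the "dead ReLU" problem described in the table.
    return np.maximum(0.0, x)

def softmax(x):
    # Subtracting the maximum before exponentiating avoids the overflow
    # problem mentioned in the table without changing the result.
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / np.sum(exps)

if __name__ == "__main__":
    logits = np.array([1000.0, 1001.0, 1002.0])  # would overflow a naive softmax
    print(softmax(logits))               # ~[0.090, 0.245, 0.665], sums to 1
    print(sigmoid(np.array([0.0])))      # [0.5]
    print(relu(np.array([-2.0, 3.0])))   # [0., 3.]
```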