
| Name | Advantages | Disadvantages |
|---|---|---|
| BGD | The principle of gradient descent is simple | (1) Computation is very slow (2) Difficult to handle large datasets (3) New data cannot be added to update the model |
| SGD | (1) Trains faster than BGD (2) New data can be added to update the model | Frequent updates may cause severe oscillations in the loss function |
| Momentum | (1) Considers the velocity of the previous step together with the new gradient (2) Speeds up convergence and suppresses oscillation | |
| AdaGrad | (1) Adds an adaptive denominator compared with SGD (2) Handles parameters whose gradients are updated infrequently | If gradient updates are frequent, subsequent updates may become slow or vanish |
| RMSprop | (1) Similar to momentum, it reduces fluctuations (2) Overcomes the sharp decrease or disappearance of the gradient in AdaGrad (3) Performs better than SGD, momentum, and AdaGrad on nonstationary objective functions | |
| Adam | (1) Combines momentum and RMSprop (2) Integrates gradient descent, momentum, AdaGrad, and RMSprop with further improvements (3) Easy to use, insensitive to gradient scaling, suitable for large and sparse data, and has easily tuned hyperparameters | |

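To make the differences in the table concrete, the following is a minimal NumPy sketch of the per-parameter update rules behind these optimizers on a simple quadratic objective. The test function, hyperparameter values, and step counts are illustrative assumptions, not values from the table; with a deterministic gradient, BGD and SGD coincide, since SGD's stochasticity comes from minibatching, which is omitted here.

```python
import numpy as np

# Illustrative quadratic objective f(w) = 0.5 * w^T A w, with gradient A w.
A = np.diag([10.0, 1.0])              # deliberately ill-conditioned
grad = lambda w: A @ w

def sgd(w, lr=0.1, steps=100):
    for _ in range(steps):
        w = w - lr * grad(w)          # plain gradient step
    return w

def momentum(w, lr=0.1, beta=0.9, steps=100):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = beta * v + grad(w)        # velocity accumulates previous steps
        w = w - lr * v
    return w

def adagrad(w, lr=0.5, eps=1e-8, steps=100):
    s = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        s += g * g                    # running sum of squared gradients (the added denominator)
        w = w - lr * g / (np.sqrt(s) + eps)
    return w

def rmsprop(w, lr=0.05, beta=0.9, eps=1e-8, steps=100):
    s = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        s = beta * s + (1 - beta) * g * g   # decaying average avoids AdaGrad's vanishing steps
        w = w - lr * g / (np.sqrt(s) + eps)
    return w

def adam(w, lr=0.05, b1=0.9, b2=0.999, eps=1e-8, steps=100):
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g          # momentum-style first moment
        v = b2 * v + (1 - b2) * g * g      # RMSprop-style second moment
        m_hat = m / (1 - b1 ** t)          # bias correction for the zero initialization
        v_hat = v / (1 - b2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

w0 = np.array([5.0, 5.0])
for name, opt in [("SGD", sgd), ("Momentum", momentum),
                  ("AdaGrad", adagrad), ("RMSprop", rmsprop), ("Adam", adam)]:
    print(name, opt(w0.copy()))
```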