Research Article
[Retracted] Gradient Descent Optimization in Deep Learning Model Training Based on Multistage and Method Combination Strategy
Table 8
Performance of the proposed method with SGD as the first-stage optimizer.
| Method combination | Val-loss (ResNet-20 on CIFAR-10) | Val-acc (ResNet-20 on CIFAR-10) | Val-loss (LSTM on IMDB) | Val-acc (LSTM on IMDB) |
|---|---|---|---|---|
| SGD + SGD | 1.0178 | 0.6948 | 0.6919 | 0.5570 |
| SGD + (SGD + M) | 1.0763 | 0.7134 | 0.4408 | 0.7971 |
| SGD + (SGD + d) | 0.9607 | 0.7168 | 0.6890 | 0.5777 |
| SGD + (SGD + M + d) | 0.9040 | 0.7557 | 0.4353 | 0.7982 |
| SGD + RMSprop | 0.9408 | 0.7419 | **0.4287** | **0.8367** |
| SGD + (RMSprop + d) | 1.0131 | 0.7298 | 0.4342 | 0.8237 |
| SGD + Adam | **0.8751** | **0.7641** | 0.9210 | 0.8100 |
| SGD + (Adam + d) | 1.0692 | 0.7274 | 0.8172 | 0.8130 |
M: momentum; d: learning-rate decay by 1e−6 every iteration; "( )": methods applied together within the same stage. Bold values indicate the best results.
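The notation describes a two-stage schedule: train first with plain SGD, then switch to the second method (with any parenthesized modifiers applied jointly) partway through training. Below is a minimal PyTorch sketch of that strategy. The model, data, learning rate, and switch epoch are illustrative placeholders, not values from the paper, and the per-iteration decay is implemented with the Keras-style time-based formula lr = lr0 / (1 + decay·step), which is one plausible reading of "decay by 1e−6 every iteration".

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data standing in for ResNet-20 / CIFAR-10.
model = nn.Linear(10, 2)
loss_fn = nn.CrossEntropyLoss()
loader = DataLoader(
    TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,))),
    batch_size=16,
)

LR0 = 0.1            # hypothetical initial learning rate
DECAY = 1e-6         # "d": per-iteration decay from the table note
SWITCH_EPOCH = 50    # hypothetical boundary between the two stages
EPOCHS = 100

# Stage 1: plain SGD.
optimizer = torch.optim.SGD(model.parameters(), lr=LR0)
step = 0

for epoch in range(EPOCHS):
    if epoch == SWITCH_EPOCH:
        # Stage 2, e.g. the "SGD + (SGD + M + d)" row: switch to SGD with
        # momentum; "( )" means momentum and decay act together here.
        optimizer = torch.optim.SGD(model.parameters(), lr=LR0, momentum=0.9)
        step = 0  # count iterations from the start of the new stage
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        step += 1
        if epoch >= SWITCH_EPOCH:
            # Keras-style time-based decay (an assumption about "d"):
            # lr = lr0 / (1 + decay * step).
            for group in optimizer.param_groups:
                group["lr"] = LR0 / (1.0 + DECAY * step)
```

Replacing the stage-2 optimizer with torch.optim.RMSprop or torch.optim.Adam yields the other rows of the table; only the constructor call at the switch point changes.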