
AdaCN: An Adaptive Cubic Newton Method for Nonconvex Stochastic Optimization

Figure 1: Trajectories of SGD, Adam, Apollo, and AdaCN on (a) a convex function and (b) a nonconvex function.
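
Because the specific hyperparameters and loss functions in the original caption are not recoverable here, the following is only a minimal sketch of how trajectory comparisons of this kind can be generated. It uses PyTorch's built-in SGD and Adam optimizers on two illustrative test functions; the loss functions, learning rates, momentum settings, initial point, and step counts are assumptions made for illustration, and AdaCN and Apollo are omitted because their implementations are not part of this excerpt.

import torch

def convex_loss(w):
    # Illustrative convex quadratic (stand-in for the loss in panel (a); not the paper's function).
    return w[0] ** 2 + 10.0 * w[1] ** 2

def nonconvex_loss(w):
    # Illustrative nonconvex double-well function (stand-in for panel (b); not the paper's function).
    return (w[0] ** 2 - 1.0) ** 2 + w[1] ** 2

def run(optimizer_cls, loss_fn, lr, steps=500, init=(-2.0, 2.0), **opt_kwargs):
    # Optimize a 2-D parameter vector and record its position after every update.
    w = torch.tensor(init, requires_grad=True)
    opt = optimizer_cls([w], lr=lr, **opt_kwargs)
    path = [w.detach().clone()]
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(w).backward()
        opt.step()
        path.append(w.detach().clone())
    return torch.stack(path)  # shape: (steps + 1, 2)

if __name__ == "__main__":
    for name, loss_fn in [("convex", convex_loss), ("nonconvex", nonconvex_loss)]:
        # Hyperparameters below are illustrative assumptions, not the paper's settings.
        sgd_path = run(torch.optim.SGD, loss_fn, lr=1e-3, momentum=0.9)
        adam_path = run(torch.optim.Adam, loss_fn, lr=1e-2, betas=(0.9, 0.999))
        print(name, "SGD ->", sgd_path[-1].tolist(), "| Adam ->", adam_path[-1].tolist())

Plotting each recorded path over the contour lines of the corresponding loss function yields the style of trajectory comparison shown in Figure 1.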