Research Article
Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design
Algorithm 1
Gaussian-kernel-based Approximate Dynamic Programming.
Initialize:
    Hyperparameters of the Gaussian kernel model
    Sample set
    Initial policy
    Learning step sizes
    Let t = 0
Loop:
    k = 1
    t = t + 1
    Get the reward
    Observe the next state
    Update according to (12)
    Update the policy according to optimum seeking
    Update according to (12)
Until the termination criterion is satisfied
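As an illustration only, the loop of Algorithm 1 can be sketched in Python. Equation (12) is not reproduced in this section, so both update steps below are assumed stand-ins: a TD(0) semi-gradient step on the kernel weights, followed by a step on the kernel width that reuses the same TD error. The toy plant `step`, the action set, the grid used as sample set, the epsilon-greedy exploration, and all numeric constants are hypothetical choices made so the sketch runs; none of them come from the article.

```python
import math
import random


def kernel_features(z, centers, sigma):
    """Gaussian kernel between a point z and every center of the sample set."""
    return [
        math.exp(-sum((zi - ci) ** 2 for zi, ci in zip(z, c)) / (2.0 * sigma ** 2))
        for c in centers
    ]


def q_value(z, centers, weights, sigma):
    """Kernel expansion of the value estimate at the state-action pair z."""
    return sum(w * f for w, f in zip(weights, kernel_features(z, centers, sigma)))


def greedy_action(s, actions, centers, weights, sigma):
    """Policy update by optimum seeking: pick the action with the largest estimate."""
    return max(actions, key=lambda a: q_value((s, a), centers, weights, sigma))


def step(s, a):
    """Hypothetical toy plant on [0, 1]; the reward peaks at s = 0.5."""
    s_next = min(1.0, max(0.0, s + a))
    return s_next, -(s_next - 0.5) ** 2


def run_adp(episodes=30, horizon=20, alpha=0.05, beta=0.001, gamma=0.9, seed=0):
    rng = random.Random(seed)
    actions = (-0.1, 0.1)
    # Assumed sample set: a small grid over states and actions.
    centers = [(i / 4.0, a) for i in range(5) for a in actions]
    weights = [0.0] * len(centers)
    sigma = 0.3  # kernel width, the only hyperparameter in this sketch
    for _ in range(episodes):
        s = rng.random()
        for _ in range(horizon):
            # Epsilon-greedy exploration (an assumption, not in Algorithm 1).
            if rng.random() < 0.1:
                a = rng.choice(actions)
            else:
                a = greedy_action(s, actions, centers, weights, sigma)
            s_next, r = step(s, a)  # get the reward, observe the next state
            a_next = greedy_action(s_next, actions, centers, weights, sigma)
            phi = kernel_features((s, a), centers, sigma)
            q = sum(w * f for w, f in zip(weights, phi))
            delta = r + gamma * q_value((s_next, a_next), centers, weights, sigma) - q
            # Phase 1 (assumed rule): TD(0) semi-gradient step on the weights.
            weights = [w + alpha * delta * f for w, f in zip(weights, phi)]
            # Phase 2 (assumed rule): step on sigma using dQ/dsigma and the same delta.
            dq_dsigma = 0.0
            for w, f, c in zip(weights, phi, centers):
                d2 = (s - c[0]) ** 2 + (a - c[1]) ** 2
                dq_dsigma += w * f * d2 / sigma ** 3
            sigma = min(max(sigma + beta * delta * dq_dsigma, 0.05), 2.0)
            s = s_next
    return weights, sigma
```

Calling `run_adp()` returns the learned weights and the adapted kernel width. The width is clamped to a fixed interval here simply to keep the toy run numerically well behaved; the article's own treatment of the hyperparameters may differ.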