Research Article

Two-Phase Iteration for Value Function Approximation and Hyperparameter Optimization in Gaussian-Kernel-Based Adaptive Critic Design

Algorithm 1

Gaussian-kernel-based Approximate Dynamic Programming.
Initialize:
    : hyperparameters of Gaussian kernel model
    : sample set
    : initial policy
    , : learning step size
Let = 0;
Loop:
      k = 1     
     = t + 1;
    
    Get the reward
    Observe next state
    Update according to (12)
    Update the policy according to optimum seeking
      
    Update according to (12)
Until the termination criterion is satisfied