Input:
–– Initial training dataset $\mathcal{D} = \{(\mathbf{x}_i, t_i)\}_{i=1}^{N}$;
–– Size of the random subset at each iteration $N'$;
–– Number of hidden nodes $L$;
–– Number of iterations $M$;
–– Loss function $\ell$, which is twice differentiable;
–– Regularization factor $\lambda$.
Use $\mathcal{D}$ to train the initial model, written as $f_0(\mathbf{x}) = \mathbf{h}(\mathbf{x})\boldsymbol{\beta}_0$, where the input weights and hidden biases are randomly selected within the range of $[-1, 1]$ and the output-layer weights are determined analytically by $\boldsymbol{\beta}_0 = (\mathbf{H}^{\top}\mathbf{H} + \lambda\mathbf{I})^{-1}\mathbf{H}^{\top}\mathbf{T}$, and record the initial base learner $f_0$;
for $m = 1, 2, \dots, M$ do
    Randomly generate a permutation $\{\pi(i)\}_{i=1}^{N}$ of the integers $\{1, 2, \dots, N\}$, and then a stochastic subset of the whole training dataset is defined as $\mathcal{D}_m = \{(\mathbf{x}_{\pi(i)}, t_{\pi(i)})\}_{i=1}^{N'}$;
    Calculate the first-order gradient statistics of the loss function with respect to the predicted output of the current ensemble model for each training instance in the subset as
    $g_i = \dfrac{\partial \ell\big(t_{\pi(i)}, F_{m-1}(\mathbf{x}_{\pi(i)})\big)}{\partial F_{m-1}(\mathbf{x}_{\pi(i)})}$;
    Calculate the second-order gradient statistics of the loss function with respect to the predicted output of the current ensemble model for each training instance in the subset as
    $h_i = \dfrac{\partial^2 \ell\big(t_{\pi(i)}, F_{m-1}(\mathbf{x}_{\pi(i)})\big)}{\partial F_{m-1}(\mathbf{x}_{\pi(i)})^2}$;
    For the training instances in the subset, compute the current pseudo-residuals $\tilde{t}_{\pi(i)}$, where $\tilde{t}_{\pi(i)} = -g_i / h_i$;
    Determine the output weights $\hat{\boldsymbol{\beta}}_m$ used as a heuristic item for the derivation formula, based on the modified training dataset $\{(\mathbf{x}_{\pi(i)}, \tilde{t}_{\pi(i)})\}_{i=1}^{N'}$, as follows:
    $\hat{\boldsymbol{\beta}}_m = (\mathbf{H}_m^{\top}\mathbf{H}_m + \lambda\mathbf{I})^{-1}\mathbf{H}_m^{\top}\tilde{\mathbf{T}}_m$,
    where $\mathbf{H}_m$ is calculated according to the randomly selected input weights $\mathbf{a}_m$ and hidden biases $\mathbf{b}_m$;
    Use the derivation formula in Algorithm 1 to obtain the optimal output-layer weights $\boldsymbol{\beta}_m$ of $f_m$;
    Add the $m$-th individual learner ($f_m(\mathbf{x}) = \mathbf{h}_m(\mathbf{x})\boldsymbol{\beta}_m$) to the current ensemble learning model as $F_m(\mathbf{x}) = F_{m-1}(\mathbf{x}) + f_m(\mathbf{x})$;
end for
Output:
The final ensemble model $F(\mathbf{x}) = F_M(\mathbf{x}) = \sum_{m=0}^{M} f_m(\mathbf{x})$.
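To make the loop above concrete, the following is a minimal NumPy sketch of the boosting procedure, under several stated assumptions: a squared-error loss (so $g_i = F_{m-1}(\mathbf{x}_i) - t_i$ and $h_i = 1$, making the pseudo-residuals $-g_i/h_i$ ordinary residuals), a sigmoid hidden-layer activation, and, since the derivation formula of Algorithm 1 is not reproduced in this section, the heuristic ridge solution $\hat{\boldsymbol{\beta}}_m$ is used directly as each learner's output weights. All names (`train_sgb`, `N_sub`, `lam`, etc.) are illustrative, not from the source.

```python
import numpy as np

def hidden_output(X, W, b):
    """Hidden-layer output matrix H under a sigmoid activation (assumed)."""
    return 1.0 / (1.0 + np.exp(-(X @ W + b)))

def ridge_beta(H, T, lam):
    """Analytic output weights: (H^T H + lam*I)^{-1} H^T T."""
    L = H.shape[1]
    return np.linalg.solve(H.T @ H + lam * np.eye(L), H.T @ T)

def train_sgb(X, T, N_sub, L, M, lam, seed=None):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    learners = []

    # Initial base learner f_0: random input weights/biases in [-1, 1],
    # output weights solved analytically on the full dataset.
    W0 = rng.uniform(-1.0, 1.0, (d, L))
    b0 = rng.uniform(-1.0, 1.0, L)
    beta0 = ridge_beta(hidden_output(X, W0, b0), T, lam)
    learners.append((W0, b0, beta0))

    F = hidden_output(X, W0, b0) @ beta0   # current ensemble prediction F_0
    for m in range(1, M + 1):
        # Stochastic subset: first N_sub entries of a random permutation.
        idx = rng.permutation(N)[:N_sub]
        # First/second-order statistics of the squared-error loss.
        g = F[idx] - T[idx]       # g_i = dL/dF
        h = np.ones_like(g)       # h_i = d^2 L/dF^2 = 1
        pseudo = -g / h           # pseudo-residuals on the subset
        # New random hidden layer; ridge fit to the pseudo-residuals
        # (stand-in for the Algorithm 1 derivation, which is not shown here).
        Wm = rng.uniform(-1.0, 1.0, (d, L))
        bm = rng.uniform(-1.0, 1.0, L)
        beta_m = ridge_beta(hidden_output(X[idx], Wm, bm), pseudo, lam)
        learners.append((Wm, bm, beta_m))
        # F_m = F_{m-1} + f_m, evaluated on the full training set.
        F = F + hidden_output(X, Wm, bm) @ beta_m
    return learners

def predict(learners, X):
    """Final ensemble: the sum of all base learners, f_0 through f_M."""
    return sum(hidden_output(X, W, b) @ beta for W, b, beta in learners)

# Example usage on a synthetic regression problem:
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (200, 2))
t = np.sin(3 * X[:, 0]) + 0.5 * X[:, 1]
model = train_sgb(X, t, N_sub=150, L=20, M=10, lam=1e-2, seed=0)
print(predict(model, X[:5]))
```

Swapping in a different twice-differentiable loss only changes the two lines computing `g` and `h`, which is the point of carrying both gradient statistics through the loop.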