Computational and Mathematical Methods in Medicine

Research Article

A SuperLearner Approach to Predict Run-In Selection in Clinical Trials

Table 2

Base learners used in each trained SL. For each base learner, the risk (the average MSE over the cross-validation procedure) and the coefficient (its weight in the convex combination of base learners that forms the SL) are reported. Base learners with a weight of zero are omitted. Each row identifies the algorithm included in the SL and the screening (feature selection) algorithm applied to it; "average" denotes the SL average ensemble prediction algorithm. For example, “SL, Mars Algorithm, RF screened features” identifies the risk associated with the Mars algorithm within the SL ensemble when an RF-based feature selection procedure is used.


SL trained on study A – Placebo	Risk	Coefficient
SL, Mars Algorithm, all features	0.177	0.213
SL, Mars Algorithm, RF screened features	0.161	0.257
SL, average, all features	0.139	0.311
SL, Rpart, RF screened features	0.150	0.219

SL trained on study A – Verum	Risk	Coefficient
SL, average, all features	0.121	0.539
SL, Polymars, RF screened features	0.131	0.410
SL, RF, RF screened features	0.132	0.051

SL trained on study B – Placebo	Risk	Coefficient
SL, Mars Algorithm, all features	0.099	0.170
SL, Glmnet Algorithm, all features	0.082	0.119
SL, Glmnet Algorithm, RF screened features	0.075	0.298
SL, average, all features	0.127	0.015
SL, RF, RF screened features	0.076	0.398

SL trained on study B – Verum	Risk	Coefficient
SL, Rpart, all features	0.126	0.124
SL, average, all features	0.127	0.523
SL, Polymars, RF screened features	0.191	0.141
SL, RF, all features	0.126	0.213

Abbreviations: SL = SuperLearner; RF = Random Forest; Glmnet = Lasso and Elastic-Net Regularized Generalized Linear Models; Mars = Multivariate Adaptive Regression Splines; Polymars = Polychotomous classification based on Multivariate Adaptive Regression Splines; Rpart = Recursive Partitioning Trees.
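
As an illustration of how the coefficients in Table 2 could be used, the following minimal Python sketch forms an ensemble prediction as a convex combination of base-learner predictions. It is not the authors' code. The coefficients are copied from the table, whereas the base-learner predictions in the usage example, the dictionary keys, and the function names `weight_sums` and `ensemble_prediction` are hypothetical and introduced only for illustration. Because the reported coefficients are rounded to three decimals, the sketch rescales them by their sum, which is an assumption of this example rather than a step described in the table.

```python
# Coefficients copied from Table 2, one list per trained SL
# (row order as in the table; keys are illustrative labels).
TABLE2_WEIGHTS = {
    "A-Placebo": [0.213, 0.257, 0.311, 0.219],
    "A-Verum": [0.539, 0.410, 0.051],
    "B-Placebo": [0.170, 0.119, 0.298, 0.015, 0.398],
    "B-Verum": [0.124, 0.523, 0.141, 0.213],
}


def weight_sums(weights_by_sl):
    """Return the sum of the reported coefficients for each SL, rounded to 3 decimals."""
    return {name: round(sum(w), 3) for name, w in weights_by_sl.items()}


def ensemble_prediction(base_predictions, weights):
    """Combine base-learner predictions for one subject into one ensemble prediction.

    The weights must be non-negative; they are rescaled by their sum
    (an assumption of this sketch, to absorb rounding in the reported values)
    so that the result is a convex combination.
    """
    if len(base_predictions) != len(weights):
        raise ValueError("one prediction is required per weight")
    if any(w < 0 for w in weights):
        raise ValueError("convex-combination weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return sum(w * p for w, p in zip(weights, base_predictions)) / total


if __name__ == "__main__":
    # Each set of coefficients should sum to approximately 1.
    print(weight_sums(TABLE2_WEIGHTS))

    # Hypothetical predictions from the three base learners of the
    # "study A, Verum" SL for a single subject (not data from the study).
    hypothetical_predictions = [0.30, 0.50, 0.40]
    print(ensemble_prediction(hypothetical_predictions, TABLE2_WEIGHTS["A-Verum"]))
```

In this sketch a base learner with a small coefficient, such as the third learner in the study A Verum SL (0.051), contributes very little to the combined prediction even when its cross-validated risk is close to that of the other learners.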