Research Article

A SuperLearner Approach to Predict Run-In Selection in Clinical Trials

Table 2

Base learners used for each trained SL. The risk (the average MSE over the cross-validation procedure) and the coefficient (the weight of the base learner in the convex combination that forms the SL) are reported. Base learners with a weight of zero are omitted. Each row identifies the algorithm composing the SL; "average" indicates the SL average ensemble prediction algorithm. The screening (feature selection) algorithm is also identified. For example, “SL, Mars Algorithm, RF screened features” identifies the risk associated with the Mars algorithm within the SL ensemble with an RF-based feature selection procedure.

SL trained on study A – Placebo               Risk    Coefficient
 SL, Mars Algorithm, all features             0.177   0.213
 SL, Mars Algorithm, RF screened features     0.161   0.257
 SL, average, all features                    0.139   0.311
 SL, Rpart, RF screened features              0.150   0.219

SL trained on study A – Verum                 Risk    Coefficient
 SL, average, all features                    0.121   0.539
 SL, Polymars, RF screened features           0.131   0.410
 SL, RF, RF screened features                 0.132   0.051

SL trained on study B – Placebo               Risk    Coefficient
 SL, Mars Algorithm, all features             0.099   0.170
 SL, Glmnet Algorithm, all features           0.082   0.119
 SL, Glmnet Algorithm, RF screened features   0.075   0.298
 SL, average, all features                    0.127   0.015
 SL, RF, RF screened features                 0.076   0.398

SL trained on study B – Verum                 Risk    Coefficient
 SL, Rpart, all features                      0.126   0.124
 SL, average, all features                    0.127   0.523
 SL, Polymars, RF screened features           0.191   0.141
 SL, RF, all features                         0.126   0.213

Abbreviations: SL = SuperLearner; RF = Random Forest; Glmnet = Lasso and Elastic-Net Regularized Generalized Linear Models; Mars = Multivariate Adaptive Regression Splines; Polymars = Polychotomous classification based on Multivariate Adaptive Regression Splines; Rpart = Recursive Partitioning Trees.
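
The role of the coefficients in Table 2 can be illustrated with a minimal Python sketch of a convex combination of base-learner predictions. The weights are those reported for the SL trained on study A (Verum); the per-subject base-learner predictions, the dictionary keys, and the function name `ensemble_prediction` are hypothetical and serve only to show the arithmetic. This is not the authors' implementation.

```python
# Coefficients reported in Table 2 for the SL trained on study A (Verum).
weights = {
    "average, all features": 0.539,
    "Polymars, RF screened features": 0.410,
    "RF, RF screened features": 0.051,
}


def ensemble_prediction(base_predictions, weights):
    """Return the weighted sum of base-learner predictions for one subject."""
    return sum(weights[name] * base_predictions[name] for name in weights)


# Hypothetical base-learner predictions for a single subject
# (illustrative values, not taken from the article).
example_predictions = {
    "average, all features": 0.30,
    "Polymars, RF screened features": 0.20,
    "RF, RF screened features": 0.40,
}

# In a convex combination the weights are non-negative and sum to one.
weight_total = sum(weights.values())
sl_prediction = ensemble_prediction(example_predictions, weights)

print(f"sum of weights: {weight_total:.3f}")
print(f"SL prediction:  {sl_prediction:.4f}")
```

Because the weights are non-negative and sum to one, the ensemble prediction always lies between the smallest and largest base-learner predictions for that subject.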