Research Article

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Table 4

Performance of the proposed methods and other relevant methods on Simulation II.

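| Method | Size | Rand (%) | F13A1 (%) | GSTM1 (%) | Error rate¹ (%) | BCM² | AUPR³ |
|---|---|---|---|---|---|---|---|
| Sign Avg. & LASSO/CD⁴ | 13.82 | 10.03 | 100 | 96 | 1.27 | 0.854 | 0.994 |
| Sign Avg. & TGDR⁵ | 9.92 | 14.78 | 100 | 96 | 3.33 | 0.841 | 0.993 |
| EDGE | 20 | 2.72 | 0 | 0 | 7.37 | 0.755 | 0.973 |
| limma | 8.9 | 9.75 | 0 | 100 | 5.23 | 0.809 | 0.981 |
| LASSO/CD separately⁶ | 15.88 | 8.81 | 98 | 100 | 6.60 | 0.668 | 0.982 |
| TGDR separately⁷ | 75.48 | 3.38 | 100 | 100 | 4.47 | 0.714 | 0.991 |
| glmmLASSO | 63.52 | 1.63 | 4 | 8 | 46.77 | 0.510 | 0.551 |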

Using the q-value as the cutoff, EDGE selects all 1,000 genes as significant. We used the 20 most significant genes instead.
¹Error rate = (false positives + false negatives)/(sample size).
²BCM captures the average confidence that a sample belongs to class i when it indeed belongs to that class.
³AUPR is computed as the average of the area under the precision-recall curve for each class and captures the ability to correctly rank the samples known to belong to a given class.
⁴Sign Avg. & LASSO/CD: pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is coordinate descent (CD).
⁵Sign Avg. & TGDR: pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is threshold gradient descent regularization (TGDR).
⁶LASSO/CD separately: separate LASSO models were trained at individual time points; the optimization method is CD.
⁷TGDR separately: separate TGDR models were trained at individual time points; the optimization method is TGDR.
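
The three performance measures defined in the footnotes above (error rate, BCM, and AUPR) can be illustrated with a minimal Python sketch. It is not the code used to produce Table 4. It assumes that each sample comes with a vector of predicted class probabilities, that the predicted class is the one with the highest probability, that BCM averages the per-class mean confidence in the true class, and that the area under each precision-recall curve is approximated by average precision with ties in the scores ignored. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def predict(probs: Sequence[Sequence[float]]) -> List[int]:
    """Assign each sample to the class with the highest predicted probability."""
    return [max(range(len(p)), key=lambda k: p[k]) for p in probs]


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Misclassified samples (false positives + false negatives) / sample size."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def bcm(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """Mean confidence in class k among samples truly in class k, averaged over classes.

    Assumption: classes with no samples are skipped.
    """
    per_class = []
    for k in range(n_classes):
        conf = [p[k] for t, p in zip(y_true, probs) if t == k]
        if conf:
            per_class.append(sum(conf) / len(conf))
    return sum(per_class) / len(per_class)


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at the rank of each positive."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive samples")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank
    return total / n_pos


def aupr(y_true: Sequence[int], probs: Sequence[Sequence[float]], n_classes: int) -> float:
    """One-vs-rest average precision, averaged over the classes that are present."""
    values = []
    for k in range(n_classes):
        labels = [1 if t == k else 0 for t in y_true]
        if sum(labels) > 0:
            values.append(average_precision(labels, [p[k] for p in probs]))
    return sum(values) / len(values)


# Hypothetical toy data: four samples, two classes.
y_true = [0, 0, 1, 1]
probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.55, 0.45]]
print(error_rate(y_true, predict(probs)), bcm(y_true, probs, 2), aupr(y_true, probs, 2))
```

In this toy example the fourth sample is misclassified, so the error rate is 0.25, yet the samples of each class are still ranked above the others, so the AUPR is 1. This shows how a method can rank samples well while having a nonzero error rate.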