Research Article

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Table 3

Performance of the proposed methods and other relevant methods on Simulation I.

Method | Size | Rand (%) | F13A1 (%) | GSTM1 (%) | Error rate¹ (%) | BCM² | AUPR³
Sign Avg. & LASSO/CD⁴ | 5.52 | 13.78 | 70 | 10 | 22.97 | 0.582 | 0.873
Sign Avg. & TGDR⁵ | 16.76 | 8.12 | 88 | 100 | 6.77 | 0.724 | 0.987
EDGE | 20 | 3.85 | 1 | 60 | 10.80 | 0.719 | 0.936
limma | 6.04 | 11.72 | 8 | 100 | 16.17 | 0.707 | 0.908
LASSO/CD separately⁶ | 4.65 | 29.17 | 36 | 40 | 30.00 | 0.527 | 0.924
TGDR separately⁷ | 32.26 | 5.30 | 100 | 100 | 19.27 | 0.611 | 0.991
glmmLASSO | 114.06 | 3.05 | 0 | 0 | 36.40 | 0.519 | 0.571

When the q-value is used as the cutoff, EDGE declares all 1,000 genes significant; we therefore used the 20 most significant genes instead.
¹Error rate = (false positives + false negatives)/(sample size).
²BCM captures the average confidence that a sample belongs to class i when it truly belongs to that class.
³AUPR is computed as the average of the areas under the per-class precision-recall curves and captures the ability to correctly rank the samples known to belong to a given class.
⁴Sign Avg. & LASSO/CD: Pseudogenes were obtained by calculating the sign average of each gene's expression values across time; the optimization method is coordinate descent (CD).
⁵Sign Avg. & TGDR: Pseudogenes were obtained by calculating the sign average of each gene's expression values across time; the optimization method is threshold gradient descent regularization (TGDR).
⁶LASSO/CD separately: separate LASSO models were trained at each time point; the optimization method is CD.
⁷TGDR separately: separate TGDR models were trained at each time point; the optimization method is TGDR.
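The notes above define the error rate, BCM, and AUPR only in words. The following is a minimal sketch of how these three metrics might be computed from predicted class probabilities. It is not the authors' code. The function names (error_rate, bcm, average_precision, mean_aupr) are hypothetical. It also rests on several assumptions: the error rate is computed for a two-class problem; BCM is averaged first within each true class and then across classes; and the area under each one-vs-rest precision-recall curve is estimated by average precision. The source does not specify which estimator was used.

```python
from typing import List, Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """(false positives + false negatives) / sample size, assuming a two-class problem."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return (fp + fn) / len(y_true)


def bcm(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Average predicted probability of class i over samples truly in class i,
    then averaged across classes (macro averaging is an assumption)."""
    per_class: List[float] = []
    for i in range(len(probs[0])):
        own = [row[i] for row, t in zip(probs, y_true) if t == i]
        if own:  # skip classes with no samples
            per_class.append(sum(own) / len(own))
    return sum(per_class) / len(per_class)


def average_precision(is_pos: Sequence[bool], scores: Sequence[float]) -> float:
    """Average precision as an estimate of the area under the PR curve.
    Samples are ranked by descending score; ties keep their input order."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    n_pos = sum(is_pos)
    hits, total = 0, 0.0
    for rank, k in enumerate(order, start=1):
        if is_pos[k]:
            hits += 1
            total += hits / rank  # precision at the rank of each positive
    return total / n_pos


def mean_aupr(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """One-vs-rest AUPR for each class, averaged over classes."""
    values: List[float] = []
    for i in range(len(probs[0])):
        is_pos = [t == i for t in y_true]
        if any(is_pos):
            values.append(average_precision(is_pos, [row[i] for row in probs]))
    return sum(values) / len(values)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]
    probs = [[0.9, 0.1], [0.25, 0.75], [0.3, 0.7], [0.2, 0.8]]
    # Predicted label = class with the highest probability
    y_pred = [max(range(2), key=lambda i: row[i]) for row in probs]
    print(error_rate(y_true, y_pred))         # 0.25
    print(round(bcm(y_true, probs), 4))       # 0.6625
    print(round(mean_aupr(y_true, probs), 4))  # 0.8333
```

The toy example shows how the metrics can diverge. The second sample is misclassified, so the error rate is 0.25. Its lower probability for the true class pulls BCM down. The AUPR falls below 1 because a negative sample is ranked above a positive one in each class.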