Research Article

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Table 2

Performance of the proposed methods on the traumatic injury application and comparison with other methods.

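| Group | Method | Size | Rand index | Test set: error rate | Test set: BCM¹ | Test set: AUPR² |
|---|---|---|---|---|---|---|
| Proposed methods | Sign Avg. & LASSO/CD³ | 32 | 19.58% | 0.351 | 0.605 | 0.626 |
| | Sign Avg. & TGDR⁴ | 30 | 25.21% | 0.378 | 0.590 | 0.662 |
| Existing methods | limma | 47 | 21.40% | 0.432 | 0.542 | 0.628 |
| | EDGE | 453 | 13.67% | 0.432 | 0.543 | 0.622 |
| | glmmLASSO | 8 | 34.99% | 0.432 | 0.519 | 0.532 |
| | LASSO/CD separately⁵ | 28 | 13.59% | 0.486 | 0.498 | 0.508 |
| | TGDR separately⁶ | 133 | 22.58% | 0.378 | 0.520 | 0.579 |
| Using other summary scores | Mean & LASSO/CD⁷ | 29 | 17.95% | 0.405 | 0.536 | 0.560 |
| | Mean & TGDR⁸ | 36 | 27.37% | 0.405 | 0.562 | 0.617 |
| | Median & LASSO/CD⁹ | 2 | 27.76% | 0.351 | 0.543 | 0.617 |
| | Median & TGDR¹⁰ | 43 | 18.58% | 0.405 | 0.578 | 0.626 |
| | PC1 & LASSO/CD¹¹ | 3 | 13.59% | 0.405 | 0.504 | 0.541 |
| | PC1 & TGDR¹² | 29 | 32.68% | 0.432 | 0.539 | 0.548 |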

¹ BCM captures the average confidence that a sample belongs to class i when it indeed belongs to that class.
² AUPR is the average, over classes, of the area under the precision-recall curve for each class; it captures the ability to rank correctly the samples known to belong to a given class.
³ Sign Avg. & LASSO/CD: pseudo genes were obtained by calculating the sign average of each gene's expression values across time; the feature selection method is LASSO, optimized by coordinate descent (CD).
⁴ Sign Avg. & TGDR: pseudo genes were obtained by calculating the sign average of each gene's expression values across time; the feature selection/optimization method is threshold gradient descent regularization (TGDR).
⁵ LASSO/CD separately: a separate LASSO model was trained at each time point; the optimization method is CD.
⁶ TGDR separately: a separate TGDR model was trained at each time point; the optimization method is TGDR.
⁷ Mean & LASSO/CD: pseudo genes were obtained by calculating the mean of each gene's expression values across time; the optimization method is CD.
⁸ Mean & TGDR: pseudo genes were obtained by calculating the mean of each gene's expression values across time; the optimization method is TGDR.
⁹ Median & LASSO/CD: pseudo genes were obtained by calculating the median of each gene's expression values across time; the optimization method is CD.
¹⁰ Median & TGDR: pseudo genes were obtained by calculating the median of each gene's expression values across time; the optimization method is TGDR.
¹¹ PC1 & LASSO/CD: pseudo genes were obtained by calculating the first principal component of each gene's expression values across time; the optimization method is CD.
¹² PC1 & TGDR: pseudo genes were obtained by calculating the first principal component of each gene's expression values across time; the optimization method is TGDR.
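To illustrate how a gene's time course can be collapsed into a single pseudo gene per subject, the sketch below computes the mean, median, and first-principal-component (PC1) summaries listed in the footnotes. It assumes the data are stored in a NumPy array of shape (subjects, genes, time points). The helper name `pseudo_genes` and that layout are hypothetical. The PC1 sign convention, which flips loadings so they sum to a nonnegative value, is an arbitrary choice made here because SVD signs are not unique. The sign average itself is defined in the main text and is deliberately not reproduced in this sketch.

```python
import numpy as np


def pseudo_genes(expr: np.ndarray, summary: str = "mean") -> np.ndarray:
    """Collapse a (subjects, genes, times) array into a (subjects, genes) matrix.

    Hypothetical helper: supports only the mean, median and PC1 summaries.
    """
    if expr.ndim != 3:
        raise ValueError("expected an array of shape (subjects, genes, times)")
    if summary == "mean":
        return expr.mean(axis=2)
    if summary == "median":
        return np.median(expr, axis=2)
    if summary == "pc1":
        n_subjects, n_genes, _ = expr.shape
        scores = np.empty((n_subjects, n_genes))
        for g in range(n_genes):
            # Center each time point across subjects before the SVD.
            x = expr[:, g, :] - expr[:, g, :].mean(axis=0)
            _, _, vt = np.linalg.svd(x, full_matrices=False)
            loading = vt[0]
            # SVD signs are arbitrary; flip so the loadings sum to a nonnegative value.
            if loading.sum() < 0:
                loading = -loading
            scores[:, g] = x @ loading  # PC1 score of each subject for gene g
        return scores
    raise ValueError(f"unknown summary: {summary!r}")
```

Whichever summary is chosen, an (n, p, T) array becomes an (n, p) matrix. That matrix can then be passed to any single-time-point feature selector, such as LASSO or TGDR.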
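Footnotes 3, 5, 7, 9 and 11 refer to LASSO fitted by coordinate descent (CD). The minimal sketch below shows that optimizer for a squared-error loss. It cycles through the coefficients and applies a soft-thresholding update to each one while holding the others fixed. The analysis behind Table 2 involves a multiclass outcome, so this Gaussian version only illustrates the update rule. The function names `lasso_cd` and `soft_threshold` and the penalty argument `lam` are hypothetical, and the sketch assumes a centered response with no intercept.

```python
import numpy as np


def soft_threshold(z: float, lam: float) -> float:
    """Shrink z toward zero by lam, returning exactly 0 when |z| <= lam."""
    return float(np.sign(z) * max(abs(z) - lam, 0.0))


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float,
             n_sweeps: int = 100, tol: float = 1e-8) -> np.ndarray:
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n   # (1/n) * sum_i x_ij^2 for each column
    resid = y.astype(float).copy()      # y - X @ beta, with beta = 0 initially
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # Correlation of column j with the partial residual that excludes feature j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                resid -= delta * X[:, j]  # keep the full residual up to date
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves any coefficient
            break
    return beta
```

Setting `lam` to zero recovers ordinary least squares when the design has full column rank. Larger values set more coefficients exactly to zero, which is what makes LASSO a feature selector.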
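Footnotes 4, 6, 8, 10 and 12 refer to threshold gradient descent regularization (TGDR). The sketch below shows the core threshold gradient descent update for a binary logistic model. At each small step, only the coefficients whose gradient magnitude reaches at least a fraction `tau` of the largest one are moved, so the remaining coefficients stay at zero. Several details are illustrative assumptions: the binary outcome, the unthresholded intercept, the function name `tgdr_logistic`, and the default step size and number of steps. In practice the number of steps and `tau` would be tuned, for example by cross-validation, and the analysis behind Table 2 involves more than two classes.

```python
import numpy as np


def tgdr_logistic(X: np.ndarray, y: np.ndarray, tau: float = 0.8,
                  step: float = 0.01, n_steps: int = 500):
    """Threshold gradient descent for a binary logistic model (illustrative sketch).

    tau in [0, 1]: tau = 0 moves every coefficient at each step; tau = 1 moves only
    those with the largest absolute gradient, which tends to give sparser fits.
    """
    n, p = X.shape
    beta = np.zeros(p)
    intercept = 0.0
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-(intercept + X @ beta)))
        resid = y - prob
        grad = X.T @ resid / n  # negative gradient of the mean negative log-likelihood
        intercept += step * resid.mean()  # intercept updated without thresholding (assumption)
        keep = np.abs(grad) >= tau * np.abs(grad).max()
        beta += step * grad * keep  # only the "large-gradient" coefficients move
    return intercept, beta
```

Coefficients that never pass the threshold remain exactly zero. The genes they correspond to are therefore excluded from the selected list.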
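The test-set error rate, together with the BCM and AUPR summaries defined in footnotes 1 and 2, can be computed from a matrix of predicted class probabilities. The sketch below makes three assumptions: class labels are integers 0 to K-1, every class appears in the test set, and the error rate is the misclassification rate of the most probable class. Per-class AUPR is estimated here by one-versus-rest average precision. That is one common estimator, not necessarily the one used for Table 2. All function names are hypothetical.

```python
import numpy as np


def error_rate(y: np.ndarray, prob: np.ndarray) -> float:
    """Share of samples whose most probable class differs from the true class."""
    return float(np.mean(prob.argmax(axis=1) != y))


def bcm(y: np.ndarray, prob: np.ndarray) -> float:
    """Average over classes of the mean probability assigned to the true class."""
    n_classes = prob.shape[1]
    return float(np.mean([prob[y == k, k].mean() for k in range(n_classes)]))


def average_precision(is_pos: np.ndarray, score: np.ndarray) -> float:
    """One-vs-rest average precision: mean precision at the rank of each positive."""
    order = np.argsort(-score, kind="stable")  # highest score first
    hits = is_pos[order].astype(float)
    precision_at_rank = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())


def aupr(y: np.ndarray, prob: np.ndarray) -> float:
    """Average over classes of the per-class average precision."""
    n_classes = prob.shape[1]
    return float(np.mean([average_precision(y == k, prob[:, k])
                          for k in range(n_classes)]))
```

BCM rewards confident correct predictions even when the predicted label is already right. AUPR instead depends only on how samples are ranked within each class, so the two can disagree, as several rows of Table 2 show.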