BioMed Research International

Research Article

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Table 3

Performance of the proposed methods and other relevant methods on Simulation I.


Method	Size	Rand (%)	F13A1 (%)	GSTM1 (%)	Error rate¹ (%)	BCM²	AUPR³

Sign Avg. & LASSO/CD⁴	5.52	13.78	70	10	22.97	0.582	0.873

Sign Avg. & TGDR⁵	16.76	8.12	88	100	6.77	0.724	0.987

EDGE	20	3.85	16	0	10.80	0.719	0.936

limma	6.04	11.72	8	100	16.17	0.707	0.908

LASSO/CD separately⁶	4.65	29.17	36	40	30.00	0.527	0.924

TGDR separately⁷	32.26	5.30	100	100	19.27	0.611	0.991

glmmLASSO	114.06	3.05	0	0	36.40	0.519	0.571

Using q-value as the cutoff, EDGE selects all 1,000 genes as significant. We used the 20 most significant genes instead.
¹Error rate = (false positives + false negatives)/(sample size).
²BCM captures the average confidence that a sample belongs to class i when it indeed belongs to that class.
³AUPR is computed as the average of the area under the precision-recall curve for each class and captures the ability to correctly rank the samples known to belong to a given class.
⁴Sign Avg. & LASSO/CD: Pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is coordinate descent (CD).
⁵Sign Avg. & TGDR: Pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is threshold gradient descent regularization.
⁶LASSO/CD separately: separate LASSO models were trained at individual time points; the optimization method is CD.
⁷TGDR separately: separate TGDR models were trained at individual time points; the optimization method is TGDR.
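
The three performance metrics defined in footnotes 1 to 3 can be illustrated with a minimal Python sketch. It is not the authors' code. The function names (`error_rate`, `bcm`, `average_precision`, `aupr`) are hypothetical, class labels are assumed to be integers that index the columns of a matrix of predicted class probabilities, BCM is read here as the per-class mean of the probability assigned to the true class, averaged over classes, and the area under the precision-recall curve is approximated by average precision. These are assumptions, since the footnotes do not fix such details.

```python
from typing import List, Sequence


def error_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Footnote 1: (false positives + false negatives) / sample size."""
    # With two classes, every misclassified sample is a false positive
    # or a false negative, so counting mismatches is sufficient.
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


def bcm(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Footnote 2 (assumed reading): mean confidence in the true class,
    computed within each class and then averaged over classes."""
    per_class: List[float] = []
    for k in sorted(set(y_true)):
        conf = [p[k] for t, p in zip(y_true, probs) if t == k]
        per_class.append(sum(conf) / len(conf))
    return sum(per_class) / len(per_class)


def average_precision(is_positive: Sequence[bool],
                      scores: Sequence[float]) -> float:
    """Average precision, a step-wise approximation of the area under
    the precision-recall curve (ties are broken by input order)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / hits


def aupr(y_true: Sequence[int], probs: Sequence[Sequence[float]]) -> float:
    """Footnote 3: per-class precision-recall area, averaged over classes."""
    classes = sorted(set(y_true))
    areas = [
        average_precision([t == k for t in y_true], [p[k] for p in probs])
        for k in classes
    ]
    return sum(areas) / len(areas)
```

For example, with true labels `[0, 0, 1, 1]` and predicted probabilities `[[0.9, 0.1], [0.6, 0.4], [0.3, 0.7], [0.8, 0.2]]`, the fourth sample is confidently misclassified, which raises the error rate and lowers BCM, while AUPR depends only on how the samples are ranked within each class.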