BioMed Research International

Research Article

Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time

Table 4

Performance of the proposed methods and other relevant methods on Simulation II.


Method	Size	Rand (%)	F13A1 (%)	GSTM1 (%)	Error rate¹ (%)	BCM²	AUPR³

Sign Avg. & LASSO/CD⁴	13.82	10.03	100	96	1.27	0.854	0.994
Sign Avg. & TGDR⁵	9.92	14.78	100	96	3.33	0.841	0.993
EDGE	20	2.72	0	0	7.37	0.755	0.973
limma	8.9	9.75	0	100	5.23	0.809	0.981
LASSO/CD separately⁶	15.88	8.81	98	100	6.60	0.668	0.982
TGDR separately⁷	75.48	3.38	100	100	4.47	0.714	0.991
glmmLASSO	63.52	1.63	4	8	46.77	0.510	0.551

With the q-value as the cutoff, EDGE selected all 1,000 genes as significant, so we used its 20 most significant genes instead.
¹Error rate = (false positives + false negatives)/(sample size).
²BCM captures the average confidence that a sample belongs to class i when it indeed belongs to that class.
³AUPR is computed as the average of the area under the precision-recall curve for each class and captures the ability to correctly rank the samples known to belong to a given class.
⁴Sign Avg. & LASSO/CD: Pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is coordinate descent (CD).
⁵Sign Avg. & TGDR: Pseudogenes were obtained by calculating the sign average of a gene’s expression values across time; the optimization method is threshold gradient descent regularization.
⁶LASSO/CD separately: separate LASSO models were trained at individual time points; the optimization method is CD.
⁷TGDR separately: separate TGDR models were trained at individual time points; the optimization method is TGDR.
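
As a rough illustration of footnotes 1 and 2, the error rate and BCM can be computed from a classifier's output as sketched below. This is a minimal sketch under stated assumptions: two classes coded 0 and 1, one row of class probabilities per sample, and hypothetical function names (`error_rate`, `bcm`). The BCM here follows a simplified reading of footnote 2 (the mean probability assigned to each sample's true class) and may differ from the exact formula used to produce the table.

```python
def error_rate(true_labels, predicted_labels):
    """Fraction of misclassified samples.

    For two classes, every misclassified sample is either a false
    positive or a false negative, so this equals (FP + FN) / sample size.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def bcm(true_labels, class_probs):
    """Simplified BCM (assumption): mean probability given to the true class."""
    if len(true_labels) != len(class_probs):
        raise ValueError("labels and probabilities must have the same length")
    confidences = [probs[t] for t, probs in zip(true_labels, class_probs)]
    return sum(confidences) / len(confidences)


# Toy data (made up for illustration, not taken from the simulation).
true_labels = [0, 0, 1, 1]
class_probs = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.3, 0.7]]

# Predict the class with the larger probability for each sample.
predicted = [max(range(len(p)), key=p.__getitem__) for p in class_probs]

print("error rate (%):", 100 * error_rate(true_labels, predicted))
print("BCM:", round(bcm(true_labels, class_probs), 3))
```

In the toy data the second sample is misclassified, so one of four samples is wrong; the table reports the same quantity as a percentage.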