Research Article

Two Efficient Techniques to Find Approximate Overlaps between Sequences

Table 1

Data sets used in experiments.

Data Set                          Size            # of strings
Random data                       1 MB to 5 MB
Homo sapiens exome (SRR500004)    1.1 GB          15 M
E. coli (SRR2244250)              302 MB          502,172
C. elegans                        167 MB          334,465
Citrus clementina                 104 MB          118,365
Citrus sinensis                   154 MB          208,909
Citrus trifoliata                 46 MB           62,344