Research Article
A Practical and Scalable Tool to Find Overlaps between Sequences
Table 2
Data sets used in experiments.
| Data set | Size | Number of strings |
|---|---|---|
| Generated randomly using a uniform distribution | 10 MB–50 GB | 10⁴–66 × 10⁷ |
| First fully public female human genome (SRR098909) | 32.7 G | 162 M |
| Illumina whole human genome (SRR866986) | 9.8 G | 53 M |
| A study in rat genome (ERR125766) | 5 G | 97 M |
| Homo sapiens exome (SRR500004) | 1.1 G | 15 M |
| EST of C. elegans | 167 MB | 334,465 |
| EST of Citrus clementina | 104 MB | 118,365 |
| EST of Citrus sinensis | 154 MB | 208,909 |
| EST of Citrus trifoliata | 46 MB | 62,344 |
| EST of Atta cephalotes | 278 MB | 2,835 |