Research Article
Spaced Seed Data Structures for De Novo Assembly
Figure 1
Uniqueness of spaced seeds in the (a) E. coli and (b) H. sapiens genomes, as a function of the space length. The red, blue, and black curves correspond to spaced seeds of lengths 8, 16, and 32 bp, respectively. When the space length is zero, the two halves of each seed are adjacent, so the uniqueness figures correspond to single k-mers of 16, 32, and 64 bp, respectively. The curves show that, for the E. coli genome, using spaced seeds of length 16 is equivalent to or better than using k-mers of length 64 when the space length (delta) is longer than 100 bp.
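The quantity plotted in Figure 1 can be illustrated with a minimal sketch. It assumes that a spaced seed is a pair of k-mers separated by a fixed gap of delta bases, and that "uniqueness" means the fraction of distinct seeds that occur exactly once in the sequence; the caption does not state the exact definition, so both points are assumptions. The function name `spaced_seed_uniqueness` and its parameters are hypothetical, and reverse complements are ignored for simplicity.

```python
from collections import Counter


def spaced_seed_uniqueness(seq: str, k: int, delta: int) -> float:
    """Fraction of distinct spaced seeds that occur exactly once in seq.

    A seed is modelled (by assumption) as two k-mers separated by
    `delta` ignored bases. With delta == 0 the two halves abut, so the
    seed is equivalent to a single k-mer of length 2 * k.
    """
    span = 2 * k + delta  # total footprint of one seed on the sequence
    counts = Counter(
        (seq[i:i + k], seq[i + k + delta:i + span])
        for i in range(len(seq) - span + 1)
    )
    if not counts:
        return 0.0  # sequence is shorter than one seed footprint
    unique = sum(1 for c in counts.values() if c == 1)
    return unique / len(counts)


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not real genome data
    for gap in (0, 2):
        print(gap, spaced_seed_uniqueness(toy, k=2, delta=gap))
```

Sweeping `delta` over a range of values for a fixed `k` on a real genome sequence would produce one curve of the kind shown in each panel, under the assumptions stated above.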