Research Article

Two Efficient Techniques to Find Approximate Overlaps between Sequences

Figure 1

The compact prefix tree for the six strings AGGT, GGTC, AATG, GGTA, TTAC, and GGGC. The range above each node gives the reads that share the prefix spelled out up to that node, and the number inside a node is the value stored at that node. For example, the range [4..5] indicates that reads 4 and 5 (GGTA and GGTC after sorting) share the prefix GGT. The prefix GGT is obtained by concatenating the labels of the edges on the path from the root to the node with range [4..5]. Note that the second G in GGT is obtained from the text, since the corresponding value is 1. The numbers inside the ranges are the new identifiers of the strings after sorting, not the original identifiers.
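The ranges in the caption can be reproduced with a short sketch. It does not build the compact prefix tree itself; it only sorts the six reads lexicographically (assumed here to be the sorting the caption refers to) and finds, for a given prefix, the range of new identifiers that share it. The names `reads`, `sorted_reads`, and `prefix_range` are illustrative and do not come from the article.

```python
from bisect import bisect_left

# The six reads from the caption, in their original order.
reads = ["AGGT", "GGTC", "AATG", "GGTA", "TTAC", "GGGC"]

# Assumption: new identifiers are 1-based positions in lexicographic order.
sorted_reads = sorted(reads)


def prefix_range(prefix):
    """Return the 1-based inclusive range of sorted reads that start with
    `prefix`, or None if no read has that prefix."""
    # First position whose read is not smaller than the prefix.
    lo = bisect_left(sorted_reads, prefix)
    hi = lo
    # Reads sharing a prefix are contiguous in sorted order.
    while hi < len(sorted_reads) and sorted_reads[hi].startswith(prefix):
        hi += 1
    if hi == lo:
        return None
    return (lo + 1, hi)


print(sorted_reads)
print(prefix_range("GGT"))  # reads 4 and 5: GGTA and GGTC
print(prefix_range("GG"))   # reads 3 to 5: GGGC, GGTA, GGTC
```

Because reads with a common prefix are adjacent after sorting, each tree node can be described by a single range, which is what the figure annotates above the nodes.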