Research Article
n-Gram-Based Text Compression
Algorithm 1: Pseudocode of the compression phase.
Input: the source text file
Output: the encoded stream

inputstring = read source text file
count = number of grams in inputstring
while count ≥ 5 do
    st5 = get first five grams of inputstring
    index = find(st5, five_gram_dict)
    if st5 is found in five_gram_dict then
        force_four_gram_compression(st4)
        outputstring += compress(index, 5)
        delete first five grams of inputstring
        count −= 5
    end
    else
        st4 += get first gram of inputstring
        delete first gram of inputstring
        count −= 1
        if number of grams of st4 = 4 then
            four_gram_compression(st4)
        end
    end
end
if count > 0 then
    four_gram_compression(inputstring)
end
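The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. Several details are assumptions rather than the authors' implementation: grams are taken to be whitespace-separated words, the five-gram dictionary is modelled as a plain `dict` from gram tuples to indices, and the loop and branch conditions (at least five grams remaining, five-gram present in the dictionary) are inferred from the surrounding steps. The helpers `compress` and `fallback_compression` are hypothetical stand-ins; in particular, `fallback_compression` only emits literal tokens, because the bodies of `four_gram_compression` and `force_four_gram_compression` are not shown in this section.

```python
from typing import Dict, List, Tuple

Gram = str
Code = Tuple[int, object]  # (n-gram order, payload); hypothetical output format


def compress(index: int, order: int) -> Code:
    """Hypothetical stand-in for compress(index, n): tag the index with its order."""
    return (order, index)


def fallback_compression(pending: List[Gram], out: List[Code]) -> None:
    """Placeholder for four_gram_compression / force_four_gram_compression.

    Those routines are not shown in the source, so this stub just emits the
    pending grams as one literal token (order 0) and clears the buffer.
    """
    if pending:
        out.append((0, tuple(pending)))
        pending.clear()


def compress_text(text: str,
                  five_gram_dict: Dict[Tuple[Gram, ...], int]) -> List[Code]:
    grams = text.split()            # assumption: grams are whitespace-separated
    out: List[Code] = []
    st4: List[Gram] = []            # grams waiting for lower-order compression
    pos = 0                         # start of the not-yet-consumed input
    while len(grams) - pos >= 5:    # assumed condition: five grams remain
        st5 = tuple(grams[pos:pos + 5])
        index = five_gram_dict.get(st5)
        if index is not None:       # assumed test: five-gram is in the dictionary
            fallback_compression(st4, out)  # flush pending grams first
            out.append(compress(index, 5))
            pos += 5
        else:
            st4.append(grams[pos])  # move one gram to the pending buffer
            pos += 1
            if len(st4) == 4:
                fallback_compression(st4, out)
    # Fewer than five grams remain. Flushing st4 here is an assumption of this
    # sketch so that no gram is lost; then the tail goes to the fallback.
    fallback_compression(st4, out)
    fallback_compression(grams[pos:], out)
    return out


if __name__ == "__main__":
    toy_dict = {("a", "b", "c", "d", "e"): 7}  # toy dictionary for illustration
    print(compress_text("x y a b c d e z", toy_dict))
```

With the toy dictionary above, the two leading grams are flushed as a literal token before the five-gram match is emitted, and the single trailing gram is handled by the final fallback step.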