Research Article
n-Gram-Based Text Compression
Algorithm 1
Pseudocode of the compression phase.
input: The source text file
output: The encoded stream
inputstring = read source text file
count = number of grams in the inputstring
while count ≥ 5 do
    st5 = get first five grams of the inputstring
    index = find(st5, five_gram_dict)
    if st5 is found in five_gram_dict then
        force_four_gram_compression(st4)
        outputstring += compress(index, 5)
        delete first five grams of the inputstring
        count -= 5
    end
    else
        st4 += get first gram of the inputstring
        delete first gram of the inputstring
        count -= 1
        if number of grams of st4 = 4 then
            four_gram_compression(st4)
        end
    end
end
if count > 0 then
    four_gram_compression(inputstring)
end
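The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several parts are assumptions made for illustration: the function name compress_grams is hypothetical, five_gram_dict is modelled as a Python dict from five-word tuples to integer indexes, and both four_gram_compression and force_four_gram_compression are replaced by a single stand-in that emits a ("fallback", grams) token rather than running the lower-order dictionaries. The sketch also assumes that grams still held in st4 when the main loop ends are passed on together with the unread tail, a detail the pseudocode leaves open.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, object]


def compress_grams(
    grams: List[str],
    five_gram_dict: Dict[Tuple[str, ...], int],
) -> List[Token]:
    """Sketch of the five-gram pass of Algorithm 1 (names are hypothetical)."""
    out: List[Token] = []
    pending: List[str] = []  # plays the role of st4
    pos = 0                  # first unread gram; replaces deleting from inputstring

    def flush_pending() -> None:
        # Stand-in for four_gram_compression / force_four_gram_compression:
        # a real encoder would try the 4-, 3-, 2- and 1-gram dictionaries here.
        if pending:
            out.append(("fallback", tuple(pending)))
            pending.clear()

    while len(grams) - pos >= 5:
        st5 = tuple(grams[pos:pos + 5])
        index = five_gram_dict.get(st5)
        if index is not None:
            # Five-gram found: emit buffered grams first to preserve order.
            flush_pending()
            out.append(("five", index))
            pos += 5
        else:
            # Not found: move one gram into the buffer and retry one step later.
            pending.append(grams[pos])
            pos += 1
            if len(pending) == 4:
                flush_pending()

    # Assumption: leftover buffered grams and the unread tail go to the fallback together.
    pending.extend(grams[pos:])
    flush_pending()
    return out


if __name__ == "__main__":
    demo_dict = {("a", "b", "c", "d", "e"): 7}
    print(compress_grams("a b c d e x a b c d e y z".split(), demo_dict))
```

Using a read position instead of deleting grams from the front of the input keeps each step constant-time; the count variable of the pseudocode corresponds to len(grams) - pos.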