Research Article

n-Gram-Based Text Compression

Algorithm 1

Pseudocode of the compression phase.
  input: The source text file
  output: The encoded stream
inputstring = read source text file
count = number of grams in the inputstring
while count ≥ 5 do
   st5 = get first five grams of the inputstring
   index = find(st5, five_gram_dict)
   if st5 is found in five_gram_dict then
      force_four_gram_compression(st4)
      outputstring += compress(index, 5)
      delete first five grams of the inputstring
      count −= 5
   end
   else
      st4 += get first gram of the inputstring
      delete first gram of the inputstring
      count −= 1
      if number of grams of st4 = 4 then
         four_gram_compression(st4)
      end
   end
end
if count > 0 then
   four_gram_compression(inputstring)
end
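
The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: grams are taken to be whitespace-separated tokens, the main loop is assumed to run while at least five grams remain, the dictionaries are plain Python dicts keyed by tuples of grams, and the output is a list of `(n, index)` tokens rather than a packed byte stream. The helper `encode_short` is a hypothetical stand-in for `four_gram_compression` and `force_four_gram_compression`; here it simply performs a greedy longest match over dictionaries of order 4 down to 1 and emits a literal `(0, gram)` when nothing matches. The sketch also flushes any grams still pending in `st4` before encoding the tail, which the pseudocode does not state explicitly.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]
Token = Tuple[int, object]  # (n, index) for a dictionary hit, (0, gram) for a literal


def encode_short(grams: List[str], dicts: Dict[int, Dict[NGram, int]],
                 out: List[Token]) -> None:
    """Hypothetical stand-in for four_gram_compression: greedy longest match, n <= 4."""
    i = 0
    while i < len(grams):
        for n in range(min(4, len(grams) - i), 0, -1):
            index = dicts.get(n, {}).get(tuple(grams[i:i + n]))
            if index is not None:
                out.append((n, index))
                i += n
                break
        else:
            # No dictionary of any order matched: emit the gram as a literal.
            out.append((0, grams[i]))
            i += 1


def compress_grams(text: str, dicts: Dict[int, Dict[NGram, int]]) -> List[Token]:
    grams = text.split()          # assumption: one gram per whitespace-separated token
    five_gram_dict = dicts.get(5, {})
    out: List[Token] = []
    st4: List[str] = []           # grams waiting for four-gram (or shorter) encoding
    pos = 0                       # index of the first unprocessed gram

    while len(grams) - pos >= 5:
        st5 = tuple(grams[pos:pos + 5])
        index = five_gram_dict.get(st5)
        if index is not None:
            # Five-gram hit: flush pending grams first so output order is preserved.
            encode_short(st4, dicts, out)
            st4 = []
            out.append((5, index))
            pos += 5
        else:
            # Miss: move one gram into the buffer and retry from the next position.
            st4.append(grams[pos])
            pos += 1
            if len(st4) == 4:
                encode_short(st4, dicts, out)
                st4 = []

    # Fewer than five grams remain: flush the buffer, then encode the tail.
    encode_short(st4, dicts, out)
    encode_short(grams[pos:], dicts, out)
    return out
```

For example, with a five-gram dictionary containing `("a", "b", "c", "d", "e")` and a bi-gram dictionary containing `("x", "y")`, the input `"x y a b c d e z"` first buffers `x` and `y`, then hits the five-gram, which forces the buffered grams to be encoded before the five-gram token is written.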