Multi-stream word-based compression algorithm (Çoklu Akış Destekli Kelime Tabanlı Sıkıştırma Algoritması)


ÖZTÜRK E., MESUT A., DİRİ B.

2nd International Conference on Computer Science and Engineering, UBMK 2017, Antalya, Türkiye, 5-8 October 2017, pp. 34-37 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/ubmk.2017.8093552
  • City of Publication: Antalya
  • Country of Publication: Türkiye
  • Pages: pp. 34-37
  • Keywords: Data compression, Text compression
  • Trakya University Affiliated: Yes

Abstract

In this article, we present a novel word-based lossless compression algorithm for text files that uses a semi-static model. We named it the Multi-stream Word-based Compression Algorithm (MWCA) because it stores the compressed forms of the words in three separate streams, depending on their frequencies in the text. It also stores two dictionaries and a bit vector as side information. In our experiments, MWCA obtains a compression ratio of about 3.23 bpc (bits per character) on average and 2.88 bpc on files larger than 50 MB. If a variable-length encoder such as Huffman coding is applied after MWCA, these ratios drop to 2.63 and 2.44 bpc, respectively. Thanks to its multi-stream structure, MWCA could be a good solution, especially for storing and searching large text data.