
I want to compress .txt files that contain dates in yyyy-mm-dd hh:mm:ss format and English words that are sometimes repeated on different lines.
I have read some articles about compression algorithms and found that in my case dictionary-based encoding is better than entropy-based encoding. Since I want to implement the algorithm myself, I need something that isn't very complicated. So I looked at LZW and LZ77, but I can't choose between them, because the conclusions of the articles I found are contradictory: according to some, LZW has the better compression ratio, and according to others, LZ77 is the leader. So which one is likely to be better in my case? Are there other easy-to-implement algorithms that would suit my purpose?
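For reference, a minimal byte-oriented LZW encoder and decoder can be sketched in Python. This is an illustrative sketch, not the asker's code: the dictionary grows without bound, and codes are returned as a list of integers instead of being packed into a bit stream. Practical LZW implementations use variable-width codes with a size cap (GIF, for example, caps codes at 12 bits) and reset or freeze the dictionary when it fills.

```python
def lzw_encode(data: bytes) -> list:
    # Seed the dictionary with every single-byte string (codes 0..255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    codes = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            # Keep extending the current phrase while it is still known.
            w = wc
        else:
            # Emit the longest known phrase and learn phrase + next byte.
            codes.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = bytes([byte])
    if w:
        codes.append(dictionary[w])
    return codes


def lzw_decode(codes: list) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry being defined right now.
            entry = w + w[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        # Rebuild the same dictionary entry the encoder created.
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b"".join(out)
```

Usage: `codes = lzw_encode(b"2023-01-05 10:00:00 error\n" * 3)` followed by `lzw_decode(codes)` returns the original bytes. Note that the number of codes is not the compressed size, because each code needs more than 8 bits once the dictionary passes 256 entries.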

Experiment with readily available implementations. Does each file have to be decompressible individually? The timestamps and words look a bit like log files, so look for specialized solutions. Experiment with converting the timestamps to a more compact representation: 32 bits of seconds cover more than 136 years. – greybeard
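The comment's last suggestion can be sketched with the Python standard library: parse each yyyy-mm-dd hh:mm:ss string and store it as an unsigned 32-bit count of seconds since the Unix epoch, turning 19 characters into 4 bytes. This sketch assumes the timestamps carry no time zone and treats them as UTC, and it only covers 1970-01-01 through early 2106 (2^32 seconds is about 136.1 years); choosing a different base date would shift that range. How the packed values are stored alongside the remaining text is left open.

```python
import struct
from datetime import datetime, timezone

FMT = "%Y-%m-%d %H:%M:%S"


def pack_timestamp(text: str) -> bytes:
    # Assumption: the timestamps have no zone info, so interpret them as UTC.
    dt = datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)
    seconds = int(dt.timestamp())
    # ">I" is a big-endian unsigned 32-bit integer; struct.error is raised
    # for dates before 1970 or after early 2106.
    return struct.pack(">I", seconds)


def unpack_timestamp(blob: bytes) -> str:
    (seconds,) = struct.unpack(">I", blob)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime(FMT)
```

Usage: `pack_timestamp("2023-01-05 10:00:00")` yields 4 bytes, and `unpack_timestamp` on those bytes gives back the same string.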

1 Answer


LZW is obsolete. Modern, and even pretty old, LZ77 compressors outperform LZW.
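To illustrate the LZ77 side, here is a minimal sketch of the textbook scheme, which emits (offset, length, next byte) triples by searching a sliding window of recently seen bytes. The window size and maximum match length are arbitrary illustrative parameters, and the brute-force match search is slow (proportional to the window size at every position). Practical LZ77-family compressors such as DEFLATE find matches with hash chains and entropy-code the resulting tokens, both of which this sketch omits.

```python
from typing import List, Optional, Tuple

# (offset back into the output, match length, next literal byte or None at end)
Token = Tuple[int, int, Optional[int]]


def lz77_encode(data: bytes, window: int = 4096, max_len: int = 255) -> List[Token]:
    tokens: List[Token] = []
    i, n = 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        # Brute-force search of the window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy), which
            # lets a single token encode a long run of repeated bytes.
            while (length < max_len and i + length < n
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        nxt = data[i + best_len] if i + best_len < n else None
        tokens.append((best_off, best_len, nxt))
        i += best_len + 1
    return tokens


def lz77_decode(tokens: List[Token]) -> bytes:
    out = bytearray()
    for off, length, nxt in tokens:
        start = len(out) - off
        # Copy byte by byte so overlapping matches (off < length) work.
        for k in range(length):
            out.append(out[start + k])
        if nxt is not None:
            out.append(nxt)
    return bytes(out)
```

Repeated words and date prefixes on nearby lines show up as (offset, length) back-references, which is the kind of redundancy the question describes.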

In any case, you are the only one who can answer your question, since only you have examples of the data you want to compress. Simply experiment with various compression methods (zstd, xz, lz4, etc.) on your data and see what combination of compression ratio and speed meets your needs.
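Following that advice, a small harness can measure ratio and speed on your own files. This sketch uses only codecs from the Python standard library: zlib (DEFLATE, an LZ77 variant combined with Huffman coding), lzma (the xz format) and bz2 (Burrows-Wheeler based, included for contrast). zstd and lz4 need third-party bindings on most Python versions, so they are left out; the script name in the usage line is hypothetical.

```python
import bz2
import lzma
import sys
import time
import zlib

# Label -> (compress, decompress), all from the standard library.
CODECS = {
    "zlib level 9": (lambda d: zlib.compress(d, 9), zlib.decompress),
    "bz2 level 9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz preset 9": (lambda d: lzma.compress(d, preset=9), lzma.decompress),
}


def compare(data: bytes) -> dict:
    """Return {label: (compressed_size, compress_seconds)} for each codec."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - t0
        # Sanity check: every codec must round-trip the input exactly.
        assert decompress(packed) == data
        results[name] = (len(packed), elapsed)
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for name, (size, secs) in compare(data).items():
        print(f"{name:14} {size:>10} bytes  ratio {len(data) / size:6.2f}  {secs * 1000:8.1f} ms")
```

Usage: `python compare_codecs.py yourfile.txt` prints one line per codec; comparing several representative files gives a better picture than a single one.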