What is the best compression algorithm for small 4 KB files?

Question

I am trying to compress TCP packets each one of about 4 KB in size. The packets can contain any byte (from 0 to 255). All of the benchmarks on compression algorithms that I found were based on larger files. I did not find anything that compares the compression ratio of different algorithms on small files, which is what I need. I need it to be open source so it can be implemented on C++, so no RAR for example. What algorithm can be recommended for small files of about 4 kilobytes in size? LZMA? HACC? ZIP? gzip? bzip2?

Is this because you want to optimise bandwidth usage? or is this a performance issue? If it's the former, then the best thing to do is try them all and see how they look. If it's the latter, you may find that sending the packets as is would be faster than the compress->send->decompress routine. — OJ.
OJ: Not necessarily... some environments are extremely bandwidth limited. If he's even concerned with compressing TCP packets, good chance he's operating in such an environment. — Eric J.
Moreover, there are many connections who have a cap on bandwidth total usage, so compressing the packets will help them save some bandwidth. — o0'.

Rick C. Petty Rick C. Petty · Accepted Answer · 2009-04-09T05:16:59

Choose the algorithm that is the quickest, since you probably care about doing this in real time. Generally for smaller blocks of data, the algorithms compress about the same (give or take a few bytes) mostly because the algorithms need to transmit the dictionary or Huffman trees in addition to the payload.

I highly recommend Deflate (used by zlib and Zip) for a number of reasons. The algorithm is quite fast, well tested, BSD licensed, and is the only compression required to be supported by Zip (as per the infozip Appnote). Aside from the basics, when it determines that the compression is larger than the decompressed size, there's a STORE mode which only adds 5 bytes for every block of data (max block is 64k bytes). Aside from the STORE mode, Deflate supports two different types of Huffman tables (or dictionaries): dynamic and fixed. A dynamic table means the Huffman tree is transmitted as part of the compressed data and is the most flexible (for varying types of nonrandom data). The advantage of a fixed table is that the table is known by all decoders and thus doesn't need to be contained in the compressed stream. The decompression (or Inflate) code is relatively easy. I've written both Java and Javascript versions based directly off of zlib and they perform rather well.

The other compression algorithms mentioned have their merits. I prefer Deflate because of its runtime performance on both the compression step and particularly in decompression step.

A point of clarification: Zip is not a compression type, it is a container. For doing packet compression, I would bypass Zip and just use the deflate/inflate APIs provided by zlib.

What is the best compression algorithm for small 4 KB files?

10 Answers