I need to choose a compression algorithm

Question

I need to choose a compression algorithm to compress some data. I don't know the type of data I'll be compressing in advance (think of it as kinda like the WinRAR program).

I've heard of the following algorithms but I don't know which one I should use. Can anyone post a short list of pros and cons? For my application the first priority is decompression speed; the second priority is space saved. Compression (not decompression) speed is irrelevant.

Deflate
Implode
Plain Huffman
bzip2
lzma

Some languages have built-in support for some (maybe all) of these, so you could do some quick testing. I guess this is hard if you don't know the type of data you're going to be compressing, but hopefully you have some idea, or some way of randomly generating data that is close to what you'll be using, in some way. — MatrixFrog

Thomas Bonini Thomas Bonini · Accepted Answer · 2010-03-07T20:05:02

I ran a few benchmarks compressing a .tar that contained a mix of high entropy data and text. These are the results:

Name  - Compression rate* - Decompression Time
7zip  - 87.8%             - 0.703s
bzip2 - 80.3%             - 1.661s
gzip  - 72.9%             - 0.347s
lzo   - 70.0%             - 0.111s

*Higher is better

From this I came to the conclusion that the compression rate of an algorithm depends on its name; the first in alphabetical order will be the one with the best compression rate, and so on.

Therefore I decided to rename lzo to 1lzo. Now I have the best algorithm ever.

EDIT: worth noting that of all of them unfortunately lzo is the only one with a very restrictive license (GPL) :(

I need to choose a compression algorithm

5 Answers