5
votes

Is zlib worth it? Are there better-suited compressors?

I am using an embedded system. I frequently have only 3 MB of RAM or less available to my application, so I am considering using zlib to compress my buffers. However, I am concerned about the overhead.

The average buffer size will be 30 KB, and buffers that small probably won't be worth compressing with zlib. Does anyone know of a good compressor for extremely limited memory environments?

However, I will occasionally have buffers as large as 700 KB, with 500 KB being much more common than that maximum. Is zlib worth it in this case, or is the overhead too much to justify?

My only considerations for compression are the algorithm's RAM overhead and performance at least as good as zlib's.

LICENSE: I'd prefer a compressor licensed under BSD, zlib, or an equivalent license.

When you say "performance", do you mean speed or compression ratio? Will you be compressing, decompressing, or both on the embedded system? Do you care more about compression performance or decompression performance? – Craig McQueen
Both. And by "performance" I mean RAM use / overhead. – unixman83

3 Answers

5
votes

If you initialize the compressor at level 1, 2, or 3 (via lm_init() in gzip's deflate.c, or the level argument to deflateInit() if you use the zlib library), the deflate_fast() routine will be used instead of the slower lazy-matching routine. The trade-off is worse compression in exchange for speed. It is probably worth it.
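To make that trade-off concrete: deflate_fast() parses greedily, taking a match the moment it finds one, while the higher levels use lazy matching, peeking one byte ahead in case a longer match starts there. The toy C sketch below is not zlib's code (it uses a brute-force search instead of hash chains, and the names longest_match() and count_tokens() are made up); it only counts LZ77 tokens so the two strategies can be compared.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_MATCH 3    /* shortest back-reference deflate can emit */
#define MAX_MATCH 258  /* longest back-reference deflate can emit */

/* Length of the longest earlier match for s[pos..], by brute force.
 * (zlib uses hash chains instead; this is only for illustration.) */
static size_t longest_match(const char *s, size_t n, size_t pos) {
    size_t best = 0;
    for (size_t cand = 0; cand < pos; cand++) {
        size_t len = 0;
        while (pos + len < n && len < MAX_MATCH && s[cand + len] == s[pos + len])
            len++;
        if (len > best)
            best = len;
    }
    return best;
}

/* Count the LZ77 tokens (literals + matches) needed to cover s.
 * lazy == 0: greedy, take any match of MIN_MATCH or more immediately.
 * lazy != 0: if a strictly longer match starts at pos + 1, emit one
 *            literal now and reconsider at pos + 1. */
static size_t count_tokens(const char *s, int lazy) {
    assert(s != NULL);
    size_t n = strlen(s), pos = 0, tokens = 0;
    while (pos < n) {
        size_t len = longest_match(s, n, pos);
        if (len >= MIN_MATCH && lazy && pos + 1 < n &&
            longest_match(s, n, pos + 1) > len)
            len = 0;                          /* defer the match */
        pos += (len >= MIN_MATCH) ? len : 1;  /* match or single literal */
        tokens++;
    }
    return tokens;
}
```

On "abcXbcdeYabcde", the greedy parse takes "abc" and then has to emit "d" and "e" as literals (12 tokens), while the lazy parse emits "a" as a literal and then matches "bcde" (11 tokens). The extra look-ahead costs CPU time, which is what levels 1 to 3 skip.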

If you compile the deflate code with SMALL_MEM defined (a compile-time switch in the deflate.c that ships with gzip), it will use a smaller hash table (fewer HASH_BITS) when hashing input strings. The comment in deflate.c says:

/* Compile with MEDIUM_MEM to reduce the memory requirements or
 * with SMALL_MEM to use as little memory as possible. Use BIG_MEM if the
 * entire input file can be held in memory (not possible on 16 bit systems).
 * Warning: defining these symbols affects HASH_BITS (see below) and thus
 * affects the compression ratio. The compressed output
 * is still correct, and might even be smaller in some cases.
 */
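SMALL_MEM is a compile-time switch. If you link against the zlib library instead, the equivalent knobs are the windowBits and memLevel arguments to deflateInit2() (and windowBits for inflateInit2()), and zlib's zconf.h documents the resulting footprint: deflate needs about (1 << (windowBits+2)) + (1 << (memLevel+9)) bytes and inflate about 1 << windowBits bytes, each plus a few kilobytes for small objects. A minimal sketch of that arithmetic; the helper names are hypothetical and zlib exposes no such functions.

```c
#include <assert.h>

/* Approximate deflate state size in bytes, per the formula in zlib's zconf.h:
 * (1 << (windowBits + 2)) + (1 << (memLevel + 9)), excluding a few KB of
 * small objects. Hypothetical helper. */
static unsigned long deflate_mem_estimate(int windowBits, int memLevel) {
    assert(windowBits >= 9 && windowBits <= 15);  /* sketch accepts 9..15 */
    assert(memLevel >= 1 && memLevel <= 9);
    return (1UL << (windowBits + 2)) + (1UL << (memLevel + 9));
}

/* Approximate inflate window size in bytes: 1 << windowBits, plus roughly
 * 7 KB of small objects that are not included here. Hypothetical helper. */
static unsigned long inflate_mem_estimate(int windowBits) {
    assert(windowBits >= 9 && windowBits <= 15);
    return 1UL << windowBits;
}
```

With the defaults (windowBits = 15, memLevel = 8) deflate needs about 256 KB; dropping to windowBits = 12 and memLevel = 4 brings it to about 24 KB, at some cost in compression ratio.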

Hopefully these two techniques combined can bring zlib within reach of your application. It's a ubiquitous standard, and being able to reuse well-worn components may be worth sacrifices elsewhere in the application. If you know something about the distribution of your data, you may be able to do better by writing your own compression routines; however, zlib can be dropped in quickly, while writing and testing your own might take much more time.

Update

Here's some output from gzip built with SMALL_MEM, at each compression level from 1 to 9, on the first ~600 KB file I found:

$ ls -l abi-2.6.31-14-generic
-rw-r--r-- 1 sarnold sarnold 623709 2011-03-18 18:09 abi-2.6.31-14-generic
$ for i in `seq 1 9` ; do /usr/bin/time ./gzip -c -${i} abi-2.6.31-14-generic | wc -c ; done
0.02user 0.00system 0:00.02elapsed 76%CPU (0avgtext+0avgdata 2816maxresident)k
0inputs+0outputs (0major+213minor)pagefaults 0swaps
162214
0.01user 0.00system 0:00.01elapsed 52%CPU (0avgtext+0avgdata 2800maxresident)k
0inputs+0outputs (0major+212minor)pagefaults 0swaps
158817
0.02user 0.00system 0:00.02elapsed 95%CPU (0avgtext+0avgdata 2800maxresident)k
0inputs+0outputs (0major+212minor)pagefaults 0swaps
156708
0.02user 0.00system 0:00.02elapsed 76%CPU (0avgtext+0avgdata 2784maxresident)k
0inputs+0outputs (0major+211minor)pagefaults 0swaps
143843
0.03user 0.00system 0:00.03elapsed 96%CPU (0avgtext+0avgdata 2784maxresident)k
0inputs+0outputs (0major+212minor)pagefaults 0swaps
140706
0.03user 0.00system 0:00.03elapsed 81%CPU (0avgtext+0avgdata 2784maxresident)k
0inputs+0outputs (0major+211minor)pagefaults 0swaps
140126
0.04user 0.00system 0:00.04elapsed 95%CPU (0avgtext+0avgdata 2784maxresident)k
0inputs+0outputs (0major+211minor)pagefaults 0swaps
138801
0.05user 0.00system 0:00.05elapsed 84%CPU (0avgtext+0avgdata 2784maxresident)k
0inputs+0outputs (0major+212minor)pagefaults 0swaps
138446
0.06user 0.00system 0:00.06elapsed 96%CPU (0avgtext+0avgdata 2768maxresident)k
0inputs+0outputs (0major+210minor)pagefaults 0swaps
138446

The entire gzip program peaks at about 2.7 to 2.8 MB of resident memory, regardless of the compression level requested. Using only the specific functions you need, rather than the whole gzip program, might bring that number down somewhat, but it may still be too expensive for your little machine.
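The maxresident figures above come from /usr/bin/time and cover the whole gzip process. If you build a test program that calls only the routines you need, you can read its peak resident size from inside the program with POSIX getrusage(). A minimal sketch, assuming a Unix-like development host rather than the embedded target; note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS. The helper names are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Peak resident set size of the calling process, or -1 on error.
 * Units are platform-specific: kilobytes on Linux, bytes on macOS. */
static long peak_rss(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return (long)ru.ru_maxrss;
}

/* Print the peak RSS with a label, e.g. right after a compression call. */
static void print_peak_rss(const char *label) {
    assert(label != NULL);
    printf("%s: peak RSS %ld (KB on Linux)\n", label, peak_rss());
}
```

Calling print_peak_rss() before and after your compression step shows how much the compressor itself adds on top of the rest of the program.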

4
votes

Have a look at LZO.

From the documentation:

  • Requires no memory for decompression.
  • Requires 64 kB of memory for compression.

If you arrange your data carefully, you can do overlapping (in-place) decompression, meaning you decompress into the same buffer that holds the compressed data.

You can also partially overlap the buffers when compressing.
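For overlapping decompression, the usual approach is to allocate one buffer big enough for the uncompressed data plus a safety margin and store the compressed bytes at its end, so the decompressor's writes stay behind its reads. A sketch of that bookkeeping, assuming the worst-case expansion formula from the LZO FAQ (in_len + in_len/16 + 64 + 3) is also an adequate margin; check the LZO documentation for the exact requirement before relying on it. The helper and type names are hypothetical, and no LZO function is called.

```c
#include <assert.h>
#include <stddef.h>

/* Worst-case LZO output size for in_len input bytes (assumed formula,
 * taken from the LZO FAQ; verify for the algorithm variant you use). */
static size_t lzo_worst_case(size_t in_len) {
    return in_len + in_len / 16 + 64 + 3;
}

/* Layout for in-place decompression in a single buffer. */
typedef struct {
    size_t buf_size;     /* total bytes to allocate */
    size_t comp_offset;  /* where the compressed bytes are stored */
} inplace_layout;

/* Put the compressed data at the end of a worst-case-sized buffer; the
 * decompressor then writes its output from offset 0 upward. */
static inplace_layout plan_inplace(size_t uncompressed_len, size_t compressed_len) {
    inplace_layout l;
    l.buf_size = lzo_worst_case(uncompressed_len);
    assert(compressed_len <= l.buf_size);
    l.comp_offset = l.buf_size - compressed_len;
    return l;
}
```

For the 700 KB worst case from the question (716,800 bytes), this comes to a single 761,667-byte buffer instead of separate input and output buffers.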

2
votes

LZS (Lempel-Ziv-Stac) is a very simple sliding-window compressor and decompressor, specified for use in various Internet protocols (for example, RFC 1974 for PPP). Technically, it could be a good fit.

I've written some C and Python code for LZS compression and decompression.
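To give a feel for how simple the format is, here is a minimal LZS decompressor sketch in C. It is not the code mentioned above; it follows the LZS bitstream as I understand it from ANSI X3.241-1994 (MSB-first bits; a 0 bit followed by an 8-bit literal; a 1 bit followed by a 7-bit or 11-bit back-reference offset and a variable-length length code; a 7-bit offset of 0 ends the stream), so verify the details against the specification. It decodes a single buffer and keeps no history between calls, as a PPP implementation would. All names are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit reader over a byte buffer. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t bitpos;
} bit_reader;

/* Read n (<= 11) bits; returns -1 if the input is exhausted. */
static long read_bits(bit_reader *br, unsigned n) {
    long v = 0;
    for (unsigned i = 0; i < n; i++) {
        size_t byte = br->bitpos >> 3;
        if (byte >= br->len)
            return -1;
        v = (v << 1) | ((br->data[byte] >> (7 - (br->bitpos & 7))) & 1);
        br->bitpos++;
    }
    return v;
}

/* Length code: 00=2, 01=3, 10=4, 1100=5, 1101=6, 1110=7; after 1111,
 * each 4-bit group of 1111 adds 15 and the first other group g ends
 * the code, giving 8 + 15*k + g. */
static long read_length(bit_reader *br) {
    long b = read_bits(br, 2);
    if (b < 0) return -1;
    if (b < 3) return b + 2;
    b = read_bits(br, 2);
    if (b < 0) return -1;
    if (b < 3) return b + 5;
    long len = 8;
    for (;;) {
        long g = read_bits(br, 4);
        if (g < 0) return -1;
        if (g != 15) return len + g;
        len += 15;
    }
}

/* Decompress one LZS stream into out. Returns the number of bytes written,
 * or -1 on malformed input or if out is too small. */
long lzs_decompress(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap) {
    assert(in != NULL && out != NULL);
    bit_reader br = { in, in_len, 0 };
    size_t n = 0;
    for (;;) {
        long tag = read_bits(&br, 1);
        if (tag < 0) return -1;
        if (tag == 0) {                               /* literal byte */
            long c = read_bits(&br, 8);
            if (c < 0 || n >= out_cap) return -1;
            out[n++] = (uint8_t)c;
            continue;
        }
        long short_form = read_bits(&br, 1);          /* 1 = 7-bit offset */
        if (short_form < 0) return -1;
        long off = read_bits(&br, short_form ? 7u : 11u);
        if (off < 0) return -1;
        if (short_form && off == 0) return (long)n;   /* end marker */
        long len = read_length(&br);
        if (len < 0 || off == 0 || (size_t)off > n || (size_t)len > out_cap - n)
            return -1;
        for (long i = 0; i < len; i++, n++)           /* byte-wise copy allows overlap */
            out[n] = out[n - (size_t)off];
    }
}
```

As a quick check, the hand-assembled stream {0x30, 0x98, 0x8C, 0x78, 0x37, 0x00} encodes "abc" as three literals, then one back-reference (offset 3, length 3) and the end marker; it decodes to "abcabc".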