4 votes

I have an application that uses an external cache for some data (specifically, memcached on another server). There's an option to compress the data with zlib before caching. The question is: at what data size does compression become worthwhile? E.g., for a 10-byte data item it's probably useless to waste time compressing and decompressing it, but for 10K of data it may be worth it. The data stored will be mostly ASCII strings.

I know this depends a lot on network speed, CPU speed, the data itself, and so on, but are there any guidelines or heuristics? They don't have to be perfect, but if they can save some cycles, that would be great.
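For concreteness, here's roughly the shape of the wrapper I have in mind; a minimal Python sketch, where the 1 KB threshold, the flag byte, and the function names are all placeholders to illustrate the idea, not any memcached client's actual API:

    import zlib

    # Illustrative values: tune the threshold against your own
    # data, network, and CPU.
    COMPRESS_MIN_SIZE = 1024  # hypothetical 1 KB cutoff

    FLAG_RAW = b"\x00"   # payload stored as-is
    FLAG_ZLIB = b"\x01"  # payload is zlib-compressed

    def encode(value: bytes) -> bytes:
        """Compress only if the value is big enough and compression helps."""
        if len(value) >= COMPRESS_MIN_SIZE:
            packed = zlib.compress(value)
            if len(packed) < len(value):  # fall back to raw on expansion
                return FLAG_ZLIB + packed
        return FLAG_RAW + value

    def decode(blob: bytes) -> bytes:
        """Reverse of encode(): check the flag byte, inflate if needed."""
        flag, payload = blob[:1], blob[1:]
        return zlib.decompress(payload) if flag == FLAG_ZLIB else payload

(For what it's worth, some client libraries already expose this kind of threshold, e.g. the min_compress_len argument to set() in python-memcached.)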

1
I was thinking compressing data smaller than a network packet probably isn't worth it, since sending it would take roughly the same time... I wonder if it's right? – StasM

1 Answer

2 votes

Zlib's deflate has extremely small block headers (3 bits per block); see section 3.2.3 of the DEFLATE spec, http://www.gzip.org/zlib/rfc-deflate.html.

It can store a block uncompressed or compress it with a fixed Huffman table, so even very short data is unlikely to expand much (the zlib wrapper itself adds only a 2-byte header and a 4-byte Adler-32 checksum).
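To get a feel for the overhead on very short inputs, you can just measure it; a quick sketch (exact sizes will vary with zlib version and compression level):

    import zlib

    # Input size vs. zlib-compressed size for short, mostly-ASCII data.
    # The zlib container alone costs 6 bytes (2-byte header plus 4-byte
    # Adler-32 checksum), so a few bytes of growth is the worst case.
    for data in (b"x", b"hello world", b"a" * 100, b"the quick brown fox " * 50):
        print(f"{len(data):5d} -> {len(zlib.compress(data)):5d} bytes")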

UPDATE:

There is a project, smaz (https://github.com/antirez/smaz), for compressing short strings (a naive one), and its author notes that general-purpose compressors like zlib will usually not be able to compress text shorter than 100 bytes.

As for speed, maybe you should write a small benchmark program (see the sketch at the end of this answer). I found this study, http://pytables.github.com/usersguide/optimization.html, and it has some interesting figures: the speed of writing short records with different compression (none, zlib, lzo, bzip2), and of reading them back.

In those figures, zlib is about 5 times slower than no compression on writes and up to 8 times slower on reads. Also, lzo performs better in this evaluation.
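If you want figures for your own data rather than the PyTables ones, here is the kind of small benchmark I mean; the sample payload and sizes are placeholders to replace with your real cache values:

    import time
    import zlib

    def bench(data: bytes, rounds: int = 1000) -> None:
        """Time compress and decompress separately; report the ratio."""
        packed = zlib.compress(data)

        start = time.perf_counter()
        for _ in range(rounds):
            zlib.compress(data)
        c_us = (time.perf_counter() - start) / rounds * 1e6

        start = time.perf_counter()
        for _ in range(rounds):
            zlib.decompress(packed)
        d_us = (time.perf_counter() - start) / rounds * 1e6

        print(f"{len(data):7d} B  ratio {len(packed) / len(data):5.2f}  "
              f"compress {c_us:9.1f} us  decompress {d_us:9.1f} us")

    # Placeholder payload; substitute strings from your real cache.
    sample = b"some mostly ascii cache payload, repeated a bit. "
    for size in (10, 100, 1000, 10000, 100000):
        bench((sample * (size // len(sample) + 1))[:size])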