
I use the C programming language on Linux. Following the zlib usage example on zlib's official website (http://www.zlib.net/zlib_how.html), I wrote a compression program. Note that my compression format is gzip, which means using the deflateInit2() function instead of deflateInit().

According to zlib's website, "CHUNK is simply the buffer size for feeding data to and pulling data from the zlib routines. Larger buffer sizes would be more efficient, especially for inflate(). If the memory is available, buffer sizes on the order of 128K or 256K bytes should be used." So I assumed that the bigger the CHUNK, the smaller the compressed file and the faster the compression.

But when I tested my program, I found that whether the CHUNK size is 16384 or 1, the compressed file size is the same (16384 is the value used in the official zlib example). The only difference is that with a chunk size of 1, compression is much slower.

This result confuses me. I expected compression to be ineffective with a CHUNK size of 1, because in this routine each input CHUNK is processed and written to the compressed file directly, and 1 byte of data cannot be compressed on its own.

So my question is, why does the CHUNK size only affect the compression speed, but not the compression ratio?

Here's my program:

#include <stdio.h>
#include <assert.h>
#include "zlib.h"

#define CHUNK 16384
int def(FILE *source, FILE *dest, int level, int memLevel)
{
    int ret, flush;
    unsigned have;
    z_stream strm;
    unsigned char in[CHUNK];
    unsigned char out[CHUNK];

    /* allocate deflate state */
    strm.zalloc = Z_NULL;
    strm.zfree = Z_NULL;
    strm.opaque = Z_NULL;
    ret = deflateInit2(&strm, level, Z_DEFLATED, MAX_WBITS + 16, memLevel, Z_DEFAULT_STRATEGY);
    if (ret != Z_OK)
        return ret;

    /* compress until end of file */
    do {
        strm.avail_in = fread(in, 1, CHUNK, source);
        if (ferror(source)) {
            (void)deflateEnd(&strm);
            return Z_ERRNO;
        }
        flush = feof(source) ? Z_FINISH : Z_NO_FLUSH;
        strm.next_in = in;

        /* run deflate() on input until output buffer not full, finish
           compression if all of source has been read in */
        do {
            strm.avail_out = CHUNK;
            strm.next_out = out;
            ret = deflate(&strm, flush);    /* no bad return value */
            assert(ret != Z_STREAM_ERROR);  /* state not clobbered */
            have = CHUNK - strm.avail_out;
            if (fwrite(out, 1, have, dest) != have || ferror(dest)) {
                (void)deflateEnd(&strm);
                return Z_ERRNO;
            }
        } while (strm.avail_out == 0);
        assert(strm.avail_in == 0);     /* all input will be used */

        /* done when last data in file processed */
    } while (flush != Z_FINISH);
    assert(ret == Z_STREAM_END);        /* stream will be complete */

    /* clean up and return */
    (void)deflateEnd(&strm);
    return Z_OK;
}
It affects compression speed because you give more data at a time, so you don't have to loop as much. The level argument is more important for compressed size. – Shawn
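
As a quick illustration of that comment (a hedged sketch, not from the original post: the file names are made up, and def() refers to the routine posted in the question), the level argument passed through to deflateInit2() is what trades speed for compressed size, while CHUNK only changes how often the read/write loop runs:

#include <stdio.h>
#include "zlib.h"

/* The questioner's routine, as posted above. */
int def(FILE *source, FILE *dest, int level, int memLevel);

int main(void)
{
    FILE *src = fopen("input.dat", "rb");   /* hypothetical file names */
    FILE *dst = fopen("output.gz", "wb");
    if (src == NULL || dst == NULL)
        return 1;

    /* Level 1 is fastest, level 9 compresses best; memLevel 8 is zlib's
       default.  Rebuilding with a different CHUNK would not change the
       size of output.gz, but changing the level here would. */
    int ret = def(src, dst, 9, 8);

    fclose(src);
    fclose(dst);
    return ret == Z_OK ? 0 : 1;
}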

1 Answer


Because deflate internally buffers the data for compression. Regardless of how you feed the data to deflate, it accumulates and compresses bytes until it has enough to emit a deflate block.

You are correct that you cannot compress a byte. If you would like to see how true that is, then change flush from Z_NO_FLUSH to Z_FULL_FLUSH and then feed it a byte at a time. Then indeed deflate will attempt to compress each byte of input separately.
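
To illustrate both points, here is a minimal sketch (not from the original post; the helper name gzip_size and the test data are made up for illustration). It compresses the same in-memory buffer three ways: 16384-byte chunks with Z_NO_FLUSH, 1-byte chunks with Z_NO_FLUSH, and 1-byte chunks with Z_FULL_FLUSH. The first two produce the same compressed size, because deflate buffers input internally; the third is far larger, because each byte is flushed as its own block.

#include <assert.h>
#include <stdio.h>
#include <string.h>
#include "zlib.h"

/* Compress len bytes of data to a gzip stream, feeding deflate step bytes
   at a time and using flush_mode (Z_NO_FLUSH or Z_FULL_FLUSH) for every
   chunk except the last, which uses Z_FINISH.  Returns the number of
   compressed bytes produced (0 on error). */
static size_t gzip_size(const unsigned char *data, size_t len,
                        size_t step, int flush_mode)
{
    z_stream strm;
    unsigned char out[16384];
    size_t pos = 0, total = 0;

    memset(&strm, 0, sizeof(strm));
    if (deflateInit2(&strm, Z_DEFAULT_COMPRESSION, Z_DEFLATED,
                     MAX_WBITS + 16, 8, Z_DEFAULT_STRATEGY) != Z_OK)
        return 0;

    for (;;) {
        size_t n = (len - pos < step) ? len - pos : step;
        int flush = (pos + n == len) ? Z_FINISH : flush_mode;
        int ret;

        strm.next_in = (unsigned char *)(data + pos);
        strm.avail_in = (uInt)n;
        do {
            strm.next_out = out;
            strm.avail_out = sizeof(out);
            ret = deflate(&strm, flush);
            assert(ret != Z_STREAM_ERROR);
            total += sizeof(out) - strm.avail_out;
        } while (strm.avail_out == 0);

        pos += n;
        if (flush == Z_FINISH)
            break;
    }
    (void)deflateEnd(&strm);
    return total;
}

int main(void)
{
    static unsigned char buf[100000];
    size_t i;

    /* mildly compressible test data */
    for (i = 0; i < sizeof(buf); i++)
        buf[i] = (unsigned char)(i % 251);

    printf("16384-byte chunks, Z_NO_FLUSH:   %lu\n",
           (unsigned long)gzip_size(buf, sizeof(buf), 16384, Z_NO_FLUSH));
    printf("1-byte chunks,     Z_NO_FLUSH:   %lu\n",
           (unsigned long)gzip_size(buf, sizeof(buf), 1, Z_NO_FLUSH));
    printf("1-byte chunks,     Z_FULL_FLUSH: %lu\n",
           (unsigned long)gzip_size(buf, sizeof(buf), 1, Z_FULL_FLUSH));
    return 0;
}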