7 votes

I'm trying to speed up my TIFF encoder, initially written in Java, by switching to C, and have compiled zlib 1.2.8 with Z_SOLO defined and a minimum set of C files: adler32.c, crc32.c, deflate.c, trees.c and zutil.c. The Java version uses java.util.zip.Deflater.
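
For reference, the whole build amounts to compiling those five files in a Release configuration, roughly like this (an approximation of what the default VC2010 Release settings do; adjust flags to your project):

cl /c /O2 /DZ_SOLO adler32.c crc32.c deflate.c trees.c zutil.c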

I wrote a simple test program to assess performance in terms of compression ratio and speed, and was puzzled by the fact that, whatever level I request, compression does not improve much, despite the increasing amount of time required by higher levels. I was also amazed that Java actually performed better than a Visual Studio Release build (VC2010) in both compression and speed:

Java:

Level 1 : 8424865 => 6215200 (73.8%) in 247 cycles.
Level 2 : 8424865 => 6178098 (73.3%) in 254 cycles.
Level 3 : 8424865 => 6181716 (73.4%) in 269 cycles.
Level 4 : 8424865 => 6337236 (75.2%) in 334 cycles.
Level 5 : 8424865 => 6331902 (75.2%) in 376 cycles.
Level 6 : 8424865 => 6333914 (75.2%) in 395 cycles.
Level 7 : 8424865 => 6333350 (75.2%) in 400 cycles.
Level 8 : 8424865 => 6331986 (75.2%) in 437 cycles.
Level 9 : 8424865 => 6331598 (75.2%) in 533 cycles.

C:

Level 1 : 8424865 => 6215586 (73.8%) in 298 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 309 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 331 cycles.
Level 4 : 8424865 => 6337942 (75.2%) in 406 cycles.
Level 5 : 8424865 => 6339203 (75.2%) in 457 cycles.
Level 6 : 8424865 => 6337100 (75.2%) in 481 cycles.
Level 7 : 8424865 => 6336396 (75.2%) in 492 cycles.
Level 8 : 8424865 => 6334293 (75.2%) in 547 cycles.
Level 9 : 8424865 => 6333084 (75.2%) in 688 cycles.

Am I the only one seeing such results? My guess is that the zlib in the JVM is using assembly-level optimizations that I'm not including in my C project, or that I am missing an obvious configuration step when compiling zlib (or the Visual Studio compiler sucks).

Here are both snippets:

Java:

public static void main(String[] args) throws IOException {
    byte[] pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF"));
    int szin = pix.length;
    byte[] buf = new byte[szin*101/100];
    int szout;
    long t0, t1;

    for (int i = 1; i <= 9; i++) {
        t0 = System.currentTimeMillis();
        Deflater deflater = new Deflater(i);
        deflater.setInput(pix);
        deflater.finish(); // signal end of input *before* deflating, so deflate() can complete the stream
        szout = deflater.deflate(buf);
        deflater.end(); // release the native zlib resources
        t1 = System.currentTimeMillis();
        System.out.println(String.format("Level %d : %d => %d (%.1f%%) in %d cycles.", i, szin, szout, 100.0f*szout/szin, t1 - t0));
    }
}

C:

#include <stdio.h>
#include <time.h>
#include "zlib.h"

#define SZIN 9000000
#define SZOUT 10000000

int main(void)
{
    static unsigned char buf[SZIN];
    static unsigned char out[SZOUT];
    clock_t t0, t1;
    int i;
    uLongf sz, szin;
    FILE* f = fopen("MY_MOSTLY_UNCOMPRESSED.TIFF", "rb");
    if (f == NULL) return 1;            /* bail out if the file is missing */
    szin = (uLongf)fread(buf, 1, SZIN, f);
    fclose(f);

    for (i = 1; i <= 9; i++) {
        sz = SZOUT;
        t0 = clock();
        compress2(out, &sz, buf, szin, i); /* I rewrote compress2, as it's not available when Z_SOLO is defined */
        t1 = clock();
        printf("Level %d : %lu => %lu (%.1f%%) in %ld cycles.\n", i, szin, sz, 100.0f*sz/szin, (long)(t1 - t0));
    }
    return 0;
}
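
For reference, such a compress2() replacement amounts to what zlib's own compress.c does: deflateInit(), a single deflate(Z_FINISH), deflateEnd(). A minimal sketch (the helper name is made up, and note that under Z_SOLO you must supply your own zalloc/zfree callbacks, which this sketch leaves as Z_NULL):

/* Single-shot stand-in for compress2(), sketched after zlib's compress.c.
   Assumes the whole input fits in one deflate() call and that *destLen
   holds the capacity of dest on entry (the compressed size on exit). */
static int my_compress2(Bytef *dest, uLongf *destLen,
                        const Bytef *source, uLong sourceLen, int level)
{
    z_stream strm;
    int err;

    strm.zalloc = Z_NULL; /* Z_SOLO removes the default allocators; a   */
    strm.zfree  = Z_NULL; /* real build must plug in its own zalloc and */
    strm.opaque = Z_NULL; /* zfree here, or deflateInit() will fail.    */

    err = deflateInit(&strm, level);
    if (err != Z_OK) return err;

    strm.next_in   = (Bytef *)source;  /* cast for the zlib API */
    strm.avail_in  = (uInt)sourceLen;
    strm.next_out  = dest;
    strm.avail_out = (uInt)*destLen;

    err = deflate(&strm, Z_FINISH);    /* all input in, all output out */
    *destLen = strm.total_out;
    deflateEnd(&strm);
    return err == Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}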

EDIT:

After @MarkAdler's remarks, I tried different compression strategies through deflateInit2() (namely Z_FILTERED and Z_HUFFMAN_ONLY).
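
The only change from the compress2() stand-in above is the init call: deflateInit2() exposes the strategy parameter that deflateInit() hard-wires to Z_DEFAULT_STRATEGY. A sketch of the variant (again a made-up helper name; 15 and 8 are zlib's default windowBits and memLevel):

/* Variant of my_compress2() taking a strategy: Z_DEFAULT_STRATEGY,
   Z_FILTERED or Z_HUFFMAN_ONLY. Only the init call changes. */
static int my_compress2_strategy(Bytef *dest, uLongf *destLen,
                                 const Bytef *source, uLong sourceLen,
                                 int level, int strategy)
{
    z_stream strm;
    int err;

    strm.zalloc = Z_NULL; /* as before: supply real allocators under Z_SOLO */
    strm.zfree  = Z_NULL;
    strm.opaque = Z_NULL;

    /* 15 = 32K window with a zlib wrapper, 8 = default memLevel */
    err = deflateInit2(&strm, level, Z_DEFLATED, 15, 8, strategy);
    if (err != Z_OK) return err;

    strm.next_in   = (Bytef *)source;
    strm.avail_in  = (uInt)sourceLen;
    strm.next_out  = dest;
    strm.avail_out = (uInt)*destLen;

    err = deflate(&strm, Z_FINISH);
    *destLen = strm.total_out;
    deflateEnd(&strm);
    return err == Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}

The results: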

Z_FILTERED:

Level 1 : 8424865 => 6215586 (73.8%) in 299 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 310 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 330 cycles.
Level 4 : 8424865 => 6623409 (78.6%) in 471 cycles.
Level 5 : 8424865 => 6604616 (78.4%) in 501 cycles.
Level 6 : 8424865 => 6595698 (78.3%) in 528 cycles.
Level 7 : 8424865 => 6594845 (78.3%) in 536 cycles.
Level 8 : 8424865 => 6592863 (78.3%) in 595 cycles.
Level 9 : 8424865 => 6591118 (78.2%) in 741 cycles.

Z_HUFFMAN_ONLY:

Level 1 : 8424865 => 6803043 (80.7%) in 111 cycles.
Level 2 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 3 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 4 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 5 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 6 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 7 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 8 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 9 : 8424865 => 6803043 (80.7%) in 107 cycles.

As expected per his comment, with Z_HUFFMAN_ONLY the compression ratio no longer varies with the level, and it runs a lot faster. With my data, Z_FILTERED was no faster and compressed a little worse than Z_DEFAULT_STRATEGY.

I am surprised that level 3 gives the smallest output. Are you sure there is not something odd about your data? – Peter Lawrey
@PeterLawrey it's a "standard" TIFF file of size 2800x2900, containing 2 pages, the first one being uncompressed, the second one being deflate-compressed. I could understand it as "compressing compressed data inflates it". I could try compressing only the already-compressed data, though, to see what's happening (if I have some time this weekend). – Matthieu
In the Java program, be aware that fis.read(pix) may not read the entire file, in which case the remainder of pix would be zeroes. I suggest replacing the use of FileInputStream with pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF")). – VGR
The amount of compression you get depends on the data. Most image files are already compressed to a significant degree, so you're not likely to see much additional compression. – Hot Licks
Not surprising. Each "level" is a different algorithm, and some work better on the given data (or, in this case, more likely have less overhead) than others. – Hot Licks

1 Answer

5 votes

The compression amounts and deltas are not surprising for uncompressed image data with essentially no matching strings. The portion of the data that is compressed is not being compressed further -- it is being expanded slightly by some constant amount, so the variation is all on the uncompressed portion.

There is a change in algorithm between levels 3 and 4, where level 3 takes the first match it finds. When there are hardly any matching strings, that tends to minimize the overhead of sending string matches, hence the better compression. You might do better still with the string matching made less aggressive (Z_FILTERED) or turned off entirely (Z_HUFFMAN_ONLY). Z_HUFFMAN_ONLY has the added advantage of not even looking for matching strings, which speeds up the compression significantly.

As for the speed difference, I can only guess that different compilers or different compiler optimizations were used.