I'm trying to speed up my TIFF encoder, initially written in Java, by switching to C. I compiled zlib 1.2.8 with Z_SOLO defined and a minimal set of C files: adler32.c, crc32.c, deflate.c, trees.c and zutil.c. The Java side uses java.util.zip.Deflater.
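(A side note on Z_SOLO: it compiles out zlib's built-in zcalloc/zcfree, so deflateInit() returns Z_STREAM_ERROR unless the z_stream carries its own allocators. A minimal sketch of the shims I use; the my_alloc/my_free names are just placeholders:)

#include <stdlib.h>
#include "zlib.h"

/* Trivial malloc/free shims, needed because Z_SOLO removes zlib's
   default allocators (names are placeholders). */
static voidpf my_alloc(voidpf opaque, uInt items, uInt size)
{
    (void)opaque;
    return malloc((size_t)items * size);
}

static void my_free(voidpf opaque, voidpf address)
{
    (void)opaque;
    free(address);
}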
I wrote a simple test program to measure compression ratio and speed at each level, and was puzzled to find that raising the level barely improves the ratio, despite the sharply increasing time it costs. I was also surprised that Java actually beats a Visual Studio Release build (VC2010) in both compression and speed:
Java:
Level 1 : 8424865 => 6215200 (73,8%) in 247 cycles.
Level 2 : 8424865 => 6178098 (73,3%) in 254 cycles.
Level 3 : 8424865 => 6181716 (73,4%) in 269 cycles.
Level 4 : 8424865 => 6337236 (75,2%) in 334 cycles.
Level 5 : 8424865 => 6331902 (75,2%) in 376 cycles.
Level 6 : 8424865 => 6333914 (75,2%) in 395 cycles.
Level 7 : 8424865 => 6333350 (75,2%) in 400 cycles.
Level 8 : 8424865 => 6331986 (75,2%) in 437 cycles.
Level 9 : 8424865 => 6331598 (75,2%) in 533 cycles.
C:
Level 1 : 8424865 => 6215586 (73.8%) in 298 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 309 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 331 cycles.
Level 4 : 8424865 => 6337942 (75.2%) in 406 cycles.
Level 5 : 8424865 => 6339203 (75.2%) in 457 cycles.
Level 6 : 8424865 => 6337100 (75.2%) in 481 cycles.
Level 7 : 8424865 => 6336396 (75.2%) in 492 cycles.
Level 8 : 8424865 => 6334293 (75.2%) in 547 cycles.
Level 9 : 8424865 => 6333084 (75.2%) in 688 cycles.
Am I the only one seeing such results? My guess is that the zlib bundled with the JVM uses assembly-level optimizations that my C build lacks, or that I'm missing an obvious configuration step when compiling zlib (or the Visual Studio compiler just isn't very good).
Here are both snippets:
Java:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.Deflater;

public class DeflateTest {
    public static void main(String[] args) throws IOException {
        byte[] pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF"));
        int szin = pix.length;
        byte[] buf = new byte[szin * 101 / 100];
        int szout;
        long t0, t1;
        for (int i = 1; i <= 9; i++) {
            t0 = System.currentTimeMillis();
            Deflater deflater = new Deflater(i);
            deflater.setInput(pix);
            deflater.finish(); // must precede deflate() so the stream is terminated
            szout = deflater.deflate(buf);
            t1 = System.currentTimeMillis();
            deflater.end(); // release native zlib resources
            System.out.println(String.format("Level %d : %d => %d (%.1f%%) in %d cycles.",
                    i, szin, szout, 100.0f * szout / szin, t1 - t0));
        }
    }
}
C:
#include <stdio.h>
#include <time.h>
#include "zlib.h"

#define SZIN 9000000
#define SZOUT 10000000

int main(void)
{
    static unsigned char buf[SZIN];
    static unsigned char out[SZOUT];
    clock_t t0, t1;
    int i;
    uLongf sz, szin;
    FILE* f = fopen("MY_MOSTLY_UNCOMPRESSED.TIFF", "rb");
    szin = fread(buf, 1, SZIN, f);
    fclose(f);
    for (i = 1; i <= 9; i++) {
        sz = SZOUT;
        t0 = clock();
        compress2(out, &sz, buf, szin, i); /* I rewrote compress2, as it's not available when Z_SOLO is defined */
        t1 = clock();
        printf("Level %d : %lu => %lu (%.1f%%) in %ld cycles.\n",
               i, szin, sz, 100.0f * sz / szin, (long)(t1 - t0));
    }
    return 0;
}
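Since compress2() is compiled out under Z_SOLO, my replacement is roughly the following single-shot deflate (a sketch built on the my_alloc/my_free shims above, not zlib's exact source):

/* Single-shot deflate, standing in for compress2() under Z_SOLO. */
static int my_compress2(unsigned char* dest, uLongf* destLen,
                        const unsigned char* source, uLong sourceLen, int level)
{
    z_stream strm;
    int ret;

    strm.zalloc = my_alloc;   /* Z_SOLO: no default allocators */
    strm.zfree  = my_free;
    strm.opaque = Z_NULL;
    ret = deflateInit(&strm, level);
    if (ret != Z_OK)
        return ret;

    strm.next_in   = (Bytef*)source;
    strm.avail_in  = (uInt)sourceLen;
    strm.next_out  = dest;
    strm.avail_out = (uInt)*destLen;

    /* One deflate call suffices because the output buffer is large enough. */
    ret = deflate(&strm, Z_FINISH);
    *destLen = strm.total_out;
    deflateEnd(&strm);
    return ret == Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}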
EDIT:
After @MarkAdler's remarks, I tried different compression strategies through deflateInit2() (namely Z_FILTERED and Z_HUFFMAN_ONLY); a sketch of the call is shown after the results.
Z_FILTERED:
Level 1 : 8424865 => 6215586 (73.8%) in 299 cycles.
Level 2 : 8424865 => 6195280 (73.5%) in 310 cycles.
Level 3 : 8424865 => 6182748 (73.4%) in 330 cycles.
Level 4 : 8424865 => 6623409 (78.6%) in 471 cycles.
Level 5 : 8424865 => 6604616 (78.4%) in 501 cycles.
Level 6 : 8424865 => 6595698 (78.3%) in 528 cycles.
Level 7 : 8424865 => 6594845 (78.3%) in 536 cycles.
Level 8 : 8424865 => 6592863 (78.3%) in 595 cycles.
Level 9 : 8424865 => 6591118 (78.2%) in 741 cycles.
Z_HUFFMAN_ONLY:
Level 1 : 8424865 => 6803043 (80.7%) in 111 cycles.
Level 2 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 3 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 4 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 5 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 6 : 8424865 => 6803043 (80.7%) in 106 cycles.
Level 7 : 8424865 => 6803043 (80.7%) in 107 cycles.
Level 8 : 8424865 => 6803043 (80.7%) in 108 cycles.
Level 9 : 8424865 => 6803043 (80.7%) in 107 cycles.
As expected from his comment, Z_HUFFMAN_ONLY compresses the same at every level but runs a lot faster. With my data, Z_FILTERED was no faster and compressed slightly worse than Z_DEFAULT_STRATEGY.
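For reference, here is roughly how the strategy gets selected, reusing the allocators from the first sketch (15 and 8 are zlib's default windowBits and memLevel):

/* Sketch: initialize deflate with an explicit strategy (Z_FILTERED,
   Z_HUFFMAN_ONLY or Z_DEFAULT_STRATEGY), needed because compress2()
   has no strategy parameter. */
static int init_with_strategy(z_stream* strm, int level, int strategy)
{
    strm->zalloc = my_alloc;  /* required under Z_SOLO, see above */
    strm->zfree  = my_free;
    strm->opaque = Z_NULL;
    return deflateInit2(strm, level, Z_DEFLATED, 15, 8, strategy);
}

The deflate(&strm, Z_FINISH) and deflateEnd() calls then proceed exactly as in the compress2 replacement above.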
Comments:
…3 is the smallest. Are you sure there is not something odd about your data? – Peter Lawrey
fis.read(pix) may not read the entire file, in which case the remainder of pix would be zeroes. I suggest replacing the use of FileInputStream with pix = Files.readAllBytes(Paths.get("MY_MOSTLY_UNCOMPRESSED.TIFF")). – VGR