7
votes

I generated some high resolution publication quality plots for example

library(plot3D)
Volcano<-volcano
zf=10 #zoom factor
tiff("Volcano.tif", width=1800*zf, height=900*zf, res=175*zf, compression="lzw")
image2D(z = Volcano, clab = "height, m",colkey = list(dist = -0.20, shift = 0.15,side = 3, length = 0.5, width = 0.5,cex.clab = 1.2, col.clab = "white", line.clab = 2,col.axis = "white", col.ticks = "white", cex.axis = 0.8))
dev.off()

the file is 22 MB.

Now I open the file with GIMP and without doing anything else I export it as "Volcano gimp.tif" (don't change resolution, or do anything else). GIMP generates a file ("Volcano gimp.tif") that is 1.9 MB.

imagemagick reports similar image stats:

$ identify Volcano.tif
Volcano.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 22.37MB 0.000u 0:00.000
$ identify "Volcano gimp.tif"
Volcano gimp.tif TIFF 18000x9000 18000x9000+0+0 8-bit DirectClass 1.89MB 0.000u 0:00.000

even using identify -verbose the 2 files appear to be similar.

What is the difference between these files? Why do they have so different file sizes?

UPDATE: OK, things are getting crazier. I did the same thing with IrfanView and I get different file sizes. The initial file is the Volcano.tif generated from R with compression="lzw". Check how Volcano irfan.tif and Volcano gimp.tif differ in size but all other stats are the same. Memory footprint, DPI, Colors, Resolution is identical. Disk size is different.

enter image description here

UPDATE 2: Adobe Photoshop saves the file down to 2.6 MB

enter image description here

WinRar reports that the original R generated TIFF is highly compressible (from 22MB ->3.6MB)

UPDATE 3: This issue might be similar to Montage / Join 2 TIFF images in a 2 col x 1 row tile without losing quality

UPDATE 4: The R generated TIFF file can be found here http://ge.tt/7ZvRd4C1/v/0?c

1
There seems to be something amiss with the tiff function. On my Win7 machine, (a slightly out of date v2.15.2) R won't create a valid image file at all using compression rle, jpeg or zip. Will investigate further on a different machine later. In the mean time, try playing around with tiff options and see if you can replicate my odd behaviour. There could be a bug buried here. - Richie Cotton
compression="zip" crushes my session! - ECII
The R generated TIFF file is not using the TIFF predictor. This causes the terrible compression when working with 24-bpp data since the LZW compression works 8-bits at a time. The predictor allows for the constant color sections to "cancel each other out", become black and compress much better. - BitBank
OK, thanks for the info. What does this mean practically? Is the problem solely on the compression? Should I output as uncompressed and then compress with GIMP? Also please make this an answer rather a comment (would be helpful to include some more details, I am considering filing this as a bug). - ECII
In the future, you can use my TIFFTOOL to see all the details of why those files were different: bitbanksoftware.com/tinytools.html - BitBank

1 Answers

9
votes

Apparently the TIFF LZW compressor used by R is not making use of an important option (the TIFF predictor) which is leading to an extremely large file. Data compression works best when it can recognize symmetries/redundancies in the data. In this case, the image data is composed of 24-bit (3-byte) pixels containing red, green and blue 8-bit values. Standard LZW compression looks at a stream of bytes for repeating patterns. If it looks at the color image simply as a stream of bytes, it will see repeating patterns of 3-bytes instead of repeating patterns of constant color. Enabling the TIFF predictor on the data causes a differencing filter to store the delta of each pixel with its neighbor. If the neighboring pixels are the same color, it will store 0's. A long string of 0's compresses much better than repeating patterns of non-zeros which are at least 3 bytes long.

Here is an example of how it works on a 6 pixel line. When encoding, the predictor starts from the right edge and works left for each scan line:

Original data:
2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 2A 50 40 (6 pixels of the same color)

After horizontal differencing (TIFF predictor):
2A 50 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

The data is much more compressible after the predictor since long runs of the same value (0x00) are easier for LZW to compress.

Conclusion: This should be filed as a bug against the owner of the R compression code since using LZW on full color images without the predictor produces poor results. In the mean time, a workaround is needed to compress it more efficiently.