How can I correctly open/decode a jpeg image that has more than 65500 * 65500 pixels?

Question

I'm trying to extract image content from a file generated by a Hamamatsu NanoZoomer slide scanner. The NDPI file uses a modified TIFF structure and stores the image content in one big chunk in JPEG format. Using StripOffsets and StripByteCounts, I'm able to extract the data that's supposed to be a JPEG file.
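A minimal sketch of that extraction step in Python. It assumes a classic (32-bit offset) TIFF layout, reads only the first IFD, and takes only the first strip. `read_first_strip` is a hypothetical helper, not part of any library. Offsets past 4 GB cannot be represented in the 32-bit fields this sketch reads, so very large NDPI files would need extra handling.

```python
import struct

STRIP_OFFSETS = 273      # TIFF tag StripOffsets
STRIP_BYTE_COUNTS = 279  # TIFF tag StripByteCounts
TYPE_SIZES = {3: 2, 4: 4}  # TIFF field types handled here: SHORT, LONG


def read_first_strip(path):
    """Return the bytes of strip 0 in the first IFD of a classic TIFF file.

    Hypothetical helper: classic TIFF only, first IFD only, first strip only.
    """
    with open(path, "rb") as f:
        header = f.read(8)
        order = {b"II": "<", b"MM": ">"}[header[:2]]  # byte order mark
        magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF file")
        f.seek(ifd_offset)
        (count,) = struct.unpack(order + "H", f.read(2))
        tags = {}
        for _ in range(count):
            # Each IFD entry: tag, type, value count, 4-byte value/offset field.
            tag, typ, n, value = struct.unpack(order + "HHI4s", f.read(12))
            if tag not in (STRIP_OFFSETS, STRIP_BYTE_COUNTS):
                continue
            size = TYPE_SIZES[typ]
            fmt = order + ("H" if size == 2 else "I")
            if n * size <= 4:
                # Value fits inline (left-justified in the 4-byte field).
                tags[tag] = struct.unpack(fmt, value[:size])[0]
            else:
                # Field holds an offset to an array; read its first element.
                pos = f.tell()
                (ptr,) = struct.unpack(order + "I", value)
                f.seek(ptr)
                tags[tag] = struct.unpack(fmt, f.read(size))[0]
                f.seek(pos)
        f.seek(tags[STRIP_OFFSETS])
        return f.read(tags[STRIP_BYTE_COUNTS])
```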

The data stream has all the correct signatures for a JPEG file, such as FFD8, the start-of-image (SOI) marker, and FFD9, the end-of-image (EOI) marker. If the image is smaller than 65500*65500 pixels, I can open it just fine after saving the data stream into a .jpg file.
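A quick sanity check for those markers might look like this. `looks_like_jpeg` is a hypothetical helper; it only confirms that the expected markers are present and does not validate the stream.

```python
SOI = b"\xff\xd8"  # start of image
SOS = b"\xff\xda"  # start of scan
EOI = b"\xff\xd9"  # end of image


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap check: SOI at the start, EOI at the end, and an SOS in between."""
    return data.startswith(SOI) and data.endswith(EOI) and SOS in data[2:-2]
```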

In the frame header that follows the FFC0 (SOF0) marker, the 2-byte segment length and the 1-byte sample precision come first; the next two bytes represent the image height, and the two bytes after that represent the image width. However, for an image larger than 65500*65500 pixels (this one is actually 122880*78848 pixels), the four bytes that should hold the height and width are all zeros. I changed them to 255, 220, 255, 220 (0xFFDC, i.e. 65500, for each dimension), following this (lines 255-263). When I checked the JPEG info in Windows by right-clicking the file and choosing Details, I did see that Windows Photo Viewer read the resolution as 65500*65500, even though that is not the real pixel resolution. The problem is that when I try to open the image, it is decoded incorrectly.
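The SOF segment can be located by walking the header segments from SOI, rather than searching for the first FFC0 byte pair, which could also occur inside an APP segment's payload. The sketch below uses hypothetical helpers (`find_sof`, `read_dimensions`, `patch_dimensions`) to read and rewrite the height and width fields. Keep in mind that each field is only 16 bits, so 122880 cannot be stored at all (the maximum is 65535), and libjpeg further caps dimensions at `JPEG_MAX_DIMENSION` (65500). Patching the header alone therefore likely just makes the decoder interpret the scan data with the wrong geometry.

```python
import struct

# SOFn markers; C4 (DHT), C8 (JPG) and CC (DAC) are not frame headers.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def find_sof(data):
    """Return the offset of the SOFn marker by walking header segments from SOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            pos += 1
            continue
        if marker in SOF_MARKERS:
            return pos
        if marker == 0xDA:  # SOS: header segments are over
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        pos += 2 + length  # skip marker plus segment (length includes itself)
    raise ValueError("no SOF marker before scan data")


def read_dimensions(data):
    """Return (height, width). Layout: marker(2) Lf(2) P(1) Y(2) X(2)."""
    p = find_sof(data)
    return struct.unpack(">HH", data[p + 5:p + 9])


def patch_dimensions(data, height, width):
    """Return a copy with Y and X replaced; struct.error if either exceeds 65535."""
    p = find_sof(data)
    out = bytearray(data)
    out[p + 5:p + 9] = struct.pack(">HH", height, width)
    return bytes(out)
```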

So my question is: how can I correctly open such a JPEG file? Or, put differently, how can I correctly decode the entire image content into memory?

I'm now trying to understand the file structure using MATLAB. Eventually I'll use Python + OpenCV (or Python + Cython + libjpeg-turbo if necessary) to read the entire image into memory.

As I don't have your image to test with, it is very hard to say, but if you are dealing with large images definitely consider using vips. It excels at that... stackoverflow.com/a/36377369/2836621 — Mark Setchell
I believe this to be a libjpeg or libjpeg-turbo problem. VIPS uses one of them, so merely switching to VIPS won't get around this problem. — user3667217
I've written my own imaging library (including a custom JPEG codec), and this image could potentially be opened by my code. The extreme size makes it a problem to open it all at once. I could either open a scaled copy (1/8 x 1/8 = 15360 x 9856) or open a particular rectangular crop of it. The full-resolution uncompressed color image would require 36GB of RAM. — BitBank
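With libjpeg-style DCT scaling, each output dimension is the input dimension times the scale factor, rounded up. A small sketch (`scaled_size` is a hypothetical helper) of what a 1/8 decode of this image would produce:

```python
def scaled_size(width, height, num=1, denom=8):
    """Output (width, height) of a DCT-scaled decode: ceil(dim * num / denom)."""
    return ((width * num + denom - 1) // denom,
            (height * num + denom - 1) // denom)
```

For 122880 x 78848 this gives 15360 x 9856, which is roughly 0.42 GiB as 8-bit RGB.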

Paradox · Accepted Answer · 2016-07-06T08:20:10

Without any more clues, just some remarks:

65500x65500 ≈ 4 GiB/channel (working)
122880x78848 ≈ 9 GiB/channel (objective)
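The arithmetic behind those figures, assuming 8 bits per sample and binary gibibytes (`gib_per_channel` is a hypothetical helper):

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def gib_per_channel(width, height, bytes_per_sample=1):
    """Uncompressed size of one channel in GiB (1 byte per sample by default)."""
    return width * height * bytes_per_sample / GIB
```

`gib_per_channel(65500, 65500)` is about 4.00 and `gib_per_channel(122880, 78848)` is about 9.02, so a 3-channel 8-bit decode of the full image needs roughly 27 GiB.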

These are already huge amounts of contiguous memory, especially on Windows, which has some limitations for this kind of application (take a look at this for more info).

First, could you give any details about the computer and software you are using to open this image (amount of RAM, swap, maximum memory available to user space, etc.)?

A totally random guess: did you try ImageJ?

Would it be possible to open only the area you want to see, on the fly (I am not sure you need to see the whole picture)?
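If the image can be accessed as independently decodable tiles of a fixed size (an assumption here: a single monolithic JPEG stream does not provide this by itself, so it may require re-encoding or conversion to a tiled format), only the tiles overlapping the requested area need to be decoded. A sketch that finds them (`tiles_for_region` is hypothetical):

```python
def tiles_for_region(x, y, w, h, tile_w, tile_h):
    """Yield (col, row) of fixed-size tiles overlapping [x, x+w) x [y, y+h)."""
    if w <= 0 or h <= 0:
        return
    for row in range(y // tile_h, (y + h - 1) // tile_h + 1):
        for col in range(x // tile_w, (x + w - 1) // tile_w + 1):
            yield col, row
```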

Why not use a multi-scale image representation?
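A multi-scale representation is typically a pyramid in which each level halves the previous one. This sketch (hypothetical `pyramid_levels`) lists the level sizes until the image fits within a given maximum side length:

```python
def pyramid_levels(width, height, max_side=1024):
    """List (width, height) per level, halving (rounding up) until both fit max_side."""
    levels = [(width, height)]
    while max(width, height) > max_side:
        width = (width + 1) // 2
        height = (height + 1) // 2
        levels.append((width, height))
    return levels
```

For 122880 x 78848 with `max_side=1024` this yields 8 levels ending at 960 x 616; level 3 is 15360 x 9856, the same size as a 1/8 decode.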

Edit: I just saw that the IN2P3 has developed tools to convert your file format to TIFF. This also makes me wonder whether you really have a JPEG hidden in there, or a TIFF.
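One quick way to check what the extracted data actually contains is to look at its leading magic bytes. A sketch (`sniff_format` is hypothetical) that distinguishes JPEG, classic TIFF and BigTIFF:

```python
def sniff_format(data: bytes) -> str:
    """Guess the container from its leading magic bytes."""
    if data[:2] == b"\xff\xd8":                  # JPEG SOI marker
        return "jpeg"
    if data[:4] in (b"II*\x00", b"MM\x00*"):     # TIFF, magic 42
        return "tiff"
    if data[:4] in (b"II+\x00", b"MM\x00+"):     # BigTIFF, magic 43
        return "bigtiff"
    return "unknown"
```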
