I'm trying to extract image content from a file generated by a Hamamatsu NanoZoomer slide scanner. The NDPI file uses a modified TIFF structure and stores the image content in one big chunk in JPEG format. Using StripOffsets and StripByteCounts, I'm able to extract the data that's supposed to be a JPEG file.
The data stream has all the correct signatures for a JPEG file, such as FFD8, the start of image marker, and FFD9, the end of image marker. If the image is smaller than 65500*65500 pixels, I can open it just fine after saving the data stream to a .jpeg file.
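For reference, the extraction and signature check described above can be sketched roughly as follows. This is only a minimal sketch: `extract_strip` and `looks_like_jpeg` are hypothetical helper names, and it assumes the StripOffsets and StripByteCounts values have already been read from the NDPI file's TIFF tags.

```python
def extract_strip(path, strip_offset, strip_byte_count):
    """Read one strip from the file (hypothetical helper).

    strip_offset and strip_byte_count are assumed to come from the
    StripOffsets and StripByteCounts tags, which must be parsed elsewhere.
    """
    with open(path, "rb") as f:
        f.seek(strip_offset)
        data = f.read(strip_byte_count)
    if len(data) != strip_byte_count:
        raise ValueError("file ended before the strip was fully read")
    return data


def looks_like_jpeg(data):
    """Check only the first and last markers: FFD8 at the start, FFD9 at the end."""
    return data[:2] == b"\xff\xd8" and data[-2:] == b"\xff\xd9"
```

Note that this check says nothing about whether the stream in between is decodable; it only confirms the two markers mentioned above.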
In the JPEG frame header, the FFC0 marker is followed by a two-byte segment length and a one-byte sample precision; the next two bytes represent the image height, and the two bytes after that represent the image width. However, with an image that is larger than 65500*65500 pixels (the actual size is 122880*78848 pixels), the four bytes that supposedly represent image height and image width are all zeros. I changed them to 255, 220, 255, 220 (that is, 0xFFDC = 65500 for each dimension), following this (line 255-263). When I checked the JPEG info by right-clicking the file in Windows and choosing Details, I did see that Windows Photo Viewer read the resolution as 65500*65500, even though that is not the real pixel resolution. The problem is that when I try to open the image, it is apparently decoded incorrectly.
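To make the byte layout concrete, here is a rough sketch of how the height and width fields can be located in the extracted stream. The function name `read_sof0_dimensions` is made up for illustration, and the sketch assumes a baseline JPEG (SOF0, marker FFC0) whose header segments all carry a length field.

```python
import struct


def read_sof0_dimensions(data):
    """Walk the marker segments and return (height, width) from SOF0, or None."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing start of image marker (FFD8)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:      # fill byte before a marker, skip it
            pos += 1
            continue
        if marker == 0xDA:      # start of scan: compressed data follows
            break
        # The segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xC0:      # SOF0: precision (1), height (2), width (2)
            _precision, height, width = struct.unpack(">BHH", data[pos + 4:pos + 9])
            return height, width
        pos += 2 + length
    return None
```

For the oversized image described above this would return `(0, 0)`, and after patching the bytes to 255, 220, 255, 220 it would return `(65500, 65500)`. It only reads the header fields and does not decode any image data.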
So my question is: how can I correctly open such a JPEG file? Or, put differently, how can I correctly decode the entirety of such image content into memory?
I'm currently trying to understand the file structure using MATLAB. Eventually I'll be using Python + OpenCV (or Python + Cython + libjpeg-turbo if necessary) to read the entire image into memory.
vips. It excels at that... stackoverflow.com/a/36377369/2836621 – Mark Setchell