5
votes

I'm trying to extract image content from a file generated by a Hamamatsu NanoZoomer slide scanner. The NDPI file uses a modified TIFF structure and stores the image content in one big chunk in JPEG format. Using StripOffsets and StripByteCounts, I'm able to extract the data that's supposed to be a JPEG file.

The data stream has all the correct signatures for a JPEG file, such as FFD8 (the start-of-image marker), the start-of-scan marker, and FFD9 (the end-of-image marker). If the image is smaller than 65500*65500 pixels, I can open it just fine after saving the data stream as a JPEG file.

In the JPEG frame header, the fourth and fifth bytes after the FFC0 marker hold the image height (they follow the 2-byte segment length and the 1-byte sample precision), and the next two bytes hold the image width. However, in an image larger than 65500*65500 pixels (this one is actually 122880*78848 pixels), these four bytes are all zeros. I changed them to 255, 220, 255, 220, following this (line 255-263). When I checked the JPEG info in Windows (right-click, then Details), Windows Photo Viewer reported the resolution as 65500*65500, even though that is not the real pixel resolution. The problem is that when I try to open the image, it is apparently decoded incorrectly.
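For reference, here is a minimal sketch of how the height and width fields can be located by walking the marker segments. It assumes a baseline (SOF0, FFC0) stream and the standard frame-header layout of segment length (2 bytes), sample precision (1 byte), height (2 bytes), width (2 bytes); the function name `find_sof0_dimensions` and the synthetic `sample` stream are made up for illustration and are not taken from an NDPI file.

```python
import struct

# Markers that are not followed by a length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01} | set(range(0xD0, 0xDA))

def find_sof0_dimensions(data):
    """Return (height, width, height_offset) from the first SOF0 segment, or None."""
    pos = 0
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        if marker in STANDALONE:    # no length field to skip
            pos += 2
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xC0:
            # After FFC0: length (2), precision (1), height (2), width (2).
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return height, width, pos + 5
        if marker == 0xDA:          # start of scan: compressed data follows
            return None
        pos += 2 + length           # the length counts its own two bytes
    return None

# Synthetic header only (no image data): SOI, a 16-byte APP0, then SOF0.
sample = (
    b"\xff\xd8"
    + b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00" + bytes(9)
    + b"\xff\xc0" + struct.pack(">HBHHB", 11, 8, 0xFFDC, 0xFFDC, 1)
    + b"\x01\x11\x00"
)
print(find_sof0_dimensions(sample))
```

The bytes 255, 220 correspond to 0xFFDC, which is 65500. Since each field is only 16 bits wide, the real dimensions of 122880*78848 cannot be stored there at all, so patching the header changes what a viewer reports but not how the scan data is laid out.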

So my question is: how can I correctly open such a JPEG file? Or, put another way, how can I correctly decode the entirety of such image content into memory?

I'm now trying to understand the file structure using MATLAB. Eventually I'll be using Python + OpenCV (or Python + Cython + libjpeg-turbo if necessary) to read the entire image into memory.

2
As I don't have your image to test with, it is very hard to say, but if you are dealing with large images, definitely consider using vips. It excels at that... stackoverflow.com/a/36377369/2836621 - Mark Setchell
I believe this to be a libjpeg or libjpeg-turbo problem. VIPS uses one or the other, so merely switching to VIPS won't get around this problem. - user3667217
I've written my own imaging library (including a custom JPEG codec), and this image could potentially be opened by my code. The extreme size presents a problem for opening it all at once. I could either open a scaled copy (1/8 x 1/8 = 14848 x 12800) or open a particular rectangular crop of it. The full-resolution uncompressed color image would require 36GB of RAM. - BitBank
@BitBank Thanks for the comment. How can I try your code? - user3667217
Contact me directly to continue the conversation -> [email protected] - BitBank

2 Answers

0
votes

Without any more clues, just some remarks:

  • 65500 x 65500 ≈ 4 GiB/channel at 8 bits per sample (working)
  • 122880 x 78848 ≈ 9 GiB/channel at 8 bits per sample (objective)
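A quick check of these figures, assuming 8-bit samples (one byte per channel per pixel), which is an assumption and not something stated in the question. The helper name `gib_per_channel` is made up for illustration.

```python
def gib_per_channel(width, height, bytes_per_sample=1):
    """Uncompressed size of one channel in GiB (2**30 bytes)."""
    return width * height * bytes_per_sample / 2**30

for w, h in [(65500, 65500), (122880, 78848)]:
    per_channel = gib_per_channel(w, h)
    print(f"{w} x {h}: {per_channel:.2f} GiB per channel, "
          f"{3 * per_channel:.2f} GiB for three channels")
```

Under that assumption the first image comes out just under 4 GiB per channel and the second just over 9 GiB, so a three-channel copy of the larger image needs roughly 27 GiB before any working buffers are counted.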

These are already huge amounts of contiguous memory, especially on Windows, which has some limitations for this kind of application (take a look at this for more info).

First, could you give some details about the computer and the software you are using to open this image (amount of RAM, swap, maximum memory allocated to user space, etc.)?

Totally random guess: did you try ImageJ?

Would it be possible to open just the area you want to see, on the fly? (I am not sure you need to see the whole picture.)

Why not use a multi-scale image representation?

Edit: I just saw that IN2P3 provides tools to convert your file format to TIFF, which also makes me wonder whether you really have a JPEG hidden in there or a TIFF.

0
votes

I would use openslide plus vips; it has fast, direct support for ndpi images. You can then copy the decoded image into MATLAB or numpy, or just use vips for processing, depending on what you need to do.

For example, I can write:

#!/usr/bin/python

import sys
import gi
gi.require_version('Vips', '8.0')
from gi.repository import Vips

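# Open the slide; pixels are decoded on demand, not all at once
im = Vips.Image.new_from_file(sys.argv[1])
# crop(left, top, width, height): a 2000 x 2000 region starting at (1000, 1000)
im = im.crop(1000, 1000, 2000, 2000)
# The output format is chosen from the file name suffix
im.write_to_file(sys.argv[2])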

Then run as:

$ time ./try228.py ~/Desktop/pics/2013_09_20_29.ndpi x.png
memory: high-water mark 15.24 MB
real    0m1.561s

That's for a 118784 x 102400 pixel image.
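If you need to visit the whole slide without holding it in memory, one option is to walk it region by region. Below is a minimal sketch of the rectangle arithmetic only; the helper name `tile_rects` and the 4096-pixel tile size are arbitrary choices for illustration, not anything vips or openslide requires.

```python
def tile_rects(width, height, tile=4096):
    """Yield (left, top, tile_width, tile_height) rectangles covering the image.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield left, top, min(tile, width - left), min(tile, height - top)

rects = list(tile_rects(118784, 102400))
print(len(rects), "tiles; first:", rects[0], "last:", rects[-1])
```

Each tuple is in the same (left, top, width, height) order that `crop` takes in the script above, so a loop could pass them as `im.crop(*rect)` and process one region at a time.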

You could also use vips to convert the ndpi image to something simple like ppm. That should be trivial to load into memory.

$ vips copy ~/Desktop/pics/2013_09_20_29.ndpi huge.ppm
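To illustrate why a PPM is easy to load: a binary PPM (P6) is a short text header followed by raw pixel bytes. This is a minimal sketch of a header parser using only the standard library; the function name `read_ppm_header` and the tiny synthetic `demo` buffer are made up for illustration, and the parser assumes a well-formed P6 file.

```python
def read_ppm_header(buf):
    """Parse a binary PPM (P6) header.

    Returns (width, height, maxval, data_offset), where data_offset is the
    index of the first pixel byte. Works on bytes or an mmap object.
    """
    if buf[:2] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    pos, fields = 2, []
    while len(fields) < 3:
        while buf[pos:pos + 1].isspace():       # skip whitespace
            pos += 1
        if buf[pos:pos + 1] == b"#":            # comment runs to end of line
            while buf[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while buf[pos:pos + 1].isdigit():
            pos += 1
        if pos == start:
            raise ValueError("malformed PPM header")
        fields.append(int(buf[start:pos]))
    width, height, maxval = fields
    return width, height, maxval, pos + 1       # one whitespace byte ends the header

# Synthetic 3 x 2 RGB image: header plus 3 * 2 * 3 zero bytes of pixel data.
demo = b"P6\n# a comment\n3 2\n255\n" + bytes(18)
print(read_ppm_header(demo))
```

With the offset and dimensions in hand, the pixel data can be memory-mapped instead of read in one go, for example with Python's `mmap` module or `numpy.memmap`, so only the rows you touch are paged in.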

What kind of processing are you planning to do?

The openslide web site has a nice overview of the ndpi file format, if you are curious.

There's a 64-bit Windows binary for vips here. Just unzip that and run vips.exe.

The vips GUI, nip2, will have no problems processing your image. There's a Windows installer. Start the program and click File / Open, or drag in the .ndpi image from Explorer. Double-click on the thumbnail in the main window to open a view window. Use the Toolkits menu to process the image. Press F1 for help.