I am attempting to merge a number of input TIFF files into a single multi-page output TIFF file using the OpenJDK 11 ImageIO ImageReader and ImageWriter classes. My routine works correctly for almost all of my sample input files, which were created by a number of different brands of scanning devices. These devices produce a variety of TIFF files using old and new JPEG compression. However, the TIFF files from one particular device result in incorrect output with a pink background. Even stranger, a TIFF produced by a portrait scan yields correct output, while a TIFF produced by a landscape scan on the same device yields the incorrect output with a pink background. I can see no obvious difference between the two input files that would cause the difference in behaviour when processed by the ImageIO library.

I know that a pink background in the output usually indicates a problem with transparency interpretation. I found a number of references to this issue when reading and writing JPEG images. However, I have not found any references to similar issues with TIFF images. When I walk through the ImageReader and ImageWriter in the debugger, I can find no obvious difference between the input TIFF file that works and the file that produces the bad pink coloured output. Neither file has transparency. Both have the same YCbCr photometric interpretation, bands, and sub-sampling. The problematic TIFF file uses old JPEG compression, so the image write parameters explicitly specify new JPEG compression for the ImageWriter. However, this is also true for the similar portrait TIFF file that works correctly, so the problem must be more subtle than just the output compression.

Below is a simple command-line application that reproduces my issue.

package com.example;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataFormatImpl;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Iterator;

public class Main {

    private static final String TIFF_FORMAT = "tiff";
    private static final String IMAGEIO_PLUGIN_PACKAGE = "com.sun.imageio.plugins.tiff";
    //private static final String IMAGEIO_PLUGIN_PACKAGE = "com.github.jaiimageio.impl.plugins.tiff";

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("You must specify an input directory and output filename");
            return;
        }

        File sourceDirectory = new File(args[0]);
        if (!sourceDirectory.exists() || !sourceDirectory.isDirectory()) {
            System.out.println(String.format("Source directory '%s' is invalid", args[0]));
            // Stop here: listFiles() would return null for an invalid directory
            return;
        }
        File outputFile = new File(args[1]);
        if (outputFile.exists()) {
            outputFile.delete();
        }
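        File[] inputFiles = sourceDirectory.listFiles();
        if (inputFiles == null) {
            System.out.println(String.format("Unable to list files in '%s'", args[0]));
            return;
        }
        // listFiles() makes no ordering guarantee; sort by path so the page order is stable
        java.util.Arrays.sort(inputFiles);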

        mergeTiffFiles(inputFiles, outputFile);
    }

    /**
     * Merge a list of TIFF files into a single output TIFF file using the Java ImageIO utilities.
     *
     * @param inputFilePaths list of input file paths to merge
     * @param mergedFilePath destination path for the merged output file
     */
    private static void mergeTiffFiles(
            final File[] inputFilePaths,
            final File mergedFilePath) {
        ImageReader reader = null;
        ImageWriter writer = null;
        File inputFilePath = null;
        try (
                OutputStream outputStream = new FileOutputStream(mergedFilePath);
                ImageOutputStream ios = ImageIO.createImageOutputStream(outputStream)
        ) {
            // Initialise the output writer
            writer = getTiffWriter();
            writer.setOutput(ios);
            writer.prepareWriteSequence(null);

            // Iterate through the source files appending the pages in order within and across files
            reader = getTiffReader();
            for (final File filePath : inputFilePaths) {
                inputFilePath = filePath;
                try (
                        FileInputStream inputFile = new FileInputStream(filePath);
                        ImageInputStream inputStream = ImageIO.createImageInputStream(inputFile)
                ) {
                    reader.setInput(inputStream);
                    int numImages = reader.getNumImages(true);
                    for (int j = 0; j < numImages; j++) {
                        IIOMetadata imageMetadata = reader.getImageMetadata(j); // metadata for page j (0-based)
                        ImageWriteParam writeParams = getTiffWriteParams(writer, imageMetadata);
                        BufferedImage image = reader.read(j);
                        writer.writeToSequence(new IIOImage(image, null, imageMetadata), writeParams);
                    }
                }
            }
            inputFilePath = null;

            // Finalize the output file
            writer.endWriteSequence();
        } catch (Exception e) {
            if (inputFilePath != null) {
                throw new IllegalStateException(String.format("Error while merging TIFF file: %s", inputFilePath), e);
            } else {
                throw new IllegalStateException("Failed to merge TIFF files", e);
            }
        } finally {
            // Cleanup the reader and writer
            if (writer != null) {
                writer.dispose();
            }
            if (reader != null) {
                reader.dispose();
            }
        }
    }

    /**
     * Get a TIFF reader used to read the source pages - ensure we use the ImageIO plugin.
     *
     * @return a TIFF image reader.
     * @throws IOException if a reader plugin cannot be found
     */
    private static ImageReader getTiffReader() throws IOException {
        ImageReader reader = null;
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName(TIFF_FORMAT);
        if (readers.hasNext()) {
            // Prefer the reader from the configured plugin package; fall back to the last one found
            do {
                reader = readers.next();
            } while (!reader.getClass().getPackage().getName().equals(IMAGEIO_PLUGIN_PACKAGE) && readers.hasNext());
        }
        if (reader == null) {
            throw new IOException("No imageio readers for format: " + TIFF_FORMAT);
        }
        return reader;
    }

    /**
     * Get a TIFF writer used to create the merged page - ensure we use the ImageIO plugin.
     *
     * @return a TIFF image writer
     * @throws IOException if a writer plugin cannot be found
     */
    private static ImageWriter getTiffWriter() throws IOException {
        ImageWriter writer = null;
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(TIFF_FORMAT);
        if (writers.hasNext()) {
            // Prefer the writer from the configured plugin package; fall back to the last one found
            do {
                writer = writers.next();
            } while (!writer.getClass().getPackage().getName().equals(IMAGEIO_PLUGIN_PACKAGE) && writers.hasNext());
        }
        if (writer == null) {
            throw new IOException("No imageio writers for format: " + TIFF_FORMAT);
        }
        return writer;
    }

    /**
     * Get the appropriate TIFF write parameters to apply for an input with the given image meta-data.
     * Check the source image compression. If possible use the same compression settings as those from the
     * input image.  However, the ImageIO library doesn't support the legacy JPEG compression format for TIFF
     * images.  Unfortunately, there are a number of devices that create scanned TIFF images of this type
     * (Xerox, HP OXP).  To support the merge operation explicitly force the new JPEG compression with a high
     * quality value.
     *
     * @param writer        TIFF image writer that will use the returned image parameters
     * @param imageMetadata meta-data associated with the image to write
     * @return the adjusted image write parameters
     */
    private static ImageWriteParam getTiffWriteParams(ImageWriter writer, IIOMetadata imageMetadata) {
        // Determine the source compression type
        IIOMetadataNode root =
                (IIOMetadataNode) imageMetadata.getAsTree(IIOMetadataFormatImpl.standardMetadataFormatName);
        IIOMetadataNode compression =
                (IIOMetadataNode) root.getElementsByTagName("CompressionTypeName").item(0);
        String compressionName = compression.getAttribute("value");
        ImageWriteParam writeParams = writer.getDefaultWriteParam();
        if (compressionName.equalsIgnoreCase("Old JPEG")) {
            // Convert to modern JPEG encoding if the source uses old JPEG compression.
            writeParams.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            writeParams.setCompressionType("JPEG");
            double quality = 0.95;
            quality = Math.max(0, Math.min(1, quality));
            writeParams.setCompressionQuality((float) quality);
        } else {
            // Otherwise use the source image compression if possible
            writeParams.setCompressionMode(ImageWriteParam.MODE_COPY_FROM_METADATA);
        }
        writeParams.setTilingMode(ImageWriteParam.MODE_COPY_FROM_METADATA);
        return writeParams;
    }
}

I expect the output for both the landscape and portrait TIFFs to have the correct white background. I am obviously doing something wrong in the setup of the reading or writing procedure. However, there are not many options to try. The ImageReader only supports one image destination type for the TIFF files. The problem happens with the latest OpenJDK version, 11.0.4_11.

Can you also share one of the files that causes the incorrect color? I've skimmed through your code and it seems sound. But having the actual input file would allow further diagnosis. Most likely, there's an inconsistency between the photometric interpretation tag and the JPEG stream, but I can't say for sure. Also, what happens if the compression fallback for "Old JPEG" is changed to "None" (or any other lossless compression)? If the colors are still wrong, that would most likely rule out an error in the writing... - Harald K
Thank you for taking the time to consider my problem haraldK. Here is a Dropbox link to a folder containing 2 input TIFF files that demonstrate the difference between files produced by landscape and portrait scans. I have also included the output files from the program I provided above for reference: dropbox.com/sh/x9x23q39vpexg3g/AAAEqTBZoe6we4Sr3O2WO_dLa?dl=0 - Lawrence Croft
I tried your suggestion to disable encoding, and even though the file size increased from about 700K to 25M, the problem remains. This does indicate a problem with the ImageReader rather than the ImageWriter. - Lawrence Croft
The landscape and portrait TIFFs have virtually the same tags, except for the width and height, which are reversed due to the orientation, and the expected timestamp differences. This seems to indicate that the problem is somewhere in the JPEG description, not in tags such as the photometric interpretation. - Lawrence Croft
I did some more tests on other types of devices from the same manufacturer yesterday and the results are consistent. All color portrait scans process correctly. All color landscape scans produce the pink background. Very puzzling. - Lawrence Croft

1 Answer

Okay, by inspecting the sample files, I think I have found the problem. And it's not in your code*.

When reading and writing TIFF with JPEG compression, the TIFF plugin will delegate decoding/encoding of the embedded JPEG stream to the JPEG plugin. In theory, this is simple, because the JPEG contains no color information, and the TIFF container contains the correct color information in the 262/PhotometricInterpretation tag.

In real life, this is much more complex, because sometimes the TIFF tag is missing or incorrect (especially in combination with the 259/Compression tag with value 6, "Old JPEG"). Or the JPEG encoder/decoder will make its own assumptions about color space (based on conventions for standalone JPEGs, typically JFIF or Exif), which is what I believe is the case here. The JPEG plugin bundled with the JRE uses the conventions documented in the "JPEG Metadata Format Specification and Usage Notes" of the javax.imageio.metadata package, and color space is inferred from the component ids in the SOFn marker.

For your files, we can see that the component ids differ.

Portrait file:

SOF0[ffc0, precision: 8, lines: 3520, samples/line: 2496, 
     components: [id: 1, sub: 1/1, sel: 0, id: 2, sub: 1/1, sel: 1, id: 3, sub: 1/1, sel: 1]]

Landscape file:

SOF0[ffc0, precision: 8, lines: 2496, samples/line: 3520, 
    components: [id: 0, sub: 1/1, sel: 0, id: 1, sub: 1/1, sel: 1, id: 2, sub: 1/1, sel: 1]]

The component ids in the portrait file are the normal 1, 2, and 3, while the landscape file has ids 0, 1, and 2. Both files have no subsampling (i.e. 1:1).
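To check the component ids of other files, the SOFn segment can be read with a few lines of plain Java. The following is a minimal sketch, not the tool used to produce the dumps above: the class and method names are hypothetical, it assumes the embedded JPEG stream has already been extracted from the TIFF into a byte array, and it does not handle fill bytes or other unusual marker layouts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the component ids found in the first SOFn segment.
class SofComponentIds {

    // Reads an unsigned big-endian 16-bit value at the given offset.
    private static int u16(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    // SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC), which share the range.
    private static boolean isSof(int marker) {
        return marker >= 0xFFC0 && marker <= 0xFFCF
                && marker != 0xFFC4 && marker != 0xFFC8 && marker != 0xFFCC;
    }

    /**
     * Returns the component ids of the first SOFn segment, or an empty list
     * if no SOFn segment is found before the scan data (SOS) or EOI.
     */
    static List<Integer> readComponentIds(byte[] jpeg) {
        if (jpeg.length < 2 || u16(jpeg, 0) != 0xFFD8) {
            throw new IllegalArgumentException("Not a JPEG stream (missing SOI marker)");
        }
        List<Integer> ids = new ArrayList<>();
        int pos = 2;
        while (pos + 4 <= jpeg.length) {
            int marker = u16(jpeg, pos);
            if (marker == 0xFFDA || marker == 0xFFD9) {
                break; // SOS or EOI: no frame header ahead
            }
            int length = u16(jpeg, pos + 2); // segment length, includes these 2 bytes
            if (isSof(marker)) {
                // Skip precision (1 byte), lines (2) and samples/line (2)
                int countPos = pos + 4 + 5;
                int count = jpeg[countPos] & 0xFF;
                for (int i = 0; i < count; i++) {
                    // Each component entry is 3 bytes: id, sampling factors, table selector
                    ids.add(jpeg[countPos + 1 + 3 * i] & 0xFF);
                }
                return ids;
            }
            pos += 2 + length; // marker (2 bytes) + segment
        }
        return ids;
    }
}
```

Calling `SofComponentIds.readComponentIds(bytes)` on a stream like the landscape one would be expected to return the ids 0, 1 and 2, which makes affected files easy to spot in a batch.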

From the conventions:

If these values are 1-3 for a 3-channel image, then the image is assumed to be YCbCr [...]

Otherwise, 3-channel subsampled images are assumed to be YCbCr, 3-channel non-subsampled images are assumed to be RGB.

Because of this, the landscape image will be treated as already being in RGB (and, incorrectly, not converted from YCbCr), resulting in the pinkish tint, even though everything else in the TIFF container clearly indicates that it's YCbCr.
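The two quoted rules can be written down as a small decision function. This is only an illustrative sketch of the rules quoted above, not the JRE plugin's actual code: the class, enum and method names are made up, and the parts of the conventions elided by "[...]" (for example other marker segments and 4-channel images) are deliberately left out.

```java
// Hypothetical model of the two quoted rules for 3-channel JPEGs.
class JpegColorGuess {

    enum ColorSpace { YCBCR, RGB }

    /**
     * @param componentIds the three component ids from the SOFn segment
     * @param subsampled   true if any chroma component is subsampled
     */
    static ColorSpace guess(int[] componentIds, boolean subsampled) {
        if (componentIds.length != 3) {
            throw new IllegalArgumentException("This sketch only covers 3-channel images");
        }
        // Rule 1: ids 1-3 mean YCbCr
        if (componentIds[0] == 1 && componentIds[1] == 2 && componentIds[2] == 3) {
            return ColorSpace.YCBCR;
        }
        // Rule 2: otherwise subsampled means YCbCr, non-subsampled means RGB
        return subsampled ? ColorSpace.YCBCR : ColorSpace.RGB;
    }
}
```

With the values from the two sample files, `guess(new int[] {1, 2, 3}, false)` gives `YCBCR` for the portrait scan, while `guess(new int[] {0, 1, 2}, false)` gives `RGB` for the landscape scan, which matches the behaviour described above.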

To fix this issue (and many other issues), I have created my own JPEG plugin, part of the TwelveMonkeys ImageIO project, that can be used as a drop-in replacement for the JRE plugin. It follows the (much simpler) conventions found in IJG's libJPEG, resulting in better color space consistency with other applications. In combination with the TIFF plugin from the same project, both your inputs are read correctly (white background). I have not tested it with the JRE TIFF plugin, but in theory, it should/could also work. Unfortunately, the TwelveMonkeys TIFF plugin does not (yet) have the write capabilities you use (tiling) and has some limitations on what metadata it writes.


PS: As you seem to deal mainly with JPEGs that degrade in quality when re-encoding, you might want to look at merging TIFFs without decoding the image data. You can find an example of that in TIFFUtilities, written by Oliver Schmidtmer.

*) It is technically possible to work around the problem in your code, but it's kind of complex to handle all the cases correctly. If you want to implement this yourself, or are just curious, I suggest you have a look at the source code for the TwelveMonkeys ImageIO JPEG plugin.
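For the curious, here is a rough sketch of what the narrowest form of such a workaround might look like. It is not taken from the TwelveMonkeys source. It assumes that the decoded image holds the raw Y, Cb and Cr samples in its red, green and blue channels, that the image uses a plain sRGB color model (so `getRGB` returns the samples unchanged), and that the JFIF full-range conversion applies; TIFF tags such as YCbCrCoefficients or ReferenceBlackWhite are ignored. All names are hypothetical.

```java
import java.awt.image.BufferedImage;

// Hypothetical post-processing step for pages that were misread as RGB.
class YCbCrFix {

    // Clamps a computed sample to the 0..255 range.
    private static int clamp(double value) {
        return (int) Math.max(0, Math.min(255, Math.round(value)));
    }

    // JFIF (full-range) YCbCr to packed 0xRRGGBB conversion for one pixel.
    static int toRgb(int y, int cb, int cr) {
        double r = y + 1.402 * (cr - 128);
        double g = y - 0.344136 * (cb - 128) - 0.714136 * (cr - 128);
        double b = y + 1.772 * (cb - 128);
        return (clamp(r) << 16) | (clamp(g) << 8) | clamp(b);
    }

    // Treats the R, G, B channels of the input as Y, Cb, Cr and returns a converted copy.
    static BufferedImage reinterpretAsYCbCr(BufferedImage misread) {
        int width = misread.getWidth();
        int height = misread.getHeight();
        BufferedImage fixed = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int row = 0; row < height; row++) {
            for (int col = 0; col < width; col++) {
                int pixel = misread.getRGB(col, row);
                int y = (pixel >> 16) & 0xFF;  // stored in the "red" channel
                int cb = (pixel >> 8) & 0xFF;  // stored in the "green" channel
                int cr = pixel & 0xFF;         // stored in the "blue" channel
                fixed.setRGB(col, row, toRgb(y, cb, cr));
            }
        }
        return fixed;
    }
}
```

Under these assumptions, a white background is stored as Y=255, Cb=128, Cr=128. Shown directly as RGB (255, 128, 128) it looks pink, and `toRgb(255, 128, 128)` maps it back to white. The hard part, which this sketch does not attempt, is deciding reliably which pages need the conversion.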