I am currently prototyping functionality for an app I wish to create, which requires the ability to read, modify, and write the metadata of a JPEG file.

One way I thought might work is to use the TwelveMonkeys Java API to read all segments in, modify the metadata segments I am interested in, and write all the segments (most of them unmodified) back to a new file.

While prototyping I hit a hurdle, which prompted a question that, I have to admit, is directed mainly at the author of the API. If you are not the author but can offer some insight, I certainly won't knock it back ;)

Here's an excerpt of my code:

    ImageInputStream streamMoto = ImageIO.createImageInputStream(file1);

    /*
        Prove, for testing/learning purposes, that this is a jpeg file
     */
    streamMoto.mark();
    final int foundFileType = streamMoto.readUnsignedShort();

    if (JPEG_FILE_ID==foundFileType) { // yes, continue... because we have a jpeg file
        streamMoto.reset(); // back to the very start

        // Here, the plan is to get ALL the JPEGSegments, so that we can
        // 1. iterate over each
        // 2. write non-relevant simply to new file (non-meta-data)
        // 3. note the meta-data segments and modify as (per application requirements) necessary (in my case, I want to edit description and tags/labels)
        // 4. after modifying meta-data segments, write the modified metadata segments, appending, to the new file
        // 5. At the end, we should have a new file, pretty much identical to the first (encoding and all) with ONLY meta-data modified
        streamMoto.mark();
        try {
            List<JPEGSegment> allSegments = JPEGSegmentUtil.readSegments(streamMoto, JPEGSegmentUtil.ALL_SEGMENTS);

            // Ahh, but wait! Having specified 'ALL_SEGMENTS', I see a discrepancy between the resulting list of JPEGSegments (only 10 segments)
            // and my, very likely faulty, understanding of what I should see in this list: namely, a whole bunch of further FF DA and FF C4 segments,
            // which I suppose are the main image data.
            System.out.println(allSegments.size() + " segments read");

        } finally {
            streamMoto.reset();
        }
    }


In testing the above code, I see 10 segments, listed as follows:

FF E0  00 10   the JFIF marker, length 16 decimal
FF E1  07 A2   the Exif marker, length 1954 decimal
FF E1  0C 39   an adobe XMP section, length 3129 decimal
FF ED  00 82   some data in a Photoshop section, length 130 decimal
FF DB  00 43   unknown data; data available
FF DB  00 43   binary
FF C2  00 11   binary
FF C4  00 14   binary
FF C4  00 14   binary
FF DA  00 0c   binary

I've tried to do some research into the structure of a JPEG/Exif file, and have been reading the specifications. For example, the image data is apparently the last thing in the file before the final 'FF D9' marker. I have also learned how to read (or at least find) each marker (FFE0, FFE1, FFC2, etc.): each one is followed by 2 bytes that give the length of that marker segment (the length counts those 2 bytes but not the marker itself). Reading through the file manually like that (using, say, the Bless hex editor), I find all the same segments as this library does .... until

After that last "FF DA" marker shown in the list above, I see (in the file) a whole bunch of other markers, which are, I assume (I need to do more research), the actual primary image data. It nonetheless seems to be broken up into distinct sections which one can more or less read through using the same rules as above (read marker, read length, skip length, find next marker), except that the next marker after an "FF DA" segment is not always directly after the specified length of that segment. According to some advice in another Stack Overflow post (sorry, I've lost it at the moment), one should simply skip forward until one finds the next 'FFxx' marker (in my case an FFC4) and continue on.
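The "read marker, read length, skip length" walk described above, including the skip through the compressed data that follows each FF DA segment, could be sketched in plain Java roughly as follows. This is only a minimal sketch and not how TwelveMonkeys does it: `JpegMarkerScan`, `listMarkers`, `skipEntropyCodedData` and `isStandalone` are hypothetical names, and it assumes the whole file has already been read into a byte array. Inside the compressed data, FF 00 is an escaped FF data byte and FF D0 to FF D7 are restart markers, so neither of those ends the skip.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of TwelveMonkeys): lists the markers in a JPEG byte array. */
final class JpegMarkerScan {

    /** Returns one entry per marker, e.g. "FFE0 offset=2 length=16". */
    static List<String> listMarkers(byte[] jpeg) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos + 1 < jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("No marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {            // fill byte in front of a marker
                pos++;
                continue;
            }
            String name = String.format("FF%02X", marker) + " offset=" + pos;
            if (isStandalone(marker)) {      // marker without a length field
                found.add(name);
                pos += 2;
                if (marker == 0xD9) {        // EOI: end of image
                    break;
                }
                continue;
            }
            if (pos + 3 >= jpeg.length) {
                throw new IllegalArgumentException("Truncated segment at offset " + pos);
            }
            // 2-byte big-endian length; it counts itself but not the marker
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            found.add(name + " length=" + length);
            pos += 2 + length;
            if (marker == 0xDA) {            // SOS: compressed data follows, with no length
                pos = skipEntropyCodedData(jpeg, pos);
            }
        }
        return found;
    }

    /** SOI, EOI, RST0..RST7 and TEM carry no length field. */
    static boolean isStandalone(int marker) {
        return marker == 0xD8 || marker == 0xD9 || marker == 0x01
                || (marker >= 0xD0 && marker <= 0xD7);
    }

    /** Returns the offset of the next real marker at or after pos. */
    static int skipEntropyCodedData(byte[] jpeg, int pos) {
        while (pos + 1 < jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                pos++;                       // ordinary data byte
                continue;
            }
            int next = jpeg[pos + 1] & 0xFF;
            if (next == 0x00 || (next >= 0xD0 && next <= 0xD7)) {
                pos += 2;                    // stuffed FF byte or restart marker
            } else if (next == 0xFF) {
                pos++;                       // fill byte
            } else {
                return pos;                  // a real marker
            }
        }
        return jpeg.length;
    }
}
```

Given the bytes of a progressive JPEG (read, for example, with `Files.readAllBytes`), `JpegMarkerScan.listMarkers(bytes)` should report every FF DA and FF C4 in the file rather than stopping at the first FF DA.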

Now, my question was going to be, if these are markers or segments (I'm still confused by the terminology), why doesn't the library pick them up?

Well, I can probably guess at the answer:

They are not markers in the usual sense.... they are the primary image data... and that's why.

So my follow-up questions would be:

  1. Can I then, at some point (and at what point?), just consider the rest of the data one block and, since I don't need to modify it, write it out after all the segments?

  2. How do I best read that end block of data (without having to reset the stream and read the whole image again from the start)?

  3. I'm probably going about this all wrong, but I was looking for an efficient means of producing an image file with modified metadata and nothing else! I am aware that by reading the whole file in with the Java libraries and then re-writing it, you are basically re-encoding and changing the whole file. You have, maybe only to a small degree but nonetheless, degraded the quality of the image just to modify the metadata. Am I wrong? I don't want that! There must be a better way (without having to write my own image reader/writer from scratch)!

Thanks for any advice!

Sean

Okay, just a quick comment at this time: The JPEGSegmentUtil will stop scanning for segments after the first occurrence of either SOS, EOI or a second SOI marker (because, like you, I'm only interested in the metadata). However, as your JPEG file is progressive (indicated by the SOF2, or FFC2, marker), it has multiple SOS markers and image data "segments" (the data follows directly after each SOS segment, with no marker or length of its own). As you are only concerned with metadata, you can limit yourself to the APPn (FFEn) markers, and just write the rest of the file "as is". - Harald K
Thanks again for your response. I'll look into that. - svaens
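
The suggestion in the comment above (touch only the APPn segments and write everything else "as is") could look roughly like the following in plain Java. This is a minimal sketch under stated assumptions rather than the library's own method: `JpegMetadataCopy`, `AppSegmentEditor` and `rewrite` are hypothetical names, the whole file is assumed to fit in a byte array, and the payload handed to the editor is everything after the 2 length bytes (so for Exif or XMP it still includes the leading identifier string). Because the compressed image data is copied byte for byte, nothing is re-encoded.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical helper (not part of TwelveMonkeys): copies a JPEG, editing only APPn segments. */
final class JpegMetadataCopy {

    /** Returns the new payload, the same array to keep the segment, or null to drop it. */
    interface AppSegmentEditor {
        byte[] edit(int marker, byte[] payload);
    }

    static byte[] rewrite(byte[] jpeg, AppSegmentEditor editor) {
        if (jpeg.length < 2 || (jpeg[0] & 0xFF) != 0xFF || (jpeg[1] & 0xFF) != 0xD8) {
            throw new IllegalArgumentException("Not a JPEG: missing SOI marker");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(jpeg.length);
        out.write(jpeg, 0, 2);                          // SOI
        int pos = 2;
        while (pos + 3 < jpeg.length) {
            if ((jpeg[pos] & 0xFF) != 0xFF) {
                throw new IllegalArgumentException("No marker at offset " + pos);
            }
            int marker = jpeg[pos + 1] & 0xFF;
            if (marker == 0xFF) {                       // fill byte: copy and move on
                out.write(0xFF);
                pos++;
                continue;
            }
            if (marker == 0xDA || marker == 0xD9) {     // SOS or EOI: stop parsing here
                break;
            }
            int length = ((jpeg[pos + 2] & 0xFF) << 8) | (jpeg[pos + 3] & 0xFF);
            if (length < 2 || pos + 2 + length > jpeg.length) {
                throw new IllegalArgumentException("Bad segment length at offset " + pos);
            }
            if (marker >= 0xE0 && marker <= 0xEF) {     // APP0..APP15: let the caller edit
                byte[] payload = Arrays.copyOfRange(jpeg, pos + 4, pos + 2 + length);
                byte[] edited = editor.edit(0xFF00 | marker, payload);
                if (edited != null) {
                    if (edited.length > 0xFFFF - 2) {
                        throw new IllegalArgumentException("Payload too large for one segment");
                    }
                    int newLength = edited.length + 2;  // the length field counts itself
                    out.write(0xFF);
                    out.write(marker);
                    out.write(newLength >> 8);
                    out.write(newLength & 0xFF);
                    out.write(edited, 0, edited.length);
                }
            } else {
                out.write(jpeg, pos, 2 + length);       // any other segment: verbatim
            }
            pos += 2 + length;
        }
        out.write(jpeg, pos, jpeg.length - pos);        // first SOS onwards, untouched
        return out.toByteArray();
    }
}
```

A caller might read the source with `Files.readAllBytes`, pass a lambda that returns a new payload when the marker is 0xFFE1 and the original payload otherwise, and write the result with `Files.write`. For very large files a streaming variant would be preferable, but the byte-array form keeps the sketch short.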

1 Answer

The problem you face is that there are several JPEG file formats in common use and they store metadata in different ways. The original JPEG standard did not define a file format. JFIF was created to fill the gaps. Adobe created their own file format. Then JPEG introduced the SPIFF format. Then came EXIF for cameras.