Decode JPEG image stripped from inside a PDF file

Question

I have code that decompresses JPEGs into bitmaps, which works fine for JPEG files. However, when I feed the code a JPEG I have stripped directly from a PDF's XObject, I get errors.

Adobe Reader displays the image fine, so I don't believe it's corrupted. I have read through the JPEG and PDF documentation and don't find any obvious problems.

My question is this: is there anything different between the "JPEG" embedded inside a PDF stream and a normal JPEG? And if so, what is it?

Note: I can manually open the PDF, copy the image, paste it into Paint, and save. When I do this, everything works. My problem is that I need this automated.

When my code parses the PDF, strips out the image stream, and dumps the binary to a file, the resulting file does not open. What am I missing?

My errors seem to be occurring in the Huffman decoding process; the cdt and Huffman tables appear to be read in fine.

I wrote code which can do the same thing. Can you post a sample image and I'll test it on my rig. — BitBank
Maybe you could use pdfimages... en.m.wikipedia.org/wiki/Pdfimages — Mark Setchell
I can't post an example image, but the code I am using to decompress the images came from here: xbdev.net/image_formats/jpeg/jpeg_decoder_source. There is a bug in this code inside the "BuildHuffmanTable" function, but once I fixed that, the code works as described earlier. The JPEGs from PDFs are causing errors in the function "ProcessHuffmanDataUnit". — Joe
It's not useful to look at the source code you're using if we can't also see the file you're trying to decode. PDF files also support JPEG2000 streams, and that could be your problem. Show us a file and we'll give you an answer. — BitBank

user3344003 · Accepted Answer · 2015-05-29T14:13:49

Pardon my using the answer section but I overflowed the comment section:

My questions:

1. What code is failing to decode the JPEG? You say you "have code", but where did it come from? Why do you think it is reliable?

2. What is the file format of the JPEG stream? JFIF, Adobe, Exif, none specified? Could there be something in the file format that your decoder cannot handle? Does your decoder check for different types of APPn markers?

3. What is the JPEG format? What type of SOF (start-of-frame) marker does it have? Does this decoder source handle all the normal formats: baseline, extended sequential, progressive? If you have a progressive JPEG and a decoder that only does baseline, you are going to have a problem.

4. How many components does the JPEG stream have? Some Adobe files have 4 components, and decoders may only be able to handle 1 or 3.
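Most of these questions can be answered by walking the marker segments in the extracted stream. The following is a minimal sketch, not a full parser: `describe_jpeg` is a hypothetical helper name, it assumes the bytes start with a JPEG SOI marker, and it stops at the first SOS marker without touching the entropy-coded data.

```python
import struct

# Names for the most common start-of-frame (SOF) markers.
SOF_NAMES = {
    0xC0: "baseline DCT",
    0xC1: "extended sequential DCT",
    0xC2: "progressive DCT",
    0xC3: "lossless",
}


def describe_jpeg(data: bytes) -> dict:
    """Summarise APPn segments, SOF type and component count (hypothetical helper)."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("no SOI marker: this is not a plain JPEG stream")
    info = {"app_segments": [], "sof": None, "components": None}
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # fill byte in front of a marker
            pos += 1
            continue
        if marker == 0xDA:  # SOS: compressed scan data follows, stop here
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if 0xE0 <= marker <= 0xEF:
            # APPn: the identifier is the text before the first NUL byte.
            ident = payload.split(b"\x00", 1)[0].decode("latin-1")
            info["app_segments"].append((f"APP{marker - 0xE0}", ident))
        elif 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # SOFn payload: precision (1), height (2), width (2), components (1).
            info["sof"] = SOF_NAMES.get(marker, f"SOF{marker - 0xC0}")
            info["components"] = payload[5]
        pos += 2 + length
    return info
```

Calling it on the bytes you dumped from the PDF (for example `describe_jpeg(open("extracted.jpg", "rb").read())`, where the file name is only a placeholder) shows which APPn identifiers are present, whether the frame is baseline or progressive, and how many components it declares. A result such as an `APP14` segment identified as `Adobe` together with 4 components, or a progressive SOF, would point to one of the cases above that a simple baseline decoder may not handle.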
