Extract bitmap images from PDF using itextsharp in C#

Question

I am using the following code to extract an image from a PDF with iTextSharp.

using System;
using System.Drawing;
using System.IO;
using iTextSharp.text.pdf;

// Returns the first image in the PDF that GDI+ can decode, or null if none can be decoded.
private static System.Drawing.Image ExtractImages(String PDFSourcePath)
{
    RandomAccessFileOrArray RAFObj = null;
    PdfReader PDFReaderObj = null;

    try
    {
        RAFObj = new RandomAccessFileOrArray(PDFSourcePath);
        PDFReaderObj = new PdfReader(RAFObj, null);

        // Walk every object in the cross-reference table.
        for (int i = 0; i < PDFReaderObj.XrefSize; i++)
        {
            PdfObject PDFObj = PDFReaderObj.GetPdfObject(i);

            if ((PDFObj != null) && PDFObj.IsStream())
            {
                PdfStream PDFStremObj = (PdfStream)PDFObj;
                PdfObject subtype = PDFStremObj.Get(PdfName.SUBTYPE);

                // Only image XObjects (/Subtype /Image) are of interest.
                if ((subtype != null) && subtype.ToString() == PdfName.IMAGE.ToString())
                {
                    // Raw stream bytes: still encoded with whatever /Filter the image uses.
                    byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)PDFStremObj);

                    if (bytes != null)
                    {
                        try
                        {
                            // GDI+ needs the stream to stay open for the Bitmap's lifetime,
                            // so the MemoryStream is deliberately not disposed here.
                            MemoryStream MS = new MemoryStream(bytes);
                            Bitmap ImgPDF = new Bitmap(MS); // throws for some files (see below)
                            return ImgPDF;
                        }
                        catch (ArgumentException)
                        {
                            // GDI+ could not decode these bytes; try the next image.
                        }
                    }
                }
            }
        }

        return null;
    }
    finally
    {
        // Release the file on every path, including the early return above.
        if (RAFObj != null) RAFObj.Close();
        if (PDFReaderObj != null) PDFReaderObj.Close();
    }
}

It works for some PDF files, but for others it throws an exception at

Bitmap ImgPDF = new Bitmap(MS);

Parameter invalid

I am really confused about why this happens. Is it due to a security difference between the files, or some other reason? Please help me resolve this.

Comment (stanlyF): You can use Apitron.PDF Rasterizer for document-to-image conversion.

The Blue Shirt Developer · Accepted Answer · 2015-09-07T05:52:35

You need to check the stream's /Filter to see what image format a given image uses. It may be a standard image format:

DCTDecode (JPEG)
JPXDecode (JPEG 2000)
JBIG2Decode (JBIG2, a black-and-white-only format)
CCITTFaxDecode (fax format; PDF supports Group 3 and Group 4)
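As a rough sketch of that first check, the hypothetical helper below maps a /Filter value to a handling strategy. It assumes the caller has already read /Filter from the image stream (for example via `PdfStream.Get(PdfName.FILTER)`) and turned it into plain names without the leading slash. Because /Filter may be a single name or an array, the sketch treats the last entry as the image-specific one. `ImageHandling` and `PdfImageFilters` are illustrative names, not iTextSharp types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical strategies a caller might pick, based only on /Filter.
public enum ImageHandling
{
    Jpeg,              // DCTDecode: once earlier filters are undone, the bytes are a JPEG file
    Jpeg2000,          // JPXDecode: JPEG 2000 data, which GDI+ cannot open
    NeedsJbig2Decoder, // JBIG2Decode: bilevel data, needs a JBIG2 decoder
    NeedsCcittDecoder, // CCITTFaxDecode: fax data with no TIFF header around it
    BuildFromSamples   // no filter or e.g. FlateDecode: decode, then build from /Width, /Height, ...
}

public static class PdfImageFilters
{
    // filters: the /Filter value as names without the leading slash
    // (a single name becomes a one-element list). Assumption: the
    // image-specific filter, if any, is the last one in the chain.
    public static ImageHandling Classify(IReadOnlyList<string> filters)
    {
        if (filters == null || filters.Count == 0)
            return ImageHandling.BuildFromSamples;

        switch (filters[filters.Count - 1])
        {
            case "DCTDecode": return ImageHandling.Jpeg;
            case "JPXDecode": return ImageHandling.Jpeg2000;
            case "JBIG2Decode": return ImageHandling.NeedsJbig2Decoder;
            case "CCITTFaxDecode": return ImageHandling.NeedsCcittDecoder;
            default: return ImageHandling.BuildFromSamples;
        }
    }
}
```

Only the `Jpeg` case (with DCTDecode as the sole filter) matches what the question's code does today; the other cases need their own decoding path.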

Other than that, you'll need to get the stream bytes (raw, as you are doing, then decoded if the filter is a general-purpose one such as FlateDecode) and build an image from them using the image stream's /Width, /Height, /BitsPerComponent, color space (which determines the number of color components and could be CMYK, indexed, RGB, or Something Weird), and a few other entries, as defined in section 8.9 of the ISO 32000-1 PDF specification (available for free).
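For the "build it yourself" case, here is a minimal sketch under narrow assumptions: 8 bits per component; DeviceGray, DeviceRGB or DeviceCMYK (1, 3 or 4 components); no /Decode array; no /SMask; and sample bytes that are already decoded (for FlateDecode that means `PdfReader.GetStreamBytes` rather than `GetStreamBytesRaw`). Indexed, ICC-based and other color spaces are not handled, and the CMYK formula is a naive one with no color management. `PdfSampleConverter` is a hypothetical name.

```csharp
using System;

public static class PdfSampleConverter
{
    // Bytes per row of a PDF image: each row starts on a byte boundary,
    // so the bit count per row is rounded up to whole bytes.
    public static int RowStride(int width, int components, int bitsPerComponent)
    {
        return (width * components * bitsPerComponent + 7) / 8;
    }

    // Converts decoded 8-bit samples into a 32-bit BGRA buffer, the byte
    // order GDI+ uses for PixelFormat.Format32bppArgb.
    public static byte[] ToBgra32(byte[] samples, int width, int height, int components)
    {
        if (components != 1 && components != 3 && components != 4)
            throw new NotSupportedException("Only 1, 3 or 4 components are handled in this sketch.");

        int stride = RowStride(width, components, 8);
        if (samples.Length < stride * height)
            throw new ArgumentException("Sample data is shorter than width and height imply.");

        byte[] bgra = new byte[width * height * 4];
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                int s = y * stride + x * components;
                byte r, g, b;
                if (components == 1)
                {
                    r = g = b = samples[s];                 // gray
                }
                else if (components == 3)
                {
                    r = samples[s]; g = samples[s + 1]; b = samples[s + 2];
                }
                else
                {
                    // Naive CMYK -> RGB (assumption; real output may need an ICC profile).
                    int k = samples[s + 3];
                    r = (byte)((255 - samples[s]) * (255 - k) / 255);
                    g = (byte)((255 - samples[s + 1]) * (255 - k) / 255);
                    b = (byte)((255 - samples[s + 2]) * (255 - k) / 255);
                }

                int d = (y * width + x) * 4;
                bgra[d] = b;
                bgra[d + 1] = g;
                bgra[d + 2] = r;
                bgra[d + 3] = 255;                          // opaque; /SMask is ignored
            }
        }
        return bgra;
    }
}
```

To turn the buffer into a `Bitmap`, one option is `new Bitmap(width, height, PixelFormat.Format32bppArgb)`, then `LockBits` and `Marshal.Copy` the buffer into the locked data (row by row if `BitmapData.Stride` differs from `width * 4`).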

So in some cases (typically JPEG images, whose DCTDecode bytes GDI+ can read directly) your code will work, but in others it will fail with the exception you mentioned.
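One way to see why only some files work is to inspect the first bytes before calling `new Bitmap(...)`. This is a swapped-in heuristic (signature sniffing), not something the answer prescribes; checking /Filter remains the more reliable route. The sketch recognizes the JPEG start-of-image marker, which DCTDecode data begins with and which GDI+ can load, and the two JPEG 2000 signatures (JP2 file and raw codestream), which GDI+ cannot. `ImageSniffer` and `SniffedFormat` are hypothetical names.

```csharp
using System;

public enum SniffedFormat { Unknown, Jpeg, Jpeg2000 }

public static class ImageSniffer
{
    // JP2 signature box: length 12, type "jP  ", contents 0D 0A 87 0A.
    private static readonly byte[] Jp2Signature =
        { 0x00, 0x00, 0x00, 0x0C, 0x6A, 0x50, 0x20, 0x20, 0x0D, 0x0A, 0x87, 0x0A };

    // Raw JPEG 2000 codestream: SOC marker (FF 4F) followed by SIZ (FF 51).
    private static readonly byte[] J2kCodestream = { 0xFF, 0x4F, 0xFF, 0x51 };

    // JPEG: SOI marker (FF D8) immediately followed by another marker (FF xx).
    private static readonly byte[] JpegSoi = { 0xFF, 0xD8, 0xFF };

    public static SniffedFormat Sniff(byte[] data)
    {
        if (data == null)
            return SniffedFormat.Unknown;
        if (StartsWith(data, JpegSoi))
            return SniffedFormat.Jpeg;
        if (StartsWith(data, Jp2Signature) || StartsWith(data, J2kCodestream))
            return SniffedFormat.Jpeg2000;

        // In a PDF image stream, anything else (Flate-compressed samples,
        // CCITT or JBIG2 data, ...) is generally not something GDI+ can open as-is.
        return SniffedFormat.Unknown;
    }

    private static bool StartsWith(byte[] data, byte[] prefix)
    {
        if (data.Length < prefix.Length)
            return false;
        for (int i = 0; i < prefix.Length; i++)
            if (data[i] != prefix[i])
                return false;
        return true;
    }
}
```

A caller could attempt `new Bitmap(new MemoryStream(bytes))` only when `Sniff(bytes) == SniffedFormat.Jpeg`, and route every other stream to /Filter-based handling instead of relying on a caught exception.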
