
I am using the following code to convert a PDF to an image with iTextSharp:

using System;
using System.Drawing;
using System.IO;
using iTextSharp.text.pdf;

// Returns the first image XObject in the PDF that GDI+ can load, or null if there is none.
private static Image ExtractImages(string pdfSourcePath)
{
    RandomAccessFileOrArray rafObj = null;
    PdfReader pdfReaderObj = null;

    try
    {
        rafObj = new RandomAccessFileOrArray(pdfSourcePath);
        pdfReaderObj = new PdfReader(rafObj, null);

        // Scan every object in the cross-reference table for image streams.
        for (int i = 0; i < pdfReaderObj.XrefSize; i++)
        {
            PdfObject pdfObj = pdfReaderObj.GetPdfObject(i);
            if (pdfObj == null || !pdfObj.IsStream())
                continue;

            PdfStream pdfStreamObj = (PdfStream)pdfObj;
            PdfObject subtype = pdfStreamObj.Get(PdfName.SUBTYPE);
            if (!PdfName.IMAGE.Equals(subtype))
                continue;

            // Raw bytes: the stream data exactly as stored, with its /Filter still applied.
            byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStreamObj);
            if (bytes == null)
                continue;

            try
            {
                // GDI+ needs the stream to stay open for the Bitmap's lifetime,
                // so MS is deliberately not disposed here.
                MemoryStream MS = new MemoryStream(bytes);
                Bitmap ImgPDF = new Bitmap(MS);
                return ImgPDF;
            }
            catch (ArgumentException)
            {
                // This is where the exception is raised; skip this image and keep scanning.
            }
        }

        return null;
    }
    finally
    {
        // Close the reader and the file even when an image is returned early.
        if (pdfReaderObj != null)
            pdfReaderObj.Close();
        if (rafObj != null)
            rafObj.Close();
    }
}

It works for some PDF files, but for others it throws an exception at

Bitmap ImgPDF = new Bitmap(MS);

with the message "Parameter is not valid."

I am really confused about why this happens. Is it because the files have different security settings, or is there some other reason? How can I resolve this?

You can use Apitron.PDF Rasterizer for document-to-image conversion. (stanlyF)

2 Answers


You need to check the stream's /Filter to see what image format a given image uses. It may be one of these standard image formats:

  1. DCTDecode (JPEG)

  2. JPXDecode (JPEG 2000)

  3. JBIG2Decode (JBIG2, a black-and-white-only format)

  4. CCITTFaxDecode (CCITT fax encoding; PDF supports Group 3 and Group 4)

Other than that, you'll need to get the stream's bytes (as you are doing, though any FlateDecode or LZWDecode compression must be undone first) and build the image yourself from the image stream's width, height, bits per component, color space (which could be CMYK, indexed, RGB, or Something Weird), and a few other entries, as defined in section 8.9 of the ISO PDF specification (ISO 32000-1, available for free).
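As a rough illustration of that "build it yourself" path, here is a minimal sketch for the simplest case only. It assumes the samples have already been decompressed (with iTextSharp, `PdfReader.GetStreamBytes(PRStream)` applies the standard filters such as FlateDecode, unlike `GetStreamBytesRaw`), that /BitsPerComponent is 8, and that the color space is DeviceGray, DeviceRGB or DeviceCMYK. `RawPdfImage` and `ToArgb` are hypothetical names, not part of iTextSharp, and /Decode arrays, indexed color spaces and ICC profiles are ignored.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp): converts already-decoded
// 8-bit-per-component PDF image samples into 32-bit ARGB pixels.
public static class RawPdfImage
{
    // samples:    image stream data with FlateDecode/LZWDecode etc. already undone
    // components: 1 (DeviceGray), 3 (DeviceRGB) or 4 (DeviceCMYK)
    public static int[] ToArgb(byte[] samples, int width, int height, int components)
    {
        if (width <= 0 || height <= 0)
            throw new ArgumentException("Width and height must be positive.");
        if (components != 1 && components != 3 && components != 4)
            throw new NotSupportedException("Only gray, RGB and CMYK are handled in this sketch.");
        if (samples.Length < width * height * components)
            throw new ArgumentException("Not enough sample data for the given dimensions.");

        var pixels = new int[width * height];
        for (int p = 0; p < pixels.Length; p++)
        {
            int o = p * components;
            int r, g, b;
            if (components == 1)
            {
                r = g = b = samples[o];
            }
            else if (components == 3)
            {
                r = samples[o];
                g = samples[o + 1];
                b = samples[o + 2];
            }
            else
            {
                // Naive CMYK -> RGB; real renderers use ICC-based color conversion.
                int k = samples[o + 3];
                r = (255 - samples[o]) * (255 - k) / 255;
                g = (255 - samples[o + 1]) * (255 - k) / 255;
                b = (255 - samples[o + 2]) * (255 - k) / 255;
            }
            // Pack as 0xAARRGGBB with full opacity.
            pixels[p] = unchecked((int)0xFF000000) | (r << 16) | (g << 8) | b;
        }
        return pixels;
    }
}
```

With System.Drawing you could then copy the result into `new Bitmap(width, height)` using `SetPixel(x, y, Color.FromArgb(pixels[y * width + x]))` (or `LockBits` for speed). Because the sketch only handles 8 bits per component, it never has to deal with the byte padding at the end of each row that lower bit depths require.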

So in some cases (such as DCTDecode images, whose bytes are a complete JPEG file that GDI+ can load directly) your code will work, but in others it will fail with the exception you mentioned. Source
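To decide up front which case an image falls into, rather than relying on `new Bitmap` throwing, you could classify the /Filter entry first. The sketch below is self-contained, so it takes the filter names as plain strings: with iTextSharp, `stream.Get(PdfName.FILTER)` returns either a single `PdfName` or a `PdfArray` of names, and converting that into a list (stripping the leading "/") is left out. `ImagePayload` and `PdfImageFilters.Classify` are hypothetical names. It assumes, as is the case in practice, that an image-format filter is the last entry in a filter chain, since filters are applied in array order when decoding.

```csharp
using System.Collections.Generic;

// Hypothetical classification of what an image stream's bytes contain.
public enum ImagePayload
{
    Jpeg,       // DCTDecode: bytes are a JPEG file; new Bitmap(stream) can usually load it
    Jpeg2000,   // JPXDecode: JPEG 2000, which GDI+ does not support
    Jbig2,      // JBIG2Decode: embedded JBIG2 data, needs a JBIG2 decoder
    CcittFax,   // CCITTFaxDecode: raw Group 3/4 data without a TIFF header
    RawSamples  // no image filter: decode and build pixels from /Width, /Height, etc.
}

public static class PdfImageFilters
{
    // filters: the /Filter names in decoding order, without the leading slash,
    // e.g. { "FlateDecode" }. Pass an empty list when /Filter is absent.
    // Abbreviated inline-image names (DCT, CCF, ...) are not handled here.
    public static ImagePayload Classify(IReadOnlyList<string> filters)
    {
        if (filters.Count == 0)
            return ImagePayload.RawSamples;

        // The image-format filter, if any, is the last one applied.
        switch (filters[filters.Count - 1])
        {
            case "DCTDecode": return ImagePayload.Jpeg;
            case "JPXDecode": return ImagePayload.Jpeg2000;
            case "JBIG2Decode": return ImagePayload.Jbig2;
            case "CCITTFaxDecode": return ImagePayload.CcittFax;
            default: return ImagePayload.RawSamples;
        }
    }
}
```

Only the `Jpeg` case is a reasonable candidate for `new Bitmap(new MemoryStream(bytes))`. JBIG2 images in PDF may also depend on a separate /JBIG2Globals stream referenced from /DecodeParms, so even a standalone JBIG2 decoder needs that data as well.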


I think I had the same problem. In my case, the exception was thrown when the image was in JBIG2 format: the image stream had its width and height set to 0 but still contained some bytes. Unfortunately, I don't have a solution for this.