How to extract images from a PDF using iText 7 in C#

Question

I used the approach below to extract images from a PDF, but the sub type is always null. I am working with the iText7 library, which is the new version. If anybody has worked with the new library, please give suggestions.

    public static string ExtractImageFromPDF(string sourcePdf)
    {            
        PdfReader reader = new PdfReader(sourcePdf);
        try
        {
            PdfDocument document = new PdfDocument(reader);

            for (int pageNumber = 1; pageNumber <= document.GetNumberOfPages(); pageNumber++)
            {
                PdfDictionary obj = (PdfDictionary)document.GetPdfObject(pageNumber);

                if (obj != null && obj.IsStream())
                {
                    PdfDictionary pd = (PdfDictionary)obj;
                    if (pd.ContainsKey(PdfName.Subtype) && pd.Get(PdfName.Subtype).ToString() == "/Image")
                    {
                        string filter = pd.Get(PdfName.Filter).ToString();
                        string width = pd.Get(PdfName.Width).ToString();
                        string height = pd.Get(PdfName.Height).ToString();
                        string bpp = pd.Get(PdfName.BitsPerComponent).ToString();
                        string extent = ".";
                        byte[] img = null;
                        switch (filter)
                        {
                            case "/FlateDecode":
                                byte[] arr = FlateDecodeFilter.FlateDecode(null, true);
                                Bitmap bmp = new Bitmap(Int32.Parse(width), Int32.Parse(height), PixelFormat.Format24bppRgb);
                                BitmapData bmd = bmp.LockBits(new Rectangle(0, 0, Int32.Parse(width), Int32.Parse(height)), ImageLockMode.WriteOnly,
                                    PixelFormat.Format24bppRgb);
                                Marshal.Copy(arr, 0, bmd.Scan0, arr.Length);
                                bmp.UnlockBits(bmd);
                                bmp.Save("d:\\pdf\\bmp1.png", ImageFormat.Png);
                                break;
                            case "/CCITTFaxDecode":
                                break;
                            default:
                                break;
                        }
                    }
                }
            }
        }
        catch
        {
            throw;
        }
        return "";
    }

"it is returning null": nothing in the code you've posted returns null. - Ian Kemp
The correct call is document.GetPdfObject(objectNumber), not document.GetPdfObject(pageNumber). - Tomex Ou

Ronald van der Plas · Accepted Answer · 2019-10-17T12:12:43

When you use QuickWatch on the pd value, what do you see in there? The iText 7 documentation states that it is a dictionary, so perhaps you can check which entries are available and find the appropriate field that you're looking for.

    PdfDictionary pd = (PdfDictionary)obj;
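If a debugger is not handy, the same inspection can be done in code. This is a minimal sketch, assuming iText 7 for .NET (the `iText.Kernel.Pdf` namespace). The names `PdfObjectDump` and `DumpStreamKeys` are made up for illustration and are not part of the library. It prints every key/value pair of each stream object, so you can see whether a `/Subtype` entry is present at all.

```csharp
using System;
using iText.Kernel.Pdf;

public static class PdfObjectDump   // hypothetical helper, not part of iText
{
    // Prints the dictionary entries of every stream object in the file.
    public static void DumpStreamKeys(string sourcePdf)
    {
        PdfDocument document = new PdfDocument(new PdfReader(sourcePdf));
        try
        {
            // Walk indirect object numbers, not page numbers.
            for (int objNum = 1; objNum <= document.GetNumberOfPdfObjects(); objNum++)
            {
                PdfObject obj = document.GetPdfObject(objNum);
                if (obj == null || !obj.IsStream())
                {
                    continue; // free slot, or not a stream
                }

                // PdfStream derives from PdfDictionary, so this cast is safe here.
                PdfDictionary pd = (PdfDictionary)obj;
                Console.WriteLine("Object " + objNum + ":");
                foreach (PdfName key in pd.KeySet())
                {
                    Console.WriteLine("  " + key + " = " + pd.Get(key));
                }
            }
        }
        finally
        {
            document.Close();
        }
    }
}
```

Image streams should show up with `/Subtype = /Image` next to their `/Width`, `/Height` and `/Filter` entries.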

Documentation can be found here: https://api.itextpdf.com/iText7/dotnet/7.1.8/classi_text_1_1_kernel_1_1_pdf_1_1_pdf_dictionary.html
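Following up on Tomex Ou's comment under the question: `GetPdfObject` takes an indirect object number, so passing a page number fetches whatever object happens to have that number, and that is rarely an image. Here is a hedged sketch of what the loop might look like when it walks object numbers instead. It assumes iText 7 for .NET and leans on `PdfImageXObject` from `iText.Kernel.Pdf.Xobject` to decode the stream, rather than switching on the filter by hand. The names `PdfImageExtractor`, `ExtractImages` and `outputDir` are hypothetical, and it has not been run against the asker's file.

```csharp
using System;
using System.IO;
using iText.Kernel.Pdf;
using iText.Kernel.Pdf.Xobject;

public static class PdfImageExtractor   // hypothetical helper, not part of iText
{
    // Saves every image stream it can decode; returns how many files were written.
    public static int ExtractImages(string sourcePdf, string outputDir)
    {
        Directory.CreateDirectory(outputDir);
        int saved = 0;
        PdfDocument document = new PdfDocument(new PdfReader(sourcePdf));
        try
        {
            for (int objNum = 1; objNum <= document.GetNumberOfPdfObjects(); objNum++)
            {
                PdfObject obj = document.GetPdfObject(objNum);
                if (obj == null || !obj.IsStream())
                {
                    continue; // free slot, or not a stream
                }

                PdfStream stream = (PdfStream)obj;
                if (!PdfName.Image.Equals(stream.GetAsName(PdfName.Subtype)))
                {
                    continue; // a stream, but not an image
                }

                try
                {
                    PdfImageXObject image = new PdfImageXObject(stream);
                    byte[] bytes = image.GetImageBytes();
                    string ext = image.IdentifyImageFileExtension();
                    string path = Path.Combine(outputDir, "image-" + objNum + "." + ext);
                    File.WriteAllBytes(path, bytes);
                    saved++;
                }
                catch (Exception ex)
                {
                    // Some encodings or color spaces may not be decodable; skip those.
                    Console.WriteLine("Skipping object " + objNum + ": " + ex.Message);
                }
            }
        }
        finally
        {
            document.Close();
        }
        return saved;
    }
}
```

Usage would be something like `PdfImageExtractor.ExtractImages(@"d:\pdf\input.pdf", @"d:\pdf\images")`, where both paths are placeholders. If you need to know which page each image belongs to, iText 7's `PdfCanvasProcessor` with an event listener is another route, but it is not shown here.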
