I have several PDF documents that supposedly contain scanned images, but upon inspection in Acrobat Pro, each page contains a huge number of tiny "inline images". From what I understand these are not regular images inside XObjects, but rather images embedded directly inside content streams.
How could I go about extracting and merging these images?
The only code I could find online starts out like this:
var reader = new PdfReader(@"path\to\file.pdf");
PdfDocument document = new PdfDocument(reader);
for (var i = 1; i <= document.GetNumberOfPages(); i++)
{
PdfDictionary obj = (PdfDictionary)document.GetPdfObject(i);
// ... more code goes here
}
...but the rest of the code doesn't work because the PdfDictionary returned from GetPdfObject is not a stream, only a dictionary. I don't know how to access the images inside it.