I am using iTextSharp in a C# WinForms application to parse PDF files. Extracting the text from a PDF is easy with iTextSharp. However, when a page contains, for example, an image between two lines of text, I cannot get any information about the image.
My requirement is:
- Get the structural elements of the PDF file
- Determine whether each element is text, an image, a table, or something else
For example, the result for a document might look like this:
Text : paragraph1
Text : paragraph2
Image: image
Text : paragraph3
Table: table info
Text : paragraph4
If I can obtain the information in a format like this, I can easily identify the text, image, table, header, and footer content.
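To make the desired result concrete, here is a minimal sketch of the kind of data structure I have in mind. The names `ElementKind` and `PageElement` are hypothetical and exist only for illustration. They are not part of iTextSharp, and the list is filled in by hand here rather than read from a PDF.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; none of these come from iTextSharp.
enum ElementKind { Text, Image, Table, Other }

class PageElement
{
    public ElementKind Kind { get; }
    public string Content { get; }

    public PageElement(ElementKind kind, string content)
    {
        Kind = kind;
        Content = content;
    }

    public override string ToString()
    {
        return Kind + ": " + Content;
    }
}

static class Program
{
    static void Main()
    {
        // Elements in reading order, mirroring the example listing above.
        // In the real application this list would come from the PDF parser.
        var elements = new List<PageElement>
        {
            new PageElement(ElementKind.Text, "paragraph1"),
            new PageElement(ElementKind.Text, "paragraph2"),
            new PageElement(ElementKind.Image, "image"),
            new PageElement(ElementKind.Text, "paragraph3"),
            new PageElement(ElementKind.Table, "table info"),
            new PageElement(ElementKind.Text, "paragraph4"),
        };

        // Print one line per element: its kind followed by its content.
        foreach (var element in elements)
        {
            Console.WriteLine(element);
        }
    }
}
```

With an ordered list like this, the application could branch on `Kind` to handle each element type separately.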
Is it possible to get this kind of information using iTextSharp? If so, how? If not, could you please suggest other tools that can meet this requirement?
Thanks to all,
Saravanan