Processing multi-page PDFs with an OpenCV program

Question

I need to process multi-page PDFs that are scanned in to me, using a program I wrote in C++ with the OpenCV libraries. OpenCV cannot read PDFs, so I currently use pdftk to split the PDF into single pages, then run convert -density 300 page##.pdf page##.png to turn each page into a PNG before reading it in with my program.

The issue is that convert takes about 30 seconds on my Raspberry Pi to do this conversion. Is there an easier way to convert multi-page PDFs into something my C++/OpenCV program can read?

You describe the PDFs to process as being scanned in to you. Does that mean they were created by an optical scanning process? In that case each page in the PDF may actually consist of just one image, and you could use an image extraction program instead, which would require fewer resources. — mkl
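The image-extraction route suggested in this comment could be sketched as follows. This is a minimal sketch, assuming the pdfimages tool (shipped with Xpdf and Poppler) is installed and on the PATH, and that each scanned page really is a single embedded image. The helper names shellQuote and buildPdfimagesCommand are hypothetical, not part of any library.

```cpp
#include <string>

// Hypothetical helper: wrap an argument in single quotes for a POSIX shell,
// escaping any single quote that appears inside it.
std::string shellQuote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a pdfimages command line of the form
//   pdfimages PDF-file image-root
// Without extra options pdfimages writes the embedded images as PBM/PPM
// files whose names start with imageRoot. No rendering takes place, so
// this is usually much cheaper than rasterizing each page.
std::string buildPdfimagesCommand(const std::string& pdfPath,
                                  const std::string& imageRoot) {
    return "pdfimages " + shellQuote(pdfPath) + " " + shellQuote(imageRoot);
}
```

The resulting string can be passed to std::system, and the extracted PBM/PPM files can then be loaded with cv::imread. The exact numbering of the output file names depends on the pdfimages version, so it is safer to list the output directory than to guess the names.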

Saqlain Saqlain · Accepted Answer · 2013-01-27T18:05:02

You can try converting the PDF to PNG with one of these tools:

http://www.foolabs.com/xpdf/

https://github.com/coolwanglu/pdf2htmlEX (it converts PDF to HTML, but also generates a PNG image for each page, which you can use)
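As a rough illustration of the Xpdf route: the Xpdf tools include pdftoppm, which rasterizes every page of a multi-page PDF in one call, so the separate pdftk step is not needed. It writes PPM (or PGM with -gray) rather than PNG, which OpenCV can read directly. The sketch below only builds the command line. The function names quoteArg and buildPdftoppmCommand are hypothetical, and it assumes pdftoppm is installed and on the PATH.

```cpp
#include <string>

// Hypothetical helper: single-quote an argument for a POSIX shell.
std::string quoteArg(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Hypothetical helper: build a pdftoppm command line of the form
//   pdftoppm -r <dpi> [-gray] PDF-file PPM-root
// -r sets the resolution in DPI; -gray produces grayscale PGM output,
// which is smaller and often enough for scanned documents.
std::string buildPdftoppmCommand(const std::string& pdfPath,
                                 const std::string& outputRoot,
                                 int dpi, bool gray) {
    std::string cmd = "pdftoppm -r " + std::to_string(dpi);
    if (gray) cmd += " -gray";
    cmd += " " + quoteArg(pdfPath) + " " + quoteArg(outputRoot);
    return cmd;
}
```

For example, buildPdftoppmCommand("scan.pdf", "page", 300, true) yields a command that can be run with std::system, after which each output file can be loaded with cv::imread. Whether this is faster than convert on a Raspberry Pi would need to be measured; lowering the DPI is another knob to try if 300 is more than the OpenCV processing needs.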
