
I want to convert scanned PDF files to text-searchable PDF files. The input is a scanned PDF and the expected output is a searchable PDF.

There are a few tools that extract the text from a scanned PDF, but I want a text-searchable PDF as output, not just the text.

I searched and found one solution here, but my production server runs Amazon CentOS, and that tool only installs on Ubuntu, not on Amazon CentOS.

I am willing to pay if required. Please point me to any open-source or paid web API service, or any tool, that can convert a scanned PDF to a text-searchable PDF.

I am using PHP in my web application.

What volume of images do you want to process at one time? I keep seeing people suggest online services, but that isn't reasonable if (a) your documents are sensitive, such as healthcare documentation, or (b) you need sustained high throughput of several documents per minute.
Newclique

1 Answer


There are several commercial web API services that will convert scanned PDFs (or scanned images generally) to searchable PDFs. Of these, I would recommend trying ABBYY's Cloud OCR SDK. They have been in the OCR space for decades and use their own OCR engine, which, from my observations and from what I've heard from others, tends to give better results than APIs built on other technologies (e.g. Tesseract).
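
To make the suggestion concrete, here is a rough sketch of what submitting a scanned PDF to such a service might look like. It is written in Python for brevity; the same HTTP request can be issued from PHP with cURL. The host `cloud.ocrsdk.com`, the `processImage` method, and the `exportFormat=pdfSearchable` parameter are assumptions based on my recollection of the legacy Cloud OCR SDK v1 API, so check them against the current ABBYY documentation before relying on them. The helper names (`build_process_image_url`, `basic_auth_header`, `submit_scanned_pdf`) are hypothetical and exist only for this illustration.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: legacy v1 host; newer accounts may use a region-specific host.
BASE_URL = "https://cloud.ocrsdk.com"


def build_process_image_url(language="English", export_format="pdfSearchable"):
    """Build the processImage URL that asks for a searchable PDF as output."""
    query = urllib.parse.urlencode(
        {"language": language, "exportFormat": export_format}
    )
    return f"{BASE_URL}/processImage?{query}"


def basic_auth_header(app_id, password):
    """HTTP Basic credentials built from the application ID and password."""
    token = base64.b64encode(f"{app_id}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def submit_scanned_pdf(pdf_path, app_id, password):
    """Upload the scanned PDF as the raw request body.

    Returns the service's response body as text (assumed to be an XML
    description of the queued task, which is then polled for the result).
    """
    with open(pdf_path, "rb") as f:
        body = f.read()
    request = urllib.request.Request(
        build_process_image_url(),
        data=body,
        headers={
            "Authorization": basic_auth_header(app_id, password),
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Processing is asynchronous as far as I recall: the upload only queues a task, and the searchable PDF is downloaded from a result URL once the task's status is reported as completed.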