
I'm using a ScanSnap S1500M to scan all paper documents to the folder /PDF-scans/ – I'd like to use Adobe Acrobat X Professional to OCR the text.

I'd like to automate this process (daily):

  • open Acrobat X Pro
  • batch OCR process PDF files in /PDF-scans/, append "-OCR" to filename
  • after OCR, move files to /PDF-ocr/
  • delete original PDF files in /PDF-scans/

Should I use Automator? Is there a script that can do this? Does it have to be tied to iCal's repeating events?

Thank you.

You could tell Automator to call your AppleScript for every new file that arrives; in the AppleScript you just handle all the necessary actions. Adobe applications are scriptable via AppleScript and even JavaScript. - Yahia
@Yahia: That's not entirely true. Acrobat is only barely scriptable with AppleScript, and Adobe has pushed all API development to JavaScript. Also, Automator is not required; AppleScript could handle all of those tasks deftly when it is properly implemented within an application. I can't speak to JavaScript's capabilities here. - Philip Regan

1 Answer


I would download PDFpen, which lets you OCR documents easily. Once you've done that, you can use this script:

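set the PDF_folder to "where:Ever:Your:PDF:folder:is:" as alias
set the OCR_path to "/where/ever/you/want/the/new/folder/to/be"

-- create the destination folder if it does not exist yet, then get an alias to it
do shell script "mkdir -p " & quoted form of OCR_path
set the OCR_folder to (POSIX file OCR_path) as alias

tell application "Finder"
    -- snapshot the file list first so that moving files does not disturb the loop
    set the PDF_list to every file of the PDF_folder
    repeat with this_PDF in the PDF_list
        my ocr(this_PDF as alias, the OCR_folder)
    end repeat
end tell

-- the destination is passed in because a handler cannot see the variables set above
on ocr(this_PDF, OCR_folder)
    tell application "PDFpen"
        open this_PDF
        tell document 1
            ocr --simple
            -- wait until PDFpen reports that OCR has finished
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
        end tell
        -- save the OCR text layer back into the same file
        save document 1
        close document 1
    end tell
    tell application "Finder"
        -- moving the file also removes it from the scan folder
        move this_PDF to the OCR_folder with replacing
    end tell
end ocr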
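
The question also asks for "-OCR" to be appended to the filename, which the script does not do. A minimal sketch of that step follows, assuming a helper handler that I have named ocr_name (the name is hypothetical, not part of PDFpen or Finder). It only manipulates the name text and keeps the original extension:

```applescript
-- Hypothetical helper: returns the file name with "-OCR" inserted before ".pdf".
-- AppleScript text comparisons ignore case by default, so ".PDF" matches too.
on ocr_name(file_name)
    if (length of file_name) > 4 and file_name ends with ".pdf" then
        set base_name to text 1 thru -5 of file_name
        set name_extension to text -4 thru -1 of file_name
        return base_name & "-OCR" & name_extension
    else
        -- no .pdf extension: just append the suffix
        return file_name & "-OCR"
    end if
end ocr_name
```

To use it, the move line inside the Finder block of the ocr handler could become something like the following, assuming Finder's move command returns a reference to the moved item:

```applescript
set moved_PDF to (move this_PDF to the OCR_folder with replacing)
set name of moved_PDF to my ocr_name(name of moved_PDF)
```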