I have about 1000 URLs pointing to remote PDF files, and I need to determine which of them are searchable in Safari and which are not. My script already loops over the URLs and opens each one in Safari, but I am stuck on the last two steps below.

Can someone help? Thanks

The script needs to:

For each URL:

Tell Safari to

  1. Open a given URL (in this case a remote PDF)
  2. Search the PDF for the character "a" using the Find command from the right-click context menu, not Command-F

  3. Write the search result to a file

    set urlList to {"http://pricelist.list.com/pricelists/A/AEA_11-15-12.pdf", "http://pricelist.list.com/pricelists/A/API_1608_04-05-13.pdf", "http://pricelist.list.com/pricelists/A/Access_02-01-12.pdf", "http://pricelist.list.com/pricelists/A/Allparts_Retail_01-01-11.pdf"}
    set numURLs to (count urlList)
    repeat with i from 1 to numURLs
        set theURL to (item i of urlList)
        tell application "Safari"
            open location theURL
            activate
            --Perform search
            --Write results to file
        end tell
        tell application "System Events"
            tell process "Safari"
                click menu item "Close Other Tabs" of menu "File" of menu bar 1
            end tell
        end tell
        delay 5
    end repeat

1 Answer


It might be easier to download the PDFs and use shell scripting:

    brew install poppler wget parallel
    cat ~/Documents/urls.txt | parallel -P8 wget
    for f in *.pdf; do [[ $(pdffonts -- "$f" 2> /dev/null | wc -l) -eq 2 ]] && printf %s\\n "$f"; done

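pdffonts always prints a two-line header, so a scanned PDF with no embedded fonts produces exactly two lines of output. The loop above therefore prints the files that are not searchable. See the related question "How do I determine programmatically if a PDF is searchable?".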
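Since the question also asks for the result to be written to a file, here is a rough sketch that labels every downloaded PDF instead of only listing the ones without fonts. The function names `classify_fonts` and `report_pdfs` and the output file `results.tsv` are made up for illustration. It assumes pdffonts from poppler is on your PATH, and that "more than two lines of output" is a good enough heuristic for "searchable".

```shell
#!/usr/bin/env bash

# Read pdffonts output on stdin and print a label.
# pdffonts prints a two-line header; every further line is a font,
# so more than two lines is taken to mean the PDF contains text.
classify_fonts() {
    local lines
    lines=$(wc -l | tr -d ' ')   # strip the padding some wc versions add
    if [ "$lines" -gt 2 ]; then
        echo "searchable"
    else
        echo "not searchable"
    fi
}

# Print one "file<TAB>label" line per PDF in the current directory.
report_pdfs() {
    local f
    for f in *.pdf; do
        [ -e "$f" ] || continue   # no PDFs: the glob stays unexpanded
        printf '%s\t%s\n' "$f" "$(pdffonts -- "$f" 2> /dev/null | classify_fonts)"
    done
}
```

After downloading, something like `report_pdfs > results.tsv` would give you a two-column file. Note that a file pdffonts cannot read at all produces no output and ends up labeled "not searchable", so you may want to spot-check a few of those.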