Not seeing PDF fields from Excel VBA

Question

I've been tasked with creating an Excel tool to import data from thousands of PDFs into Excel. I've installed Adobe Acrobat X Professional, and it seems to be working fine. I'm using Excel 2007.

I found sample code here: http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/

And modifying it only slightly to prompt me to browse to the PDF file, and setting a reference to "Adobe Acrobat 10.0 Type Library", I'm trying this code:

Sub Main()
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String

Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")

Dim strFullPath As String
strFullPath = Application.GetOpenFilename()
theForm.Open (strFullPath) 'theForm.Open ("C:\temp\sampleForm.pdf")

Set jso = theForm.GetJSObject

' get the information from the form fields Text1 and Text2
text1 = jso.getfield("MFR_ctrl33605579").Value 'jso.getfield("Text1").Value
'(etc)
End Sub

("MFR_ctrl33605579" is the name of one of the text fields; I found that in the editor that is part of or included with Acrobat Pro, called "Adobe LiveCycle Designer".)

It runs without error until it gets to the getfield method. Then I get the error "Object required".

If I run TypeName on AcroApp, theForm, and jso, I get CAcroApp, CAcroPDDoc, and object respectively:

?typename(AcroApp)
CAcroApp
?typename(theForm)
CAcroPDDoc
?typename(jso)
Object

If I put them in the Watches window, I initially get plus signs to the left of all three of them, but if I click on those plus signs, only one line appears under each one, saying "< No Variables >" in the Value column.

I wondered if the document was somehow locked against being read in this way. I read that if it was locked, a padlock would appear at the upper left in Acrobat, and it would not allow saving as text. But no padlock appears that I see, and it does allow saving as plain text. So as far as I can tell it doesn't seem to be locked.

Any suggestions on what I can try?

UPDATE:

I'm just getting started trying the suggestions, but I wanted to note that after trying rheitzman's For loop to get the field name (using getNthFieldName), I see that my code works if I use this for the field name:

form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]

Or in other words:

text1 = jso.getfield("MFR_form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]").Value

That would allow me to identify fields by that very long, apparently fully-qualified reference, which would get me through the project. But first I'm going to check the other ideas to see if the fields can be found by short name only.

UPDATE 2:

I see now that I can inspect some details of some of the object model in the Object Browser (filter Libraries on Acrobat), even though it doesn't show up in the Watch window.

However, it doesn't show objects created by methods, such as the object created by the AcroExch.PDDoc.GetJSObject method (see AcroPDDoc in the object model). The method of that object used here, getNthFieldName, does not appear in the object browser at all.

And, I don't see any other way there of identifying the fields by their short field names.

So ... for this project, I'm just going to use the long field names returned by getNthFieldName.

I would think that perhaps that's not quite the field name. It errors there and you get "Object required". Is there some whitespace at the end of the field? If you change the name of the fields to something else and update the code with that name does the same error occur? my two cents. — mrbungle
getNthFieldName is a method on AcroPDDoc(), in the JavaScript references at least, so perhaps some of the other properties and methods are also valid. I think JSObject() may support a subset - I doubt it can be used to create objects. JSObject is evidently an Interop creature that isn't documented anywhere I can find. — rheitzman
Here's a JavaScript reference that lists properties and methods - not sure what the overlap is with the Acrobat 1.0 interop.dll adobe.com/content/dam/Adobe/en/devnet/acrobat/pdfs/… see print page 10/11 for getNthFieldName. Doesn't document the objects, just lists them. — rheitzman

rheitzman rheitzman · Accepted Answer · 2015-10-20T23:33:14

iTextSharp is much easier to use!

That said, here's a snippet to read all fields.

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Dim pdfForm As Acrobat.AcroPDDoc
    Dim jso As Object ' ?
    Dim s As String = ""
    Dim i As Integer
    Dim strFullPath As String = "H:\OIS\ENFORCE\OPEN_BURN\Ag\temp\Open_Burn_Template_Out.pdf"
    Try
        pdfForm = New Acrobat.AcroPDDoc
        pdfForm.Open(strFullPath)
        jso = pdfForm.GetJSObject
        For i = 0 To jso.numfields() - 1
            s = jso.getNthFieldName(i)
            Debug.Print(s & ": " & jso.getField(s).value)
        Next
    Catch ex As Exception
        MsgBox(ex.Message)
    End Try
End Sub

It could be your "form" doesn't have any fields (or a flattened form), or the names are off.

The JSObject properties are a bit of a mystery. I found a post that used reflection to extract data but it just so happens that the JSObject properties are usable if you know what they are! e.g. numfields, getNthFieldName

If anyone finds a reference link for JSObject please post in comments.

Not seeing PDF fields from Excel VBA

2 Answers