I've been tasked with creating an Excel tool to import data from thousands of PDFs into Excel. I've installed Adobe Acrobat X Professional, and it seems to be working fine. I'm using Excel 2007.
I found sample code here: http://khkonsulting.com/2010/09/reading-pdf-form-fields-with-vba/
And modifying it only slightly to prompt me to browse to the PDF file, and setting a reference to "Adobe Acrobat 10.0 Type Library", I'm trying this code:
Sub Main()
Dim AcroApp As Acrobat.CAcroApp
Dim theForm As Acrobat.CAcroPDDoc
Dim jso As Object
Dim text1, text2 As String
Set AcroApp = CreateObject("AcroExch.App")
Set theForm = CreateObject("AcroExch.PDDoc")
Dim strFullPath As String
strFullPath = Application.GetOpenFilename()
theForm.Open (strFullPath) 'theForm.Open ("C:\temp\sampleForm.pdf")
Set jso = theForm.GetJSObject
' get the information from the form fields Text1 and Text2
text1 = jso.getfield("MFR_ctrl33605579").Value 'jso.getfield("Text1").Value
'(etc)
End Sub
("MFR_ctrl33605579" is the name of one of the text fields; I found that in the editor that is part of or included with Acrobat Pro, called "Adobe LiveCycle Designer".)
It runs without error until it gets to the getfield method. Then I get the error "Object required".
If I run TypeName on AcroApp, theForm, and jso, I get CAcroApp, CAcroPDDoc, and object respectively:
?typename(AcroApp)
CAcroApp
?typename(theForm)
CAcroPDDoc
?typename(jso)
Object
If I put them in the Watches window, I initially get plus signs to the left of all three of them, but if I click on those plus signs, only one line appears under each one, saying "< No Variables >" in the Value column.
I wondered if the document was somehow locked against being read in this way. I read that if it was locked, a padlock would appear at the upper left in Acrobat, and it would not allow saving as text. But no padlock appears that I see, and it does allow saving as plain text. So as far as I can tell it doesn't seem to be locked.
Any suggestions on what I can try?
UPDATE:
I'm just getting started trying the suggestions, but I wanted to note that after trying rheitzman's For loop to get the field name (using getNthFieldName), I see that my code works if I use this for the field name:
form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]
Or in other words:
text1 = jso.getfield("MFR_form1[0].QuestionnaireForm[0].sbfrmProfile[0].sbfrmContact[0].sbfrmManufacturerDetails[0].MFR_ctrl33605579[0]").Value
That would allow me to identify fields by that very long, apparently fully-qualified reference, which would get me through the project. But first I'm going to check the other ideas to see if the fields can be found by short name only.
UPDATE 2:
I see now that I can inspect some details of some of the object model in the Object Browser (filter Libraries on Acrobat), even though it doesn't show up in the Watch window.
However, it doesn't show objects created by methods, such as the object created by the AcroExch.PDDoc.GetJSObject method (see AcroPDDoc in the object model). The method of that object used here, getNthFieldName, does not appear in the object browser at all.
And, I don't see any other way there of identifying the fields by their short field names.
So ... for this project, I'm just going to use the long field names returned by getNthFieldName.
getNthFieldNameis a method on AcroPDDoc(), in the JavaScript references at least, so perhaps some of the other properties and methods are also valid. I think JSObject() may support a subset - I doubt it can be used to create objects. JSObject is evidently an Interop creature that isn't documented anywhere I can find. - rheitzman