How to move XFA xml data into PDF/A-2 conforming File with iText/XFA Worker

Question

Adobe's ISO 32000 spec states that XFA data can be stored in a special place in a PDF/A-2 conforming PDF. Here is the text of that section.

Incorporation of XFA Datasets into a PDF/A-2 Conforming File

To support PDF/A-2 conforming files, ExtensionLevel 3 adds support for XML form data (XFA datasets) through the XFAResources name tree, which is part of the name dictionary of the document catalog. (See “TABLE 3.28 Entries in the name dictionary” on page 23.)

While Acrobat forms (and form data) are permitted in a PDF/A-2 conforming file, XML forms are not. Such XML forms are specified as XDP streams referenced from interactive form dictionaries. XDP streams can contain XFA datasets.

For applications that convert PDF documents to PDF/A-2, the XFAResources name tree supports relocation of XML form data from XDP streams in a PDF document into the XFAResources name tree.

The XFAResources name tree consists of a string name and an indirect reference to a stream. The string name is created at the time the document is converted to a PDF/A-2 conforming file. The stream contains the <xfa:datasets> element of the XFA, comprised of <xfa:data> elements.

In addition to data values for XML form fields, the <xfa:datasets> elements enable the storage and retrieval of other types of information that may be useful for other workflows, including data that is not bound to form fields, and one or more XML signature(s).

See the XML Architecture, XML Forms Architecture (XFA) Specification, version 2.6 in the Bibliography

We have an XFA Form that we pass xml to and now need to convert that document to PDF/A-2.

We are currently testing XFA Worker to see whether it will allow us to do this, but I have been unable to find an XFA Worker sample that does it.

I first tried flattening with XFA Worker, but that removes the data completely, so it can no longer be extracted.

How do you get the XFA xml data into the place that Adobe says to put it in with XFA Worker?

UPDATE: Thanks Bruno, my code isn't allowing me to convert the XFA Form to PDF/A-2. Here is the code I used.

    xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray()));

    stamper.close();
    reader.close();

    try (ByteArrayOutputStream outputStreamDest = new ByteArrayOutputStream()) {
        PdfReader pdfAReader = new PdfReader(output.toByteArray());

        PdfAStamper pdfAStamper = new PdfAStamper(pdfAReader, outputStreamDest, PdfAConformanceLevel.PDF_A_2A);
....

and I get the error com.itextpdf.text.pdf.PdfAConformanceException: Only PDF/A documents can be opened in PdfAStamper.

So I am now assuming that PdfAStamper isn't a converter and that it just reads in the byte array of the XFA PDF.

Er... Of course PdfAStamper is not a converter. It's a class that allows you to stamp extra content (watermarks, page numbers, fill out forms) to an existing PDF/A document. You can't "feed" it an XFA form. PdfAStamper expects a PDF/A document. — Bruno Lowagie
You said you were using XML Worker to convert XFA data to a PDF/A document, but now you have changed your question by saying that you use PdfAStamper. That is very confusing. I assumed that you were using XSLT on the XML embedded in the XFA form to convert the XFA data to HTML. I assumed that you were converting that HTML to PDF using XML Worker. Now I'm not so sure anymore. — Bruno Lowagie
How do you use XML Worker to fill in the already created Court PDF? If you can do that, you know more about XML Worker than I do (and I'm the original developer of iText). — Bruno Lowagie
Note: if you use xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray())); then you are not using XML Worker. You are using core iText functionality. Maybe you're not using XML Worker at all. In that case, please don't confuse the Stack Overflow visitor into thinking that you are. That's confusing. Bottom line: once you have filled out the form like this xfa.fillXfaForm(new ByteArrayInputStream(xmlSchemaStream.toByteArray())); you need XFA Worker to flatten that form. — Bruno Lowagie
I have no clue about what you're saying. First you claim that the US Courts demand that you use XFA, now you claim that you can do without XFA. That all sounds very strange to me. — Bruno Lowagie

Bruno Lowagie · Accepted Answer · 2016-11-04T06:55:43

Allow me to start with some fatherly advice. XFA will be deprecated in ISO-32000-2 (PDF 2.0), and it is great that you are turning your XFA documents into PDF/A documents. However, why would you choose PDF/A-2? PDF/A-3 is identical to PDF/A-2 with one exception: in PDF/A-3, you are allowed to embed files of any type, including XML files. You can even indicate the relationship between the attached XML and the PDF. Wouldn't it be smarter to create a PDF/A-3 file and to add the original data (not the XFA file) as an attachment?

Suppose that you ignore this fatherly advice. What could you do?

Annex D of ISO-19005-2 (and -3) tells you that you have to add an entry to the Names dictionary of the document catalog. Unfortunately, iText 5 doesn't allow you to add your own entries to this names dictionary while creating a file, so you will have to post-process the document.

Suppose that you have a file located at filePath. You can then get the Catalog entry and the Names entry of the Catalog entry like this:

    PdfReader reader = new PdfReader(filePath);
    PdfDictionary catalog = reader.getCatalog();
    PdfDictionary names = catalog.getAsDict(PdfName.NAMES);

You can add entries to this names dictionary. For instance, suppose that I want to add a stream with the content "Some bytes" as a custom entry. I would use this code:

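    import java.io.FileOutputStream;
    import java.io.IOException;
    import com.itextpdf.text.DocumentException;
    import com.itextpdf.text.pdf.PdfDictionary;
    import com.itextpdf.text.pdf.PdfIndirectObject;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStamper;
    import com.itextpdf.text.pdf.PdfStream;

    public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
        PdfReader reader = new PdfReader(src);
        PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
        PdfDictionary catalog = reader.getCatalog();
        // Reuse the existing Names dictionary, or create one if the catalog has none
        PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
        if (names == null) {
            names = new PdfDictionary();
        }
        // Write the stream as a new indirect object and reference it from Names
        PdfStream stream = new PdfStream("Some bytes".getBytes());
        PdfIndirectObject objref = stamper.getWriter().addToBody(stream);
        names.put(new PdfName("ITXT_Custom"), objref.getIndirectReference());
        catalog.put(PdfName.NAMES, names);
        stamper.close();
        reader.close();
    }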

In the resulting file, the Names dictionary of the catalog contains an ITXT_Custom entry that refers to the new stream object.

In your case, you don't want an entry named ITXT_Custom. You want to add an entry called XFAResources, and the value of that entry should be a name tree consisting of a string name and an indirect reference to a stream. It should be fairly easy to adapt my example to achieve this.
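
One piece of that adaptation is getting hold of the bytes that go into the stream. What follows is only a minimal sketch, and it uses nothing but the JDK's XML APIs rather than iText. It assumes that the XFA content is available as a single XDP document and copies out its datasets element. The class name XfaDatasetsExtractor, the method extractDatasets, and the sample XDP are made up for illustration; the two namespace URIs are the ones I understand the XFA specification to use for XDP packets and for datasets.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of iText or XFA Worker.
final class XfaDatasetsExtractor {

    // Assumption: namespace of the XFA datasets packet.
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Returns the first datasets element of an XDP document, serialized as UTF-8 bytes. */
    static byte[] extractDatasets(byte[] xdp) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match on namespace URI
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xdp));

        NodeList found = doc.getElementsByTagNameNS(XFA_DATA_NS, "datasets");
        if (found.getLength() == 0) {
            throw new IllegalArgumentException("No datasets element found in the XDP");
        }

        // The identity transform serializes just the datasets subtree.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        identity.transform(new DOMSource(found.item(0)), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample XDP with an empty template packet and one data value.
        String xdp = "<xdp:xdp xmlns:xdp=\"http://ns.adobe.com/xdp/\">"
                + "<template xmlns=\"http://www.xfa.org/schema/xfa-template/2.6/\"/>"
                + "<xfa:datasets xmlns:xfa=\"" + XFA_DATA_NS + "\">"
                + "<xfa:data><form1><name>Jane</name></form1></xfa:data>"
                + "</xfa:datasets></xdp:xdp>";
        byte[] datasets = extractDatasets(xdp.getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(datasets, StandardCharsets.UTF_8));
    }
}
```

The returned bytes could then take the place of "Some bytes".getBytes() in the PdfStream constructor above. The name tree that points at the resulting stream still has to be built separately.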

Note: All code provided by me on Stack Overflow can be used under the CC-BY-SA as defined in the Stack Exchange Network Terms of Service. If you do not like the CC-BY-SA, I also provide this code under the same license as used for iText, more specifically the AGPL.
