PAdES LTV signing of a PDF/A-3A document yields invalid signature

Question

Intro

I have an issue with digitally signing PDF documents that have been marked as PDF/A-3A compliant. With PDFBox (latest version, 2.0.24) I end up with an invalid signature in Adobe Acrobat, while with iText7 (latest version) I get a valid signature. The goal is to get PAdES LTV compliant signatures.

Overview

My process is the following (with both PDFBox and iText7):

1. I open the PDF and create the hash for signing (the data to be signed).
2. I call the 3rd party service to get back the digital signature.
3. In the service response I also get the OCSP and CRL content that I need to embed in the PDF for LTV quality.
4. I embed the signature in the PDF.
5. I save the document to memory, then reopen it for embedding the OCSP and CRL.
6. I embed the OCSP and CRL items, creating the respective DSS and VRI dictionaries.
7. I save the PDF to disk.

For PDFBox, the code for signing is here and for OCSP/CRL embedding is here. For iText7, the code for signing and for OCSP/CRL embedding is here.

The problem

Now, this works OK for most PDF files, including multi-signature documents. The problem is with one particular PDF that was created as PDF/A compliant, level 3A.

With PDFBox, if I just embed the signature and open the document in Adobe Acrobat, the signature is valid. If I also embed the OCSP/CRL content, the signature is no longer valid. Adobe Acrobat complains that:

Signature is invalid: Document has been altered or corrupted since it was signed.

I also noticed that just by doing:

PDDocument document = PDDocument.load(inputStream);
document.save(outputStream);

I break the signature. From my tests, the actual embedding is not really the cause of the issue, but just the fact that I reopen the PDF after embedding the signature and save it back to disk.
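One way to narrow this down is to check whether the re-saved file still begins with exactly the bytes of the signed file. In PDFBox 2.x, `save` writes the whole document anew, while `saveIncremental` appends an incremental update, and only the latter leaves the signed bytes untouched. The following is a minimal sketch of such a check using only the JDK; the class and method names are hypothetical and not part of PDFBox or iText, and the byte arrays in `main` are stand-ins for real files.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic helper; not part of PDFBox or iText. */
class AppendOnlyCheck {

    /**
     * Returns true if 'updated' begins with exactly the bytes of 'signed',
     * meaning that data was only appended after the signed revision.
     */
    static boolean isAppendOnly(byte[] signed, byte[] updated) {
        if (updated.length < signed.length) {
            return false;
        }
        for (int i = 0; i < signed.length; i++) {
            if (signed[i] != updated[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in byte arrays; in practice these would be the contents of the two PDF files.
        byte[] signed = "%PDF-1.7 signed revision %%EOF\n".getBytes(StandardCharsets.US_ASCII);
        byte[] appended = "%PDF-1.7 signed revision %%EOF\nDSS update %%EOF\n".getBytes(StandardCharsets.US_ASCII);
        byte[] rewritten = "%PDF-1.7 rewritten revision %%EOF\n".getBytes(StandardCharsets.US_ASCII);

        System.out.println("appended:  " + isAppendOnly(signed, appended));
        System.out.println("rewritten: " + isAppendOnly(signed, rewritten));
    }
}
```

If this check fails for the file with embedded OCSP/CRL data, the signed revision itself was changed by the save. If it passes, the signed bytes are intact and the cause has to be elsewhere.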

With the same process (keys, certificate, etc) via iText7 I get a valid LTV signature in the end, in Adobe Acrobat.

Sample PDFs

The sample documents are here. The original contains the unsigned document, and then there are 2 samples, one for PDFBox (invalid in Adobe Acrobat) and one for iText7 (valid in Adobe Acrobat).

My research so far shows that somehow PDFBox is breaking the order of the elements when loading the PDF after signature embedding. It hints at this issue with loading and saving documents, though for ALL the other PDFs I do the same process and Adobe Acrobat does not complain about the signature.

I also tried with PDFBox 2.1.0-SNAPSHOT and 3.0.0-SNAPSHOT, hoping that the issue is related to ordering of elements in PDF and it was fixed. Still, I get the same results.

Later edit 1

Please see the Later edit 2 below, this Later edit 1 here is not a good idea!

As per the accepted answer below from @mkl, the issue is with the original PDF file, which contains the cross reference table split into several subsections instead of one. This seems to be caused by the library (Aspose PDF for .NET, version 21.3 or earlier) used by the service that generated the PDF in the first place.

One workaround that seems to work with my current code is the following:

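// Workaround: normalize documents produced by Aspose by saving and reloading them.
PDDocumentInformation info = pdDocument.getDocumentInformation();
if (info != null && StringUtils.containsIgnoreCase(info.getProducer(), "Aspose")) {
    try {
        // A full save makes PDFBox write a fresh cross reference table.
        pdDocument.save(inMemoryStream);
        pdDocument.close();
        pdDocument = PDDocument.load(inMemoryStream.toByteArray());
        inMemoryStream.reset();
    } catch (Exception e) {
        // ... (error handling not shown in the original post)
    }
}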

Basically if I detect that the producer of the document is Aspose, I save the document in memory (via PDFBox' pdDocument.save()) and load it back. This ensures the cross reference table is written correctly in memory and from there the signing and OCSP+CRL embedding works as expected, yielding a valid signature in Adobe Acrobat.

Later edit 2

Thank you @mkl and @TilmanHausherr, you are right. It is not a good idea to assume that all documents produced with a certain library have to automatically be normalized, as existing signatures will be invalidated. In the end, the better idea is to keep the code as it was and expect a properly constructed PDF. Fix the problem where it is created.

It seems like you are generating/signing the PDF document using Aspose.PDF for .NET. Please make sure to use the latest version of the API i.e. 21.8 and if you still receive a PDF with issues, please share the complete details and code snippet in our official Aspose.PDF support forum (forum.aspose.com/c/pdf/10) where we will investigate and address the issue accordingly. Please note that our support forum is the right place to report and track such issues. This is Asad Ali and I am Developer Evangelist at Aspose. — Asad Ali
@AsadAli More to the point, the OP appears to have been given a PDF generated using Aspose.PDF as test document for signing methods based on iText and PDFBox. In particular he does not sign with Aspose.PDF, so he couldn't have been aware that addressing the official Aspose.PDF support forum was a preferred option before realizing that the problem was actually caused by an error in his test document and not his code. — mkl
Your last edit kindof voids the whole idea of signing an existing file. (And if your file already has a signature, then you saving it would make that signature invalid) — Tilman Hausherr
Concerning your second edit - you might want to consider to change your iText code also to sign in incremental updates in order to not destroy previous signatures. Or first check whether the PDF contains signatures and in that case use append mode. — mkl

mkl · Accepted Answer · 2021-08-16T16:46:32

The problem is caused by an error in the original PDF. Your PDFBox code signs in append mode (i.e. in an incremental update), so that error is present in the signed version, too. Your iText code does not sign in append mode but instead re-writes the whole PDF; while doing so it does not make the same error as the producer of your original PDF, so the error is not in the signed version anymore. Adobe Acrobat is very sensitive to such issues when validating signatures with updates.

The Error

The cross reference table of the initial revision in a PDF must not be split into separate subsections, but in the case of your original PDF it has been split:

0 75
0000000000 65535 f
0000000018 00000 n
...
0000313374 00000 n
0000313397 00000 n
76 20
0000313419 00000 n
0000313443 00000 n
...
0000846048 00000 n
0000846175 00000 n
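To spot such a split automatically, one can count the subsection header lines (two integers: first object number and entry count) in the text of a cross reference section. The following is a minimal sketch using only the JDK, assuming the text of the cross reference section has already been extracted as a string; the class and method names are hypothetical and not part of PDFBox or iText.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for inspecting the text of a classic cross reference section. */
class XrefSubsections {

    // A subsection header consists of exactly two integers: first object number and entry count.
    // Entry lines have a third token ("n" or "f") and therefore do not match.
    private static final Pattern HEADER = Pattern.compile("(\\d+)\\s+(\\d+)");

    /** Returns {firstObjectNumber, entryCount} for every subsection header found. */
    static List<int[]> subsectionHeaders(String xrefSection) {
        List<int[]> headers = new ArrayList<>();
        for (String line : xrefSection.split("\\r?\\n|\\r")) {
            Matcher m = HEADER.matcher(line.trim());
            if (m.matches()) {
                headers.add(new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) });
            }
        }
        return headers;
    }

    /** True if the section contains more than one subsection. */
    static boolean isSplit(String xrefSection) {
        return subsectionHeaders(xrefSection).size() > 1;
    }

    public static void main(String[] args) {
        // Abbreviated excerpt of the table shown above.
        String excerpt = "0 75\n"
                + "0000000000 65535 f\n"
                + "0000000018 00000 n\n"
                + "76 20\n"
                + "0000313419 00000 n\n"
                + "0000313443 00000 n\n";
        System.out.println("subsections: " + subsectionHeaders(excerpt).size());
        System.out.println("split: " + isSplit(excerpt));
    }
}
```

Note that multiple subsections are perfectly normal in the cross reference sections of incremental updates; the check is only meaningful for the initial revision.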

Similar cases have been discussed in this answer, this answer, this answer, and elsewhere; you can also find some specification references in those answers.

Usually this goes unnoticed, because Adobe Acrobat is quite lax when encountering small issues in PDFs.

That is, except when validating documents with integrated signatures and incremental updates after the signed revision. In that situation Adobe Acrobat often considers such issues suspect and fails validation of the signature, even though it doesn't complain when validating the same PDF without the incremental updates after the signed revision.

You are in that critical situation: your final document contains an incremental update after the signed revision, namely an update with validation-related information.
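A rough way to see whether a file carries such updates is to count its `%%EOF` markers, since each revision normally ends with one. The following is a minimal sketch using only the JDK; the class and method names are hypothetical, and the result is only a heuristic, because the marker can also occur for other reasons (for example in linearized files or inside stream data).

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical heuristic; counts end-of-file markers as a hint at the number of revisions. */
class RevisionCounter {

    static int countEofMarkers(byte[] pdf) {
        byte[] marker = "%%EOF".getBytes(StandardCharsets.US_ASCII);
        int count = 0;
        for (int i = 0; i + marker.length <= pdf.length; i++) {
            boolean match = true;
            for (int j = 0; j < marker.length; j++) {
                if (pdf[i + j] != marker[j]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                count++;
                i += marker.length - 1; // continue scanning after this marker
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in content; in practice this would be the bytes of the final PDF.
        byte[] sample = "%PDF-1.7 signed revision %%EOF\nDSS update %%EOF\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println("%%EOF markers: " + countEofMarkers(sample));
    }
}
```

A count above one for the signed-only file, or above two for the file with embedded validation data, would suggest that the document already contained additional revisions before signing.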

Who Caused The Error?

According to the Info dictionary of your original PDF, it has been produced by "Aspose.PDF for .NET 21.3.0". Earlier versions of Aspose.PDF are known to create such faulty cross reference tables (see section "The PDF processor that damages the PDF" of the first answer referenced above). Apparently Aspose have not yet gotten around to fixing this issue for good.