Intro
I have an issue with digitally signing PDF documents that have been marked as PDF/A-3A compliant. With PDFBox (latest version, 2.0.24) I end up with an invalid signature in Adobe Acrobat, while with iText7 (latest version) I get a valid signature. The goal is to get PAdES LTV compliant signatures.
Overview
My process is the following (with both PDFBox and iText7):
- I open the PDF, I create the hash for signing (data to be signed)
- I call the 3rd party service for getting back the digital signature
- In the service response I also get the OCSP and CRL content that I need to embed in the PDF for LTV quality
- I embed the signature in the PDF
- I save the document to memory, then I reopen it for embedding the OCSP and CRL
- I embed the OCSP and CRL items, creating the respective DSS and VRI dictionaries
- I save the PDF to disk
For PDFBox, the code for signing is here and for OCSP/CRL embedding is here. For iText7, the code for signing and for OCSP/CRL embedding is here.
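To illustrate the first step (creating the hash over the data to be signed), here is a minimal sketch using only the JDK. It assumes SHA-256 and a signature /ByteRange of the usual form [offset1, length1, offset2, length2]. The class and method names are hypothetical and are not part of PDFBox or iText7, which both do this work internally.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ByteRangeDigest {

    /**
     * Hashes only the bytes covered by the given ranges, which are
     * (offset, length) pairs. The gap between the ranges, where the
     * signature /Contents placeholder sits, is skipped.
     * Assumption: SHA-256 is the digest algorithm in use.
     */
    static byte[] digest(byte[] pdf, int[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("byteRange must hold offset/length pairs");
        }
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            for (int i = 0; i < byteRange.length; i += 2) {
                sha256.update(pdf, byteRange[i], byteRange[i + 1]);
            }
            return sha256.digest();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

The point of the sketch is that the digest depends on every covered byte and on its position in the file, so changing anything inside the ranges after signing yields a different hash, while the placeholder gap can be filled freely.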
The problem
Now, this works OK for most PDF files, including multi-signature documents. The problem is with one particular PDF that was created as PDF/A compliant, level 3A.
With PDFBox, if I just embed the signature and open the document in Adobe Acrobat, the signature is valid. If I also embed the OCSP/CRL content, the signature is no longer valid. Adobe Acrobat complains that:
Signature is invalid: Document has been altered or corrupted since it was signed.
I also noticed that just by doing:
PDDocument document = PDDocument.load(inputStream);
document.save(outputStream);
I break the signature. From my tests, the actual embedding is not really the cause of the issue, but just the fact that I reopen the PDF after embedding the signature and save it back to disk.
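A quick way to check whether a load/save round trip left the signed revision intact is to compare the bytes directly. The sketch below uses only the JDK and hypothetical names. It assumes you have both the signed file and the re-saved file in memory. Since a signature covers fixed byte ranges, the re-saved file should begin with exactly the bytes of the signed file and only append to them.

```java
final class RevisionCheck {

    /**
     * Returns true when the re-saved file starts with exactly the bytes
     * of the signed revision, i.e. the new content was appended and the
     * signed bytes were not rewritten or moved.
     */
    static boolean preservesSignedRevision(byte[] signedRevision, byte[] saved) {
        if (saved.length < signedRevision.length) {
            return false;
        }
        for (int i = 0; i < signedRevision.length; i++) {
            if (signedRevision[i] != saved[i]) {
                return false;
            }
        }
        return true;
    }
}
```

In PDFBox 2.x, PDDocument.save() writes the whole file anew, while PDDocument.saveIncremental() appends an incremental update, so only output from the latter can be expected to pass this check.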
With the same process (keys, certificate, etc) via iText7 I get a valid LTV signature in the end, in Adobe Acrobat.
Sample PDFs
The sample documents are here. The original contains the unsigned document, and then there are 2 samples, one for PDFBox (invalid in Adobe Acrobat) and one for iText7 (valid in Adobe Acrobat).
My research so far suggests that PDFBox somehow changes the order of the elements when it loads the PDF after the signature has been embedded. That points to this issue with loading and saving documents, although I follow the same process for ALL the other PDFs and Adobe Acrobat does not complain about their signatures.
I also tried PDFBox 2.1.0-SNAPSHOT and 3.0.0-SNAPSHOT, hoping that the issue was related to the ordering of elements in the PDF and had been fixed there. Still, I get the same results.
Later edit 1
Please see Later edit 2 below; the workaround described in this Later edit 1 is not a good idea!
As per the accepted answer below from @mkl, the issue is with the original PDF file, which contains the cross reference table split into several subsections instead of one. This seems to be caused by the library (Aspose PDF for .NET, version 21.3 or earlier) used by the service that generated the PDF in the first place.
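To see whether a given file has this structure, the subsections of a classic cross reference table can be counted. The following is a rough sketch with hypothetical names, using only the JDK. It assumes you have already extracted the text from the xref keyword up to the trailer keyword, and it does not handle cross reference streams.

```java
import java.util.regex.Pattern;

final class XrefSubsections {

    // A subsection header is "<first object number> <entry count>".
    // Entry lines carry a third field (n or f), so they do not match.
    private static final Pattern HEADER = Pattern.compile("\\d+ \\d+");

    /** Counts subsection headers between "xref" and "trailer". */
    static int count(String xrefSection) {
        int count = 0;
        boolean inTable = false;
        for (String raw : xrefSection.split("\\r\\n|\\r|\\n")) {
            String line = raw.trim();
            if (line.equals("xref")) {
                inTable = true;
            } else if (line.startsWith("trailer")) {
                break;
            } else if (inTable && HEADER.matcher(line).matches()) {
                count++;
            }
        }
        return count;
    }
}
```

A result greater than 1 for the original, never-updated file would match the situation described in the answer. The sketch only counts headers; it does not validate offsets or entry counts.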
One workaround that seems to work with my current code is the following:
PDDocumentInformation info = pdDocument.getDocumentInformation();
if (info != null && StringUtils.containsIgnoreCase(info.getProducer(), "Aspose")) {
    try {
        // A full save makes PDFBox rewrite the cross reference table
        pdDocument.save(inMemoryStream);
        pdDocument.close();
        // Reload the rewritten copy and continue with it
        pdDocument = PDDocument.load(inMemoryStream.toByteArray());
        inMemoryStream.reset();
    } catch (Exception e) {
        // (error handling not shown)
    }
}
Basically, if I detect that the producer of the document is Aspose, I save the document in memory (via PDFBox's pdDocument.save()) and load it back. This ensures the cross reference table is written correctly in memory, and from there the signing and OCSP+CRL embedding work as expected, yielding a valid signature in Adobe Acrobat.
Later edit 2
Thank you @mkl and @TilmanHausherr, you are right. It is not a good idea to assume that every document produced by a certain library must automatically be normalized, because re-saving the document invalidates any signatures it already contains. In the end, the better approach is to keep the code as it was and expect a properly constructed PDF: fix the problem where the PDF is created.