Verifying PDF Signature in Java using Bouncy Castle and PDFBox

Question

I am trying to verify a digitally signed PDF document in Java.

I'm using Apache PDFBox 2.0.6 to get the signature and the original PDF that was signed, then I'm using Bouncy Castle to verify the detached signature (calculate the hash of the original file, verify the signature using the signer's public key, and compare the results).

I read this article and tried to get the signature bytes and the original PDF bytes using this code:

PDDocument doc = PDDocument.load(signedPDF);
byte[] origPDF = doc.getSignatureDictionaries().get(0).getSignedContent(signedPDF);
byte[] signature = doc.getSignatureDictionaries().get(0).getContents(signedPDF);

But when I save origPDF to a file, I notice that it still has the signature field, which the original PDF that was signed didn't have. Also, the size of the saved origPDF is 21 KB, while the size of the original PDF was 15 KB. That's probably because of the signature fields.

However, when I try to strip signature fields from the origPDF like this:

public byte[] stripCryptoSig(byte[] signedPDF) throws IOException {

    PDDocument pdDoc = PDDocument.load(signedPDF);
    PDDocumentCatalog catalog = pdDoc.getDocumentCatalog();
    PDAcroForm form = catalog.getAcroForm();
    List<PDField> acroFormFields = form.getFields();
    for (PDField field: acroFormFields) {
        if (field.getFieldType().equalsIgnoreCase("Sig")) {
            System.out.println("START removing Sign Flags");
            field.setReadOnly(true);
            field.setRequired(false);
            field.setNoExport(true);
            System.out.println("END removing Sign Flags");

            /*System.out.println("START flattenning field");            
            field.getAcroForm().flatten();
            field.getAcroForm().refreshAppearances();
            System.out.println("END flattenning field");
            */
            field.getAcroForm().refreshAppearances();
        }
    }

I get the following warnings:

WARNING: Invalid dictionary, found: '[' but expected: '/' at offset 15756

WARNING: Appearance generation for signature fields not yet implemented - you need to generate/update that manually

And, when I open the PDF in Acrobat the signature field is gone, but I see an image of the signature where the signature used to be as part of the PDF page. This is weird since I thought I removed the signature completely by using byte[] origPDF = doc.getSignatureDictionaries().get(0).getSignedContent(signedPDF);

Btw, I call stripCryptoSig(byte[] signedPDF) function on origPDF, so that's not a mistake.

When I try to verify the signature using Bouncy Castle, I get an exception with the message: message-digest attribute value does not match calculated value

I guess this is because the original PDF that was signed and the PDF I get from PDFBox using doc.getSignatureDictionaries().get(0).getSignedContent(signedPDF); aren't the same.

Here is my Bouncy Castle verification code:

private SignatureInfo verifySig(byte[] signedData, boolean attached) throws OperatorCreationException, CertificateException, CMSException, IOException {

    SignatureInfo signatureInfo = new SignatureInfo();
    CMSSignedData cmsSignedData;

    if (attached) {
        cmsSignedData = new CMSSignedData(signedData);
    }

    else {
        PDFUtils pdfUtils = new PDFUtils();
        pdfUtils.init(signedData);
        signedData = pdfUtils.getSignature(signedData);
        byte[] sig = pdfUtils.getSignedContent(signedData);
        cmsSignedData = new CMSSignedData(new CMSProcessableByteArray(signedData), sig);
    }

    SignerInformationStore sis = cmsSignedData.getSignerInfos();
    Collection signers = sis.getSigners();
    Store certStore = cmsSignedData.getCertificates();
    Iterator it = signers.iterator();
    signatureInfo.setValid(false);
    while (it.hasNext()) {
        SignerInformation signer = (SignerInformation) it.next();
        Collection certCollection = certStore.getMatches(signer.getSID());

        Iterator certIt = certCollection.iterator();
        X509CertificateHolder cert = (X509CertificateHolder) certIt.next();

        if(signer.verify(new JcaSimpleSignerInfoVerifierBuilder().build(cert))){

            signatureInfo.setValid(true);

            if (attached) {
                CMSProcessableByteArray userData = (CMSProcessableByteArray) cmsSignedData.getSignedContent();
                signatureInfo.setSignedDoc((byte[]) userData.getContent());
            }

            else {
                signatureInfo.setSignedDoc(signedData);
            }


            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

            String signedOnDate = "null";
            String validFromDate = "null";
            String validToDate = "null";

            Date signedOn = this.getSignatureDate(signer);
            Date validFrom = cert.getNotBefore();
            Date validTo = cert.getNotAfter();

            if(signedOn != null) {
                signedOnDate = sdf.format(signedOn);
            }
            if(validFrom != null) {
                validFromDate = sdf.format(validFrom);
            }
            if(validTo != null) {
                validToDate = sdf.format(validTo);
            }

            DefaultAlgorithmNameFinder algNameFinder = new DefaultAlgorithmNameFinder();

            signatureInfo.setSignedBy(IETFUtils.valueToString(cert.getSubject().getRDNs(BCStyle.CN)[0].getFirst().getValue()));
            signatureInfo.setSignedOn(signedOn);
            signatureInfo.setIssuer(IETFUtils.valueToString(cert.getIssuer().getRDNs(BCStyle.CN)[0].getFirst().getValue()));
            signatureInfo.setValidFrom(validFrom);
            signatureInfo.setValidTo(validTo);
            signatureInfo.setVersion(String.valueOf(cert.getVersion()));
            signatureInfo.setSignatureAlg(algNameFinder.getAlgorithmName(signer.getDigestAlgorithmID()) + " WITH " + algNameFinder.getAlgorithmName(cert.getSubjectPublicKeyInfo().getAlgorithmId()));

            /*signatureInfo.put("Signed by", IETFUtils.valueToString(cert.getSubject().getRDNs(BCStyle.CN)[0].getFirst().getValue()));
            signatureInfo.put("Signed on", signedOnDate);
            signatureInfo.put("Issuer", IETFUtils.valueToString(cert.getIssuer().getRDNs(BCStyle.CN)[0].getFirst().getValue()));
            signatureInfo.put("Valid from", validFromDate);
            signatureInfo.put("Valid to", validToDate);
            signatureInfo.put("Version", "V" + String.valueOf(cert.getVersion()));
            signatureInfo.put("Signature algorithm", algNameFinder.getAlgorithmName(signer.getDigestAlgorithmID()) + " WITH " + algNameFinder.getAlgorithmName(cert.getSubjectPublicKeyInfo().getAlgorithmId()));*/

            break;
        }
    }

    return signatureInfo;

}

Your question is somewhat confusing to me. getSignedContent() returns the PDF without the signature content string. This isn't a real PDF. See the ShowSignature.java example from the source code download on how to verify a signature. If this doesn't help, please edit your question. — Tilman Hausherr
@user3362334 could you document where PDFUtils() comes from? It's not in apache pdfbox-2.0.18.... — user2677034
@user2677034, could you please share the Maven repo for PDFUtils and SignatureInfo? — User
Thanks a lot @user2677034, I already added the Maven dependency for Bouncy Castle, but it gives a compilation error. Any clue? — User

mkl · Accepted Answer · 2017-06-30T13:42:52

You appear to have a misconception concerning the getSignedContent method in particular and PDF signing in general.

"I'm using Apache PDFBox 2.0.6 to get the signature and the original PDF that was signed"

If by "the original PDF that was signed" you mean a PDF before it entered the signing process, then the second part of your task is impossible for generic signed PDFs.

The reason is that the original PDF is first prepared for signing, and only then is the actual signature created.

This preparation might mean as little as adding a value dictionary (including a gap for later injection of the signature container) to a pre-existing empty signature field as an incremental update, which leaves the original PDF as an untouched initial segment of the resulting signed document.

On the other hand, it may additionally involve any number of the following changes:

a new signature field may be created from scratch;
an additional page may be added to the document for signature visualizations;
extra signature visualizations (either inactive images or actual signature form field widgets) may be added to each page;
missing appearances for form fields may be created;
the signing application may add its name to metadata entries as document processor, and the date and time of last change may be updated to the signing time;
in case of a pre-existing empty signature field, form fields indicated by that field's field lock dictionary may be set read-only;
etc.

If the document was not signed before, these additions need not be added as incremental updates. Instead, all the objects (changed or unchanged) may be re-ordered and renumbered, indirect objects may become direct ones and vice versa, unused objects might be dropped, duplicate objects might be reduced to a single one, fonts of form fields made read-only may be reduced to the actually used glyphs, etc.

Only for this prepared PDF is the actual signature created and embedded in the gap left in the signature value dictionary.
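To make the role of the gap more concrete, here is a minimal sketch using only the JDK. It is a toy model, not PDFBox's implementation: the class SignatureGapSketch and its methods are hypothetical names, the gap is assumed to be a single hex string including its angle brackets, and SHA-256 is just an example digest. It shows that the digest is computed over everything except the gap, and that writing the container into the gap afterwards changes neither the file length nor that digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Toy model (hypothetical names) of a prepared PDF with a reserved signature gap. */
final class SignatureGapSketch {

    /** Returns all bytes outside the half-open interval [gapStart, gapEnd). */
    static byte[] bytesOutsideGap(byte[] prepared, int gapStart, int gapEnd) {
        byte[] out = new byte[prepared.length - (gapEnd - gapStart)];
        System.arraycopy(prepared, 0, out, 0, gapStart);
        System.arraycopy(prepared, gapEnd, out, gapStart, prepared.length - gapEnd);
        return out;
    }

    /** Digest over everything except the gap; these are the bytes a signer would cover. */
    static byte[] digestOutsideGap(byte[] prepared, int gapStart, int gapEnd) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(bytesOutsideGap(prepared, gapStart, gapEnd));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Hex-encodes the container into the gap (between '<' and '>'), padding with '0'. */
    static byte[] injectIntoGap(byte[] prepared, int gapStart, int gapEnd, byte[] container) {
        int capacity = gapEnd - gapStart - 2; // room between the two angle brackets
        if (container.length * 2 > capacity) {
            throw new IllegalArgumentException("Reserved gap is too small for the container");
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : container) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        while (hex.length() < capacity) {
            hex.append('0'); // unused space in the gap stays zero-padded
        }
        byte[] result = prepared.clone(); // same length, so no offsets move
        byte[] hexBytes = hex.toString().getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(hexBytes, 0, result, gapStart + 1, hexBytes.length);
        return result;
    }
}
```

Because the injection only overwrites bytes inside the gap, a verifier that hashes the bytes outside the gap of the finished file obtains the same digest the signer computed over the prepared file.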

If you apply your calls

byte[] origPDF = doc.getSignatureDictionaries().get(0).getSignedContent(signedPDF);
byte[] signature = doc.getSignatureDictionaries().get(0).getContents(signedPDF);

to the signed document, origPDF contains the bytes of the signed document except the gap in the signature value dictionary, and signature contains the (hex-decoded) contents of the gap.

So origPDF in particular contains all the changes done during the preparation; calling it orig, therefore, is highly misleading.

Furthermore, as the gap originally reserved for the signature container is missing, it is very likely that these bytes actually don't form a valid PDF anymore: PDFs contain cross references which point to the starting offsets (from the start of the document) of each PDF object; as the gap is missing, the bytes after its former position have moved and offsets going there now are wrong.

Thus, your origPDF merely contains the ensemble of signed bytes which may be very different from the file you consider the original one.
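As an illustration of the two points above, the following JDK-only sketch mimics what happens conceptually when the signed byte ranges are concatenated. It is an assumption-laden toy, not PDFBox code: ByteRangeSketch, signedBytes and offsetInSignedBytes are hypothetical names, and the byte range is modelled as an int array of offset/length pairs in the style of the ByteRange entry of a signature dictionary.

```java
import java.io.ByteArrayOutputStream;

/** Toy model (hypothetical names) of concatenating the signed byte ranges of a file. */
final class ByteRangeSketch {

    /** Concatenates the ranges given as [offset0, length0, offset1, length1, ...]. */
    static byte[] signedBytes(byte[] file, int[] byteRange) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            out.write(file, byteRange[i], byteRange[i + 1]);
        }
        return out.toByteArray();
    }

    /**
     * Maps an offset in the complete file to its position in the concatenated
     * signed bytes, or returns -1 if that offset is not covered by any range
     * (for example because it lies inside the signature gap).
     */
    static int offsetInSignedBytes(int fileOffset, int[] byteRange) {
        int consumed = 0; // signed bytes emitted by the ranges processed so far
        for (int i = 0; i + 1 < byteRange.length; i += 2) {
            int start = byteRange[i];
            int length = byteRange[i + 1];
            if (fileOffset >= start && fileOffset < start + length) {
                return consumed + (fileOffset - start);
            }
            consumed += length;
        }
        return -1;
    }
}
```

For a 13-byte file HEAD<SIG>TAIL with the byte range {0, 4, 9, 4}, signedBytes yields HEADTAIL, and the byte that sat at file offset 9 now sits at offset 4. Any cross-reference entry that still says 9 points to the wrong place, which is why the concatenated bytes generally cannot be opened as a PDF.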

Your verifySig completely ignores the SubFilter of the signature field value dictionary. Depending on that value, the signature bytes you retrieve using getContents might have entirely different contents.

So without your signed PDF, further review of that method does not make sense.
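To illustrate why the SubFilter matters, here is a small JDK-only sketch that classifies the SubFilter values defined in ISO 32000-1 and the PAdES specifications by what the Contents bytes hold. The class SubFilterSketch and the enum ContentsKind are hypothetical names introduced for this example; a real verifier would read the value via PDFBox (PDSignature has a getSubFilter() accessor) and then branch to the matching verification routine.

```java
/** Toy classification (hypothetical names) of what the Contents entry holds per SubFilter. */
final class SubFilterSketch {

    enum ContentsKind {
        /** CMS/PKCS#7 container whose message-digest attribute covers the signed byte ranges. */
        CMS_DETACHED,
        /** CMS/PKCS#7 container that encapsulates a SHA-1 digest of the signed byte ranges. */
        CMS_WITH_SHA1_DIGEST,
        /** Bare PKCS#1 signature value; certificates are stored in the Cert entry instead. */
        PKCS1_RAW,
        /** RFC 3161 time stamp token (document time stamp). */
        RFC3161_TIMESTAMP,
        /** Anything this sketch does not know about. */
        UNKNOWN
    }

    static ContentsKind classify(String subFilter) {
        if (subFilter == null) {
            return ContentsKind.UNKNOWN;
        }
        switch (subFilter) {
            case "adbe.pkcs7.detached":
            case "ETSI.CAdES.detached":
                return ContentsKind.CMS_DETACHED;
            case "adbe.pkcs7.sha1":
                return ContentsKind.CMS_WITH_SHA1_DIGEST;
            case "adbe.x509.rsa_sha1":
                return ContentsKind.PKCS1_RAW;
            case "ETSI.RFC3161":
                return ContentsKind.RFC3161_TIMESTAMP;
            default:
                return ContentsKind.UNKNOWN;
        }
    }
}
```

Constructing a CMSSignedData from the signed bytes plus the signature container, as the detached branch of verifySig does, only matches the CMS_DETACHED case; the other kinds need different handling.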
