6
votes

I am a newbie in using Digital Signatures. In one of the projects we are using Apache PdfBox for processing digitally signed pdf files. While we could test all features, verification of signed pdf files is something we are unable to crack. We are using BouncyCastle as the provider. Below is the code:

Get Digital Signature and Signed Content from pdf file:

byte[] signatureAsBytes = pdsignature.getContents(new FileInputStream(this.INPUT_FILE));
byte[] signedContentAsBytes = pdsignature.getSignedContent(new FileInputStream(this.INPUT_FILE));

Digital Signature Verification:

Security.addProvider(new BouncyCastleProvider());
Signature signer = Signature.getInstance("RSA","BC");

//Get PublicKey from p7b file
X509Certificate cert509=null;
File file = new File("C:\\certificate_file.p7b");
FileInputStream fis = new FileInputStream(file);
CertificateFactory cf = CertificateFactory.getInstance("X.509");
Collection c = cf.generateCertificates(fis);
Iterator it = c.iterator();
PublicKey pubkey;

while (it.hasNext()) 
{
   cert509 = (X509Certificate) it.next();
   pubkey = cert509.getPublicKey();
}

boolean VERIFIED=false;
Security.addProvider(new BouncyCastleProvider());
Signature signer = Signature.getInstance("RSA","BC");
PublicKey key=this.getPublicKey(false);
signer.initVerify(key);

List<PDSignature> allsigs = this.PDFDOC.getSignatureDictionaries();
Iterator<PDSignature> i = allsigs.iterator();
    
while(i.hasNext())
{
        PDSignature sig = (PDSignature) i.next();
        byte[] signatureAsBytes = sig.getContents(new FileInputStream(this.INPUT_FILE));
        byte[] signedContentAsBytes = sig.getSignedContent(new FileInputStream(this.INPUT_FILE));
        signer.update(signedContentAsBytes);
        VERIFIED=signer.verify(signatureAsBytes);
}
    
System.out.println("Verified="+VERIFIED);

Below are relevant extracts from the certificate in p7b format - I am using BouncyCastle as security provider:

  Signature Algorithm: SHA256withRSA, OID = 1.2.840.113549.1.1.11
  Key:  Sun RSA public key, 2048 bits
  Validity: [From: Tue Aug 06 12:26:47 IST 2013,
  To: Wed Aug 05 12:26:47 IST 2015]
  Algorithm: [SHA256withRSA]

With the above code I am always getting response as "false". I have no idea how to fix the issue. Please help

2
As I understand, a copy of "unsigned" version of pdf file will be necessary for comparison. How can I extract the original copy of pdf file from signed copy? Any help will be appreciatedRanjan
How can I extract the original copy of pdf file from signed copy? - that is what 'pdsignature.getSignedContent' is for.mkl
pdsignature.getSignedContent returns original content - that is my understanding too. However, verification never returns true. As I have said earlier i am new to digital signature therefore unable to fix the issue.Ranjan
A short look at your code shows that it completely ignores the type of integrated PDF signature, adbe.x509.rsa_sha1, adbe.pkcs7.detached, adbe.pkcs7.sha1, ETSI.CAdES, and ETSI.RFC3161. Do you have to check only one kind of them (if yes, which one)?mkl
Please advise which is the correct choice. Input file is a signed pdf and algorithm used is SHA256withRSA.Ranjan

2 Answers

9
votes

Your prime problem is that there are multiple types of PDF signatures differing in the format of the signature container and in what actually are the signed bytes. Your BC code, on the other hand, can verify merely naked signature byte sequences which are contained in the afore-mentioned signature containers.

Interoperable signature types

As the header already says, the following list contains "interoperable signature types" which are more or less strictly defined. The PDF specification specifies a way to also include completely custom signing schemes. But let us assume we are in an interoperable situation. The the collection of signature types burns down to:

  • adbe.x509.rsa_sha1 defined in ISO 32000-1 section 12.8.3.2 PKCS#1 Signatures; the signature value Contents contain a DER-encoded PKCS#1 binary data object; this data object is a fairly naked signature, in case of RSA an encrypted structure containing the padded document hash and the hash algorithm.

  • adbe.pkcs7.sha1 defined in ISO 32000-1 section 12.8.3.3 PKCS#7 Signatures; the signature value Contents contain a DER-encoded PKCS#7 binary data object; this data object is a big container object which can also contain meta-information, e.g. it may contain certificates for building certificate chains, revocation information for certificate revocation checks, digital time stamps to fix the signing time, ... The SHA1 digest of the document’s byte range shall be encapsulated in the PKCS#7 SignedData field with ContentInfo of type Data. The digest of that SignedData shall be incorporated as the normal PKCS#7 digest.

  • adbe.pkcs7.detached defined in ISO 32000-1 section 12.8.3.3 PKCS#7 Signatures; the signature value Contents contain a DER-encoded PKCS#7 binary data object, see above. The original signed message digest over the document’s byte range shall be incorporated as the normal PKCS#7 SignedData field. No data shall be encapsulated in the PKCS#7 SignedData field.

  • ETSI.CAdES.detached defined in ETSI TS 102 778-3 and will become integrated in ISO 32000-2; the signature value Contents contain a DER-encoded SignedData object as specified in CMS; CMS signature containers are close relatives to PKCS#7 signature containers, see above. This essentially is a differently profiled and stricter defined variant of adbe.pkcs7.detached.

  • ETSI.RFC3161 defined in ETSI TS 102 778-4 and will become integrated in ISO 32000-2; the signature value Contents contain a TimeStampToken as specified in RFC 3161; time stamp tokens again are a close relative to PKCS#7 signature containers, see above, but they contain a special data sub-structure harboring the document hash, the time of the stamp creation, and information on the issuing time server.

I would propose studying the specifications I named and the documents referenced from there, mostly RFCs. Based on that knowledge you can easily find the appropriate BouncyCastle classes to analyze the different signature Contents.

PS (2021): Meanwhile ISO 32000-2 has been published and indeed contains specifications of ETSI.CAdES.detached and ETSI.RFC3161. Also the ETSI technical specifications TS 102 778-* for PAdES have been replaced by an actual norm, ETSI EN 319 142-*.

5
votes

A working example to validate adbe.pkcs7.detached PDF signatures (the most common PDF signatures) with Apache PDFBox 1.8.16 and Bouncy Castle 1.44:

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;
import org.bouncycastle.cms.CMSProcessable;
import org.bouncycastle.cms.CMSProcessableByteArray;
import org.bouncycastle.cms.CMSSignedData;
import org.bouncycastle.cms.SignerInformation;
import org.bouncycastle.cms.SignerInformationStore;

import java.io.File;
import java.io.FileInputStream;
import java.security.cert.CertStore;
import java.security.cert.X509Certificate;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class PDFBoxValidateSignature {
    public static void main(String[] args) throws Exception {
        File signedFile = new File("sample-signed.pdf");
        // We load the signed document.
        PDDocument document = PDDocument.load(signedFile);
        List<PDSignature> signatureDictionaries = document.getSignatureDictionaries();
        // Then we validate signatures one at the time.
        for (PDSignature signatureDictionary : signatureDictionaries) {
            // NOTE that this code currently supports only "adbe.pkcs7.detached", the most common signature /SubFilter anyway.
            byte[] signatureContent = signatureDictionary.getContents(new FileInputStream(signedFile));
            byte[] signedContent = signatureDictionary.getSignedContent(new FileInputStream(signedFile));
            // Now we construct a PKCS #7 or CMS.
            CMSProcessable cmsProcessableInputStream = new CMSProcessableByteArray(signedContent);
            CMSSignedData cmsSignedData = new CMSSignedData(cmsProcessableInputStream, signatureContent);
            SignerInformationStore signerInformationStore = cmsSignedData.getSignerInfos();
            Collection signers = signerInformationStore.getSigners();
            CertStore certs = cmsSignedData.getCertificatesAndCRLs("Collection", (String) null);
            Iterator signersIterator = signers.iterator();
            while (signersIterator.hasNext()) {
                SignerInformation signerInformation = (SignerInformation) signersIterator.next();
                Collection certificates = certs.getCertificates(signerInformation.getSID());
                Iterator certIt = certificates.iterator();
                X509Certificate signerCertificate = (X509Certificate) certIt.next();
                // And here we validate the document signature.
                if (signerInformation.verify(signerCertificate.getPublicKey(), (String) null)) {
                    System.out.println("PDF signature verification is correct.");
                    // IMPORTANT: Note that you should usually validate the signing certificate in this phase, e.g. trust, validity, revocation, etc. See http://www.nakov.com/blog/2009/12/01/x509-certificate-validation-in-java-build-and-verify-chain-and-verify-clr-with-bouncy-castle/.
                } else {
                    System.out.println("PDF signature verification failed.");
                }
            }
        }
    }
}

Not sure if there is an official example for this, I've checked in the official examples for PDFBox 1.8.4 and I didn't find anything.