2
votes

I'm using iTextSharp, in a C# app that reads PDF files and breaks out the pages as separate PDF documents. It works well, except in the case of portfolios. Now I'm trying to figure out how to read a PDF portfolio (or Collection, as they seem to be called in iText) that contains two embedded PDF documents. I want to simply open the portfolio, enumerate the embedded files and then save them as separate, simple PDF files.

There's a good example of how to programmatically create a PDF portfolio, here: Kubrick Collection Example

But I haven't seen any examples that read portfolios. Any help would be much appreciated!

1

1 Answers

3
votes

The example you referenced adds the embedded files as document-level attachments. So you can extract the files like this:

PdfReader reader = new PdfReader(readerPath);
PdfDictionary root = reader.Catalog;
PdfDictionary documentnames = root.GetAsDict(PdfName.NAMES);
PdfDictionary embeddedfiles = 
    documentnames.GetAsDict(PdfName.EMBEDDEDFILES);
PdfArray filespecs = embeddedfiles.GetAsArray(PdfName.NAMES);
for (int i = 0; i < filespecs.Size; ) {
  filespecs.GetAsString(i++);
  PdfDictionary filespec = filespecs.GetAsDict(i++);
  PdfDictionary refs = filespec.GetAsDict(PdfName.EF);
  foreach (PdfName key in refs.Keys) {
    PRStream stream = (PRStream) PdfReader.GetPdfObject(
      refs.GetAsIndirectObject(key)
    );

    using (FileStream fs = new FileStream(
      filespec.GetAsString(key).ToString(), FileMode.OpenOrCreate
    )){
      byte[] attachment = PdfReader.GetStreamBytes(stream);
      fs.Write(attachment, 0, attachment.Length);
    }
  }
} 

Pass the output file from the Kubrick Collection Example you referenced to the PdfReader constructor (readerPath) if you want to test this.

Hopefully I'll have time to update the C# examples this month from version 5.2.0.0 (the iTextSharp version is about three weeks behind the Java version right now).