I am trying to split a PDF file at its bookmarks using PDFSharp. I can get a list of the bookmarks, but I cannot figure out which page number each bookmark points to.

An example PDF file I am working with has three top-level bookmarks, on pages 1, 5 and 6. I can list the bookmarks with the snippet below, but I could not find a way to map a bookmark to a page number.

Code:

using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

using (PdfDocument document = PdfReader.Open("test.pdf", PdfDocumentOpenMode.Import))
{
    // Low-level outline root from the document catalog
    PdfDictionary outline = document.Internals.Catalog.Elements.GetDictionary("/Outlines");

    Console.WriteLine("Page count: " + document.PageCount);

    foreach (var page in document.Pages)
    {
        // any hierarchy info on the page itself? doesn't seem to have any.
        Console.WriteLine(page.ToString());
    }

    // Walk the top-level bookmarks through the /First and /Next links
    for (PdfDictionary child = outline.Elements.GetDictionary("/First"); child != null; child = child.Elements.GetDictionary("/Next"))
    {
        Console.WriteLine(child.Elements.GetString("/Title"));

        // FIXME: get page numbers?
    }
}

Output:

Page count: 9
<< /Contents [ 1019 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1018 0 R /Type /Page >>
<< /Contents [ 1022 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1021 0 R /Type /Page >>
<< /Contents [ 1025 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1024 0 R /Type /Page >>
<< /Contents [ 1028 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 3874 2667 ] /Parent 1 0 R /Resources 1027 0 R /Type /Page >>
<< /Contents [ 1032 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 842 595 ] /Parent 1 0 R /Resources 1031 0 R /Type /Page >>
<< /Annots [ 46 0 R 48 0 R 50 0 R 52 0 R 54 0 R 56 0 R 58 0 R 60 0 R 62 0 R 64 0 R 66 0 R 68 0 R 70 0 R 72 0 R 74 0 R ] /Contents [ 1043 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1042 0 R /Type /Page >>
<< /Annots [ 82 0 R 84 0 R 86 0 R 88 0 R 90 0 R 92 0 R 94 0 R 96 0 R 98 0 R 100 0 R 102 0 R 104 0 R 106 0 R 108 0 R 110 0 R 112 0 R 114 0 R 116 0 R 118 0 R 120 0 R 122 0 R 124 0 R 126 0 R 128 0 R 130 0 R 132 0 R 134 0 R 136 0 R 138 0 R 140 0 R 142 0 R 144 0 R 146 0 R 148 0 R 150 0 R 152 0 R 154 0 R 156 0 R 158 0 R ] /Contents [ 1048 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1047 0 R /Type /Page >>
<< /Annots [ 166 0 R 168 0 R 170 0 R 172 0 R 174 0 R 176 0 R 178 0 R 180 0 R 182 0 R ] /Contents [ 1053 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1052 0 R /Type /Page >>
<< /Annots [ 190 0 R 192 0 R 194 0 R 196 0 R ] /Contents [ 1058 0 R ] /Group << /CS /DeviceRGB /S /Transparency >> /MediaBox [ 0 0 1130 799 ] /Parent 1 0 R /Resources 1057 0 R /Type /Page >>
Bookmark 1
Bookmark 2 
Bookmark 3

I am not necessarily married to the PDFSharp library.

Any pointers? Thanks!

2 Answers

0
votes

The maintainers of PDFSharp kindly provided some insight into this issue here: http://forum.pdfsharp.net/viewtopic.php?f=2&t=3663

0
votes

I know this is an older question, but I recently ran into the same problem: I needed to split a document based on its bookmarks. I looked at "/Dest" but could not make use of it. It turns out the key is to treat each child as a PdfOutline, which exposes the destination page directly.
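For readers who only want the page number of each top-level bookmark, a minimal C# sketch of that idea could look like the following. It assumes that PDFsharp fills in PdfOutline.DestinationPage when the file is read (this may depend on the PDFsharp version and on how the PDF stores its destinations) and that the page object it returns is the same instance found in document.Pages. The helper PageNumberOf is a hypothetical name introduced here, not part of PDFsharp.

```csharp
using System;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

static class BookmarkPages
{
    // Hypothetical helper: returns the 1-based page number of `target`,
    // or -1 if the page cannot be found in the document.
    static int PageNumberOf(PdfDocument document, PdfPage target)
    {
        if (target == null)
            return -1;

        for (int i = 0; i < document.PageCount; i++)
        {
            // Assumption: the outline and the page list share PdfPage instances.
            if (ReferenceEquals(document.Pages[i], target))
                return i + 1;
        }
        return -1;
    }

    static void Main()
    {
        using (PdfDocument document = PdfReader.Open("test.pdf", PdfDocumentOpenMode.Import))
        {
            // document.Outlines holds the top-level bookmarks as PdfOutline objects.
            foreach (PdfOutline bookmark in document.Outlines)
            {
                int pageNumber = PageNumberOf(document, bookmark.DestinationPage);
                Console.WriteLine(bookmark.Title + " -> page " + pageNumber);
            }
        }
    }
}
```

If a bookmark reports -1, its destination was probably not resolved to a page, so the sketch would need a different lookup for that file.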

The VB code below takes each root bookmark and copies it, together with its children, into its own document, preserving the bookmarks. I know no one uses VB these days, but it should be easy enough to convert to C#.

A note: you will need to open the original document in import mode, i.e. PdfSharp.Pdf.IO.PdfReader.Open(path, PdfSharp.Pdf.IO.PdfDocumentOpenMode.Import).

Imports System.Text.RegularExpressions
Imports PdfSharp.Drawing
Imports PdfSharp.Pdf

' "document" is the source PdfDocument, opened in import mode (see the note above).
Dim rgx As New Regex("^[0-9]{16}$")
Dim i As Integer = 0
Dim p As String = System.IO.Path.Combine(Application.UserAppDataPath, "part{0}.pdf")
' Root-level bookmarks
For Each rootb As PdfOutline In document.Outlines
    ' Only process bookmarks whose title is a 16-digit account number
    If rgx.IsMatch(rootb.Title) Then
        Dim newdoc As New PdfDocument
        ' Copy the page the root bookmark points to and re-create the bookmark on it
        Dim rp As PdfPage = newdoc.AddPage(rootb.DestinationPage)
        Dim outline As PdfOutline = newdoc.Outlines.Add(rootb.Elements.GetString("/Title"), rp, True, PdfOutlineStyle.Bold, XColors.Red)
        ' Walk the child bookmarks through the /First and /Next links
        Dim child As PdfDictionary = rootb.Elements.GetDictionary("/First")
        While child IsNot Nothing
            ' The key step: treat the dictionary as a PdfOutline to reach DestinationPage
            Dim item As PdfOutline = DirectCast(child, PdfOutline)
            Dim cp As PdfPage = newdoc.AddPage(item.DestinationPage)
            outline.Outlines.Add(item.Elements.GetString("/Title"), cp, True)
            child = child.Elements.GetDictionary("/Next")
        End While

        newdoc.Save(String.Format(p, i))
        i += 1
    End If
Next
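
If the goal is instead to split at the top-level bookmarks so that each part runs up to the page before the next bookmark, the page ranges can be derived from the bookmark start pages alone. The sketch below is plain C# with no PDFsharp dependency; the class and method names are hypothetical, and it assumes the start pages are 1-based and strictly increasing. It uses the figures from the question (bookmarks on pages 1, 5 and 6 in a 9-page file).

```csharp
using System;
using System.Collections.Generic;

static class BookmarkRanges
{
    // Converts sorted, 1-based bookmark start pages into inclusive
    // (First, Last) page ranges. The last range ends at pageCount.
    public static List<(int First, int Last)> Compute(IReadOnlyList<int> startPages, int pageCount)
    {
        var ranges = new List<(int First, int Last)>();
        for (int i = 0; i < startPages.Count; i++)
        {
            // A section ends one page before the next bookmark starts.
            int last = (i + 1 < startPages.Count) ? startPages[i + 1] - 1 : pageCount;
            ranges.Add((startPages[i], last));
        }
        return ranges;
    }

    public static void Main()
    {
        foreach (var r in Compute(new[] { 1, 5, 6 }, 9))
            Console.WriteLine($"pages {r.First}-{r.Last}");
        // prints:
        // pages 1-4
        // pages 5-5
        // pages 6-9
    }
}
```

Each range could then be copied into a new document page by page, in the same way the VB code above copies individual destination pages.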