6
votes

I've been reading through the Adobe PDF spec, along with Apple's Quartz 2D documentation for PDF rendering and parsing. I've also downloaded Voyeur and inspected a local PDF with it to see its internal data. At this point I'm able to get the document catalog and fetch the outlines dictionary from it. Nested within the outlines dictionary are further dictionaries that contain "/Dest" entries with values such as:

G1.1025588 etc

I'm wondering whether I can use these values to get a reference to the page to render, using the methods I've seen in GitHub projects such as Reader and in Apple's documented examples.

PDF processing is definitely a challenge, so any help would be appreciated.

2 Answers

4
votes

The /Dest entry in an outline item dictionary can be a name, a string, or an array.

  • The simplest case is an array: its first item is the page object the outline entry points to (a dictionary). To get the page number, iterate over all pages in the document and check which page's dictionary (obtained with CGPDFPageGetDictionary) is equal (==) to the dictionary you have. You could also traverse the page tree, which is a bit harder but may be faster (not by as much as you might expect; I wouldn't optimize prematurely here). The other items in the array describe the position on the page and so on. Search for "Explicit Destinations" in the PDF spec to learn more.

  • If the entry is a name or string, it is a named destination, and you have to map it to an explicit destination. In PDF 1.2 and later, the mapping is normally a name tree stored in the /Dests entry of the /Names dictionary in the document catalog. A name tree is essentially a tree map that allows fast access to named values without having to read all the data at once (as with a plain dictionary). Unfortunately, there's no direct support for name trees in Quartz, so you'll have to do a little more work to parse this structure recursively (see "Name Trees" in the PDF spec).

Note that an outline item doesn't necessarily have a /Dest entry; it can also specify its destination via an /A (action) entry, which is a little more complex. In most cases, however, the action will be a "GoTo" action that is essentially a wrapper for a destination.

The mapping of names to destinations can also be stored as a plain dictionary. In that case, it's the /Dests entry directly in the document's catalog. I've rarely seen this, though: it is the PDF 1.1 mechanism, and name trees were introduced in PDF 1.2 (current is 1.7).
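
The recursive name tree lookup can be sketched without Quartz. What follows is a minimal C model under stated assumptions, not the CGPDF API: NameTreeNode, lookupDestination, pageForSampleDestination and the sample tree are hypothetical names invented for illustration, and the page numbers are made-up sample values. In a real document each node is a CGPDFDictionaryRef whose /Limits, /Kids and /Names entries you would read with calls such as CGPDFDictionaryGetArray. There, /Names is a single array alternating keys and values, and each value is a destination (an array, or a dictionary with a /D entry) rather than a page number.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one name tree node (not a Quartz type). */
typedef struct NameTreeNode {
    const char *lower;                      /* /Limits[0]; NULL on the root */
    const char *upper;                      /* /Limits[1]; NULL on the root */
    const struct NameTreeNode *const *kids; /* /Kids: child nodes */
    size_t kidCount;
    const char *const *names;               /* keys taken from /Names */
    const int *pages;                       /* stand-in for the destination values */
    size_t nameCount;
} NameTreeNode;

/* Returns the page stored for `name`, or -1 if the name is not in the tree. */
int lookupDestination(const NameTreeNode *node, const char *name)
{
    assert(name != NULL);
    if (node == NULL)
        return -1;

    /* /Limits lets us skip a whole subtree that cannot contain the name. */
    if (node->lower != NULL && strcmp(name, node->lower) < 0)
        return -1;
    if (node->upper != NULL && strcmp(name, node->upper) > 0)
        return -1;

    /* Leaf entries: compare against each key stored in this node. */
    for (size_t i = 0; i < node->nameCount; i++) {
        if (strcmp(node->names[i], name) == 0)
            return node->pages[i];
    }

    /* Intermediate entries: recurse into the children. */
    for (size_t i = 0; i < node->kidCount; i++) {
        int page = lookupDestination(node->kids[i], name);
        if (page != -1)
            return page;
    }
    return -1;
}

/* Sample tree: a root with two leaves. Page numbers are invented. */
static const char *const leafANames[] = { "G1.1025588", "G1.1025590" };
static const int leafAPages[] = { 3, 7 };
static const NameTreeNode leafA = {
    "G1.1025588", "G1.1025590", NULL, 0, leafANames, leafAPages, 2
};

static const char *const leafBNames[] = { "G2.1000001" };
static const int leafBPages[] = { 12 };
static const NameTreeNode leafB = {
    "G2.1000001", "G2.1000001", NULL, 0, leafBNames, leafBPages, 1
};

static const NameTreeNode *const rootKids[] = { &leafA, &leafB };
static const NameTreeNode sampleRoot = { NULL, NULL, rootKids, 2, NULL, NULL, 0 };

/* Convenience wrapper around the sample tree. */
int pageForSampleDestination(const char *name)
{
    return lookupDestination(&sampleRoot, name);
}
```

With this sample tree, pageForSampleDestination("G1.1025588") returns 3 and an unknown name returns -1. PDF strings are byte strings that may contain zero bytes, so real code should compare lengths and bytes (for example with memcmp) instead of relying on strcmp.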

You will definitely need the PDF spec for this: http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf

0
votes

Thanks to Omz, here is a piece of code to retrieve the page number for an outline destination in a PDF file:

// Get Page Number from an array
- (int) getPageNumberFromArray:(CGPDFArrayRef)array ofPdfDoc:(CGPDFDocumentRef)pdfDoc withNumberOfPages:(int)numberOfPages
{
    int pageNumber = -1;

    // The page object is the first element of the destination array (index 0)
    CGPDFDictionaryRef pageDic = NULL;
    if (!CGPDFArrayGetDictionary(array, 0, &pageDic))
    {
        return pageNumber; // malformed destination: no page dictionary
    }

    // page searching
    for (int p=1; p<=numberOfPages; p++)
    {
        CGPDFPageRef page = CGPDFDocumentGetPage(pdfDoc, p);
        if (CGPDFPageGetDictionary(page) == pageDic)
        {
            pageNumber = p;
            break;
        }
    }

    return pageNumber;
}

// Get page number from an outline item. Only supports "Dest" and "A" entries that hold an explicit (array) destination; named destinations are not resolved
- (int) getPageNumber:(CGPDFDictionaryRef)node ofPdfDoc:(CGPDFDocumentRef)pdfDoc withNumberOfPages:(int)numberOfPages
{
    int pageNumber = -1;

    CGPDFArrayRef destArray;
    CGPDFDictionaryRef dicoActions;
    if(CGPDFDictionaryGetArray(node, "Dest", &destArray))
    {
        pageNumber = [self getPageNumberFromArray:destArray ofPdfDoc:pdfDoc withNumberOfPages:numberOfPages];
    }
    else if(CGPDFDictionaryGetDictionary(node, "A", &dicoActions))
    {
        const char * typeOfActionConstChar = NULL;
        // "S" holds the action type; guard against a missing entry before building the NSString
        BOOL hasType = CGPDFDictionaryGetName(dicoActions, "S", &typeOfActionConstChar);

        NSString * typeOfAction = hasType ? [NSString stringWithUTF8String:typeOfActionConstChar] : nil;
        if([typeOfAction isEqualToString:@"GoTo"]) // only support "GoTo" entry. See PDF spec p653
        {
            CGPDFArrayRef dArray;
            if(CGPDFDictionaryGetArray(dicoActions, "D", &dArray)) 
            {
                pageNumber = [self getPageNumberFromArray:dArray ofPdfDoc:pdfDoc withNumberOfPages:numberOfPages];
            }
        }
    }

    return pageNumber;
}