5
votes

For a Paragraph object, how can I determine which page it is located on using the Open XML SDK 2.0 for Microsoft Office?

3 Answers

6
votes

It is not possible to get page numbers for a Word document using the Open XML SDK, because pagination is computed by the client (such as MS Word) when it renders the document.

However, if the document you are working with was previously opened in a Word client and saved back, the client will have added LastRenderedPageBreak elements to mark where the pages broke when it was last rendered. Refer to my answer here for more info about LastRenderedPageBreaks. This lets you count the LastRenderedPageBreak elements that come before your paragraph; the page number is that count plus one.

If this is not the case, a crude workaround is to add footers with page numbers (maybe in the same colour as your document, to virtually hide them!). This is only an option if you are automating the Word document generation using the Open XML SDK.

2
votes

@Flowerking: thanks for the information.

Because I need to loop all the paragraphs anyway to search for a certain string, I can use the following code to find the page number:

using (var document = WordprocessingDocument.Open(@"c:\test.docx", false))
{
    var paragraphInfos = new List<ParagraphInfo>();

    var paragraphs = document.MainDocumentPart.Document.Descendants<Paragraph>();

    int pageIdx = 1;
    foreach (var paragraph in paragraphs)
    {
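        // Only the first run of each paragraph is inspected.
        var run = paragraph.GetFirstChild<Run>();

        if (run != null)
        {
            // Marker written by Word where a page broke at the last render.
            var lastRenderedPageBreak = run.GetFirstChild<LastRenderedPageBreak>();
            // Only explicit page breaks count; line and column breaks are also Break elements.
            // FirstOrDefault needs "using System.Linq;".
            var pageBreak = run.Elements<Break>().FirstOrDefault(b => b.Type != null && b.Type.Value == BreakValues.Page);
            if (lastRenderedPageBreak != null || pageBreak != null)
            {
                pageIdx++;
            }
        }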

        var info = new ParagraphInfo
        {
            Paragraph = paragraph,
            PageNumber = pageIdx
        };

        paragraphInfos.Add(info);
    }

    foreach (var info in paragraphInfos)
    {
        Console.WriteLine("Page {0}/{1} : '{2}'", info.PageNumber, pageIdx, info.Paragraph.InnerText);
    }
}
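
The snippet above uses a ParagraphInfo type that is not shown in the answer. A minimal sketch of what it presumably looks like, assuming it is a plain holder with just the two properties the code sets (the definition is my assumption, not the author's):

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical holder type: the answer does not include its definition.
// It only needs the two properties the loop assigns.
public class ParagraphInfo
{
    // The paragraph element found in the document.
    public Paragraph Paragraph { get; set; }

    // The 1-based page number estimated for that paragraph.
    public int PageNumber { get; set; }
}
```

The snippet itself also needs the System, System.Collections.Generic, DocumentFormat.OpenXml.Packaging and DocumentFormat.OpenXml.Wordprocessing namespaces.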

0
votes

Here's an extension method I made for that:

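    // Estimates the page of elem by counting the LastRenderedPageBreak markers
    // that precede it. root is the ancestor at which to stop (e.g. the document body).
    // Needs "using System.Linq;" and must be declared in a static class.
    public static int GetPageNumber(this OpenXmlElement elem, OpenXmlElement root)
    {
        int pageNbr = 1;
        var tmpElem = elem;
        // Walk up towards root; the null check avoids a NullReferenceException
        // when elem is not a descendant of root.
        while (tmpElem != null && tmpElem != root)
        {
            // Count the markers inside every preceding sibling at this level.
            var sibling = tmpElem.PreviousSibling();
            while (sibling != null)
            {
                pageNbr += sibling.Descendants<LastRenderedPageBreak>().Count();
                sibling = sibling.PreviousSibling();
            }
            tmpElem = tmpElem.Parent;
        }
        return pageNbr;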
    }
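
For anyone wondering how to call it, here is a rough usage sketch. The class names, the search string and the file path (reused from the earlier answer) are placeholders I made up. The method body follows the one above, with a null guard added in case the element is not under the given root.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Extension methods must live in a non-nested static class.
// The class name is an assumption, not part of the answer.
public static class OpenXmlPageExtensions
{
    public static int GetPageNumber(this OpenXmlElement elem, OpenXmlElement root)
    {
        int pageNbr = 1;
        var tmpElem = elem;
        while (tmpElem != null && tmpElem != root)
        {
            // Count the rendered page-break markers in all preceding siblings.
            var sibling = tmpElem.PreviousSibling();
            while (sibling != null)
            {
                pageNbr += sibling.Descendants<LastRenderedPageBreak>().Count();
                sibling = sibling.PreviousSibling();
            }
            tmpElem = tmpElem.Parent;
        }
        return pageNbr;
    }
}

public static class Program
{
    public static void Main()
    {
        // Placeholder path and search string.
        using (var document = WordprocessingDocument.Open(@"c:\test.docx", false))
        {
            var body = document.MainDocumentPart.Document.Body;

            // First paragraph whose text contains the search string, if any.
            var match = body.Descendants<Paragraph>()
                            .FirstOrDefault(p => p.InnerText.Contains("some text"));

            if (match != null)
            {
                Console.WriteLine("Found on page {0}", match.GetPageNumber(body));
            }
        }
    }
}
```

Note that the method only counts markers in elements that come before the given element. A paragraph whose own first run carries the LastRenderedPageBreak is therefore reported as being on the previous page.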