Word interop is extremely slow when I parse the text of a document with 100+ pages, so I rewrote my code to use the OpenXML SDK, which is much faster. My problem is that once I have found the information in the OpenXML document, I then have to locate it in the Word document and scroll the main window to it. To do that, I have to somehow match an OpenXML paragraph to an interop paragraph. I thought interop paragraphs matched OpenXML paragraphs one-to-one, but I was wrong: interop usually reports more paragraphs than OpenXML does. Is there any trick or piece of information that could help me match them? For example, I have figured out that interop usually has one extra empty paragraph after every table row. I could account for that, but I am afraid there are more cases than the one I have found myself.

UPDATE

Below are screenshots of a simple add-in I created to demonstrate the difference between interop and OpenXML paragraphs in a Word document with simple content like this:

MS Word Document Sample

The add-in then retrieves the list of interop paragraphs and the list of OpenXML paragraphs and shows them side by side:

Side-by-side comparison

Here is the code I used:

// The document currently open in Word.
var document = Globals.ThisAddIn.Application.ActiveDocument;

if (document == null)
    return;

// Paragraph texts as reported by interop, main text story only.
var interopParagraphs = document
    .StoryRanges
    .Cast<Range>()
    .SingleOrDefault(r => r.StoryType == WdStoryType.wdMainTextStory)
    .Paragraphs
    .Cast<Paragraph>()
    .Select(p => p.Range.Text);

// Load the same content into the OpenXML SDK from its Flat OPC representation.
var openXmlDocument = WordprocessingDocument.FromFlatOpcString(document.Content.WordOpenXML);

if (openXmlDocument == null)
    return;

// Paragraph texts as reported by OpenXML, including paragraphs inside table cells.
var openXmlParagraphs = openXmlDocument
    .MainDocumentPart
    .Document
    .Body
    .Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>()
    .Select(p => p.InnerText);

// Show both lists side by side.
var compareDialog = new CompareForm(interopParagraphs, openXmlParagraphs);
compareDialog.ShowDialog();
Have you tried using selection.Find on the first 255 characters of a paragraph? - Cindy Meister
I just tried making a document in Word 2013 with a 2x2 table and a single paragraph right after it. There were no extra paragraphs. Tried again with more rows, same thing. Can you give us some example OpenXML? - Chris
@Chris I have updated my question with more details. - Alexey Andrushkevich
@AlexeyAndrushkevich You're right, there are extras, I was looking at the OpenXML, not interop. In this specific case, if you're iterating through them, you can use Range.IsEndOfRowMark or Range.Information[WdInformation.wdAtEndOfRowMarker] (don't forget to collapse the range first or it won't work) to ignore them, but I don't know what other cases might exist. Also it doesn't work if you're just trying to match indexes without iteration. - Chris

1 Answer

Turning my comment into an answer.


For the case of table rows, you can check to see whether you are looking at an end-of-row paragraph using Range.IsEndOfRowMark.

"This property returns True if the specified range is collapsed and is located at the end-of-row mark in a table, and False if not."

You can also use Range.Information[WdInformation.wdAtEndOfRowMarker].

"Returns True if the specified selection or range is at the end-of-row mark in a table."

Despite the slight difference in the documentation, the range must be collapsed for this property as well. AFAIK, they are equivalent.

I also noticed that this doesn't work if you access a paragraph directly, e.g. Document.Paragraphs[4]. You have to iterate through the paragraphs for it to work. This does not seem to be documented.
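
As a rough sketch of the approach described above, the following iterates over the main-story paragraphs and skips any paragraph that sits at an end-of-row mark. The class and method names (InteropParagraphFilter, WithoutEndOfRowMarks) are made up for illustration. The sketch assumes the behavior observed above (the range must be collapsed, and the paragraphs must be reached by iteration), and it only handles the table-row case, so other sources of extra paragraphs would still need separate handling.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the Word object model.
static class InteropParagraphFilter
{
    // Yields the paragraphs of the main text story, skipping end-of-row marks.
    public static IEnumerable<Word.Paragraph> WithoutEndOfRowMarks(Word.Document document)
    {
        // Iterate instead of indexing: as noted above, the check did not
        // work when a paragraph was accessed directly by index.
        foreach (Word.Paragraph paragraph in document.Content.Paragraphs)
        {
            // Collapse a duplicate of the range to its start; the check
            // is only expected to work on a collapsed range.
            Word.Range probe = paragraph.Range.Duplicate;
            object direction = Word.WdCollapseDirection.wdCollapseStart;
            probe.Collapse(ref direction);

            if (probe.IsEndOfRowMark)
                continue; // end-of-row "paragraph": skip it

            yield return paragraph;
        }
    }
}
```

With something like this in place, the interop list from the question could be built as InteropParagraphFilter.WithoutEndOfRowMarks(document).Select(p => p.Range.Text), which should remove the one extra entry per table row before the two lists are compared.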