Word interop is extremely slow when I parse the text of a document with 100+ pages, so I rewrote my code to use the OpenXML SDK, which is much faster. My problem is that once I have found the information in the OpenXML document, I then have to locate it in the Word document and scroll the main window to it. To do that, I have to somehow match an OpenXML paragraph to an interop paragraph. I thought interop paragraphs matched OpenXML paragraphs one-to-one, but I was wrong: interop usually has more paragraphs than OpenXML. Is there any trick or piece of information that could help me match them? For example, I have figured out that interop usually has one extra empty paragraph after every table row. I could take that into account, but I am afraid there are more cases than the one I have found myself.
UPDATE
Below are screenshots of a simple add-in I created to demonstrate the difference between interop and OpenXML paragraphs on a Word document with simple content like this:
The add-in then retrieves the list of interop paragraphs and the list of OpenXML paragraphs and shows them side by side:
Here is the code I used:
var document = Globals.ThisAddIn.Application.ActiveDocument;
if (document == null)
    return;

var interopParagraphs = document
    .StoryRanges
    .Cast<Range>()
    .SingleOrDefault(r => r.StoryType == WdStoryType.wdMainTextStory)
    .Paragraphs
    .Cast<Paragraph>()
    .Select(p => p.Range.Text);

var openXmlDocument = WordprocessingDocument.FromFlatOpcString(document.Content.WordOpenXML);
if (openXmlDocument == null)
    return;

var openXmlParagraphs = openXmlDocument
    .MainDocumentPart
    .Document
    .Body
    .Descendants<DocumentFormat.OpenXml.Wordprocessing.Paragraph>()
    .Select(p => p.InnerText);

var compareDialog = new CompareForm(interopParagraphs, openXmlParagraphs);
compareDialog.ShowDialog();
selection.Find on the first 255 characters of a paragraph? – Cindy Meister
Range.IsEndOfRowMark or Range.Information[WdInformation.wdAtEndOfRowMarker] (don't forget to collapse the range first or it won't work) to ignore them, but I don't know what other cases might exist. Also it doesn't work if you're just trying to match indexes without iteration. – Chris
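For illustration, the end-of-row check from the last comment might look roughly like the sketch below. It walks the main-story paragraphs, collapses a copy of each paragraph's range, and skips the paragraph when the collapsed range sits at an end-of-row mark. The class and method names are hypothetical, and collapsing to the start of the range (rather than the end) is an assumption that should be verified against a real document. As the comment notes, this only covers the table-row case and requires iterating over every paragraph.

```csharp
using System.Collections.Generic;
using Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of the original add-in.
static class InteropParagraphFilter
{
    // Returns the text of each main-story paragraph, skipping end-of-row marks.
    public static List<string> GetTextsWithoutRowMarks(Document document)
    {
        var texts = new List<string>();

        // Document.Content is the main text story.
        foreach (Paragraph paragraph in document.Content.Paragraphs)
        {
            // Work on a duplicate so the collapse does not affect other references.
            Range probe = paragraph.Range.Duplicate;

            // IsEndOfRowMark is only meaningful for a collapsed range.
            // Assumption: collapsing to the start places the range at the mark.
            object direction = WdCollapseDirection.wdCollapseStart;
            probe.Collapse(ref direction);

            if (probe.IsEndOfRowMark)
                continue; // extra paragraph that interop reports after a table row

            texts.Add(paragraph.Range.Text);
        }

        return texts;
    }
}
```

The filtered list could then be passed to the comparison dialog in place of interopParagraphs to see which mismatches remain.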