
When programming MS Word, is there any way to list the points in the text where the character style changes?

I'm trying to analyze a paragraph programmatically to retrieve all the contiguous blocks of text that have the same style; in other words, to split the paragraph at the points where the text style changes. At the moment I do this by taking each character and comparing its style with the previous character's: if the style names differ, I know I've found a point to split the results at. That works, but it's horrendously inefficient (for every character, you have to fetch the style and do a full string comparison of its name). Is there a way in the Word object model to solve this problem without comparing every character?

The approximate code I'm currently using is below. It's C#, using COM Interop against Word 2003, but I'd be equally happy with a VBA solution: once I know how to do it in principle, converting it to C# should be easy.

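// assumes: using System.Text; using Microsoft.Office.Interop.Word;
// used to store the results as we go
StringBuilder currentText = new StringBuilder();
string currentStyle = null;

// range contains the Range I want to split up
foreach (Range charRng in range.Characters)
{
    // get_Style() returns object, so cast it to Word's Style interface
    string style = ((Style)charRng.get_Style()).NameLocal;
    if (style == currentStyle)
    {
        currentText.Append(charRng.Text);
    }
    else
    {
        // don't emit an empty block before the first character
        if (currentStyle != null)
        {
            AddTextBlockToMyResults(currentStyle, currentText.ToString());
        }
        currentText = new StringBuilder(charRng.Text);
        currentStyle = style;
    }
}
// flush the last block (nothing to flush if the range was empty)
if (currentStyle != null)
{
    AddTextBlockToMyResults(currentStyle, currentText.ToString());
}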
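The merge step itself can be separated from the COM calls, which makes it testable without Word and avoids emitting an empty first block. The sketch below assumes C# 7 or later; the StyleRuns class and its Group method are hypothetical names, not part of the Word object model. It takes any sequence of (style, text) pieces, for example one per character as in the loop above, and merges adjacent pieces that share a style. It does not reduce the number of COM calls; it only isolates the grouping logic.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (not part of the Word object model): merges adjacent
// (style, text) pieces into blocks that share the same style name.
public static class StyleRuns
{
    public static List<(string Style, string Text)> Group(IEnumerable<(string Style, string Text)> pieces)
    {
        var blocks = new List<(string Style, string Text)>();
        StringBuilder currentText = null;   // stays null until the first piece is seen
        string currentStyle = null;

        foreach (var (style, text) in pieces)
        {
            if (currentText != null && style == currentStyle)
            {
                // Same style as the previous piece: extend the current block.
                currentText.Append(text);
            }
            else
            {
                // Style changed (or this is the first piece): close the previous block, if any.
                if (currentText != null)
                    blocks.Add((currentStyle, currentText.ToString()));
                currentText = new StringBuilder(text);
                currentStyle = style;
            }
        }

        // Flush the last block; an empty input produces no blocks.
        if (currentText != null)
            blocks.Add((currentStyle, currentText.ToString()));
        return blocks;
    }
}
```

With Word, the pieces could be produced from range.Characters with LINQ's Cast<Range>() and Select, using the same ((Style)c.get_Style()).NameLocal and c.Text calls as the question's code.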

What version(s) of Office were used to create the Word docs?

If it's Office 2007 or later (or you can convert the docs to that format), then a Word document (.docx) is really just a .zip archive. If you open a .docx file with an archive utility like WinRAR, you'll see that it has a directory structure like:

_rels
customXml
docProps
word
|_ document.xml

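That document.xml is an Office Open XML (WordprocessingML) file that contains the body text of your Word doc along with references to its styles (the style definitions themselves live in word/styles.xml). I bet you could parse that XML a heck of a lot faster than walking the document character by character through COM.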
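A minimal sketch of that idea in C#, assuming .NET with System.IO.Compression (ZipFile) and LINQ to XML available. It reads word/document.xml, walks each paragraph's w:r runs, and merges adjacent runs that carry the same w:rStyle. The DocxStyleRuns class and its Read methods are hypothetical names. Limitations to be aware of: w:rStyle holds the style ID, not the localized display name that NameLocal returns (map IDs to names via word/styles.xml if you need them); runs nested inside elements such as w:hyperlink are skipped; only w:t text is collected; and direct formatting applied without a style is ignored.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: groups the runs of each paragraph in a .docx by character style ID.
public static class DocxStyleRuns
{
    // WordprocessingML namespace used by word/document.xml
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static List<List<(string StyleId, string Text)>> Read(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found; is this a .docx?");
            XDocument doc;
            using (Stream stream = entry.Open())
                doc = XDocument.Load(stream);
            return Read(doc);
        }
    }

    // One list of blocks per paragraph; runs without w:rStyle get a null style ID.
    public static List<List<(string StyleId, string Text)>> Read(XDocument doc)
    {
        var result = new List<List<(string StyleId, string Text)>>();
        foreach (XElement p in doc.Descendants(W + "p"))
        {
            var blocks = new List<(string StyleId, string Text)>();
            foreach (XElement r in p.Elements(W + "r"))
            {
                // w:r/w:rPr/w:rStyle/@w:val, or null when the run has no character style
                string styleId = (string)r.Element(W + "rPr")?.Element(W + "rStyle")?.Attribute(W + "val");
                string text = string.Concat(r.Elements(W + "t").Select(t => t.Value));
                if (text.Length == 0)
                    continue;

                // Word often splits text into several runs with identical formatting, so merge them.
                if (blocks.Count > 0 && blocks[blocks.Count - 1].StyleId == styleId)
                    blocks[blocks.Count - 1] = (styleId, blocks[blocks.Count - 1].Text + text);
                else
                    blocks.Add((styleId, text));
            }
            result.Add(blocks);
        }
        return result;
    }
}
```

Usage would be something like DocxStyleRuns.Read(@"C:\path\to\file.docx"), which opens the package once instead of making a COM call per character.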