I am using Word Interop and C# to build a program at work, and one of its features is a word count.
This can't be Word's own word count, as I need to emulate the word count of a CAT tool used at work.
One of the issues I found is that the CAT tool uses text formatting to split up words. This means that if I have the word 1st with st superscripted, Word counts one word (as there is nothing separating the two parts), while the CAT tool counts two words because of the format change.
The CAT tool keeps track of format changes, and every format change inside a word breaks that word in two.
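To make the rule concrete, here is a minimal sketch of the counting behaviour I am trying to reproduce. It is plain C# with no Interop calls. The `FormatAwareWordCount` class and the `FormatKey` strings are names I made up for illustration, and I am assuming that only whitespace and format changes end a word (the CAT tool may well treat punctuation differently).

```csharp
using System;
using System.Collections.Generic;

public static class FormatAwareWordCount
{
    // Each run is a stretch of text that shares one format.
    // FormatKey is a hypothetical string summarising font, bold, italic,
    // superscript, etc.; two runs with different keys are formatted differently.
    public static int Count(IEnumerable<(string Text, string FormatKey)> runs)
    {
        int count = 0;
        bool inWord = false;
        string currentFormat = "";

        foreach (var run in runs)
        {
            foreach (char c in run.Text)
            {
                if (char.IsWhiteSpace(c))
                {
                    // Whitespace always ends the current word.
                    inWord = false;
                }
                else if (!inWord || run.FormatKey != currentFormat)
                {
                    // A new word starts after whitespace, or when the
                    // format changes in the middle of a word.
                    count++;
                    inWord = true;
                    currentFormat = run.FormatKey;
                }
            }
        }

        return count;
    }
}
```

With the runs ("1", "plain"), ("st", "superscript"), (" floor", "plain") this returns 3, where Word itself would count 2. It does not solve the speed problem, since the runs still have to be pulled out of the document somehow.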
So I could go word by word, character by character, and compare every formatting property (font, bold, italic, etc.), but that would be really slow when working with multiple documents, each with thousands of words.
Does anyone know a better solution?