I have a C# application which reads text from a Word (.docx) file using the Open XML SDK.
In general, the document contains a set of Paragraph elements (p), each of which contains Run elements (r). I can iterate over the Run nodes with
foreach (var run in para.Descendants<Run>())
{
...
}
In one specific document, the text "START" is split into three parts: "ST", "AR" and "T". Each part is defined by a Run node, but in two cases the Run node is contained within a "smartTag" node.
<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="PersonName">
<w:r w:rsidRPr="00BF444F">
<w:rPr>
<w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
<w:b/>
<w:bCs/>
<w:sz w:val="40"/>
<w:szCs w:val="40"/>
</w:rPr>
<w:t>ST</w:t>
</w:r>
</w:smartTag>
<w:smartTag w:uri="urn:schemas-microsoft-com:office:smarttags" w:element="PersonName">
<w:r w:rsidRPr="00BF444F">
<w:rPr>
<w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
<w:b/>
<w:bCs/>
<w:sz w:val="40"/>
<w:szCs w:val="40"/>
</w:rPr>
<w:t>AR</w:t>
</w:r>
</w:smartTag>
<w:r w:rsidRPr="00BF444F">
<w:rPr>
<w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
<w:b/>
<w:bCs/>
<w:sz w:val="40"/>
<w:szCs w:val="40"/>
</w:rPr>
<w:t xml:space="preserve">T</w:t>
</w:r>
As far as I can tell, the Open XML SDK does not support the smartTag node. As a result, it just generates an OpenXmlUnknownElement node for it.
What makes this difficult is that it also generates OpenXmlUnknownElement nodes for all of the descendant nodes of the smartTag. This means that I cannot simply get the first child node and cast it to a Run.
Getting the text (via the InnerText property) is easy, but I also need to get the formatting information.
Is there any reasonably easy way to handle this?
At present, my best idea is to write a preprocessor which removes the smart tag nodes.
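To make the preprocessor idea concrete, here is a minimal sketch of what I have in mind. It assumes the XML of the main document part has already been loaded into an XDocument (reading it out of the .docx package and writing it back are left out), and the names SmartTagStripper and Unwrap are my own, not part of any SDK. It replaces each w:smartTag with its child nodes and drops any w:smartTagPr property element, so that the runs end up directly in the paragraph.

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the Open XML SDK.
public static class SmartTagStripper
{
    // WordprocessingML main namespace (the "w:" prefix in document.xml).
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every w:smartTag element with its own children,
    // discarding the w:smartTagPr property element if present.
    public static void Unwrap(XDocument document)
    {
        // Reverse document order visits nested smart tags before
        // the smart tags that contain them.
        var smartTags = document.Descendants(W + "smartTag").Reverse().ToList();

        foreach (var tag in smartTags)
        {
            var kept = tag.Nodes()
                .Where(n => !(n is XElement e && e.Name == W + "smartTagPr"))
                .ToList();

            // The kept nodes take the smart tag's place in its parent.
            tag.ReplaceWith(kept);
        }
    }
}
```

The idea would be to run this once over the document XML before handing it to the SDK, so that para.Descendants<Run>() sees ordinary Run elements with their rPr formatting intact.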
Edit
Following up on the comment from Cindy Meister.
I am using OpenXml version 2.7.2. As Cindy has pointed out, there is a class SmartTagRun in OpenXML 2.0. I did not know about that class.
I have found the following information on the page "What's new in the Open XML SDK 2.5 for Office":
Smart tags
Because smart tags were deprecated in Office 2010, the Open XML SDK 2.5 doesn't support smart tag related Open XML elements. The Open XML SDK 2.5 still can process smart tag elements as unknown elements, however the Open XML SDK 2.5 Productivity Tool for Office validates those elements (see the following list) in Office document files as invalid tags.
So it sounds like a possible solution would be to use OpenXML 2.0.
para.Descendants<Run> should also pick up runs in a smartTag? You're saying the SDK is differentiating between w:r nested in w:smartTag? (I can't test because Word doesn't support creating SmartTags anymore - there was a court case that decided MS was using technology patented by another company so the capability had to be stripped out.) – Cindy Meister
para.Descendants<Run> does not pick up the Runs in the smartTag. The smartTag and all descendant nodes are created as OpenXmlUnknownElement nodes. – Phil Jollans