2
votes

I have some Word template files (.dot/.dotx) that contain XML tags along with plain text.
At run time, I need to replace the XML tags with their respective mail merge fields.

So I need to parse the document for these XML tags and replace them with merge fields. I was using Regex to find and replace the tags, but it was suggested that I use an XML parser instead (Regex for string enclosed in <*>, C#).

Now that I have presented my case better:
could you please advise whether an XML parser is the right tool to achieve the above?
If yes, do I need to save the Word document as an XML file first and then parse it for the XML tags?

Please advise.

4
Are you actually embedding custom XML into the Word templates? Or are you just typing angle brackets into the text? There is a difference. An XML parser wouldn't help if you're just typing angle brackets into the text of a Word document. – CoderDennis
Thanks Dennis. Well... it's the end users who will be creating the templates, and they will enter these XML tags by typing the angle brackets manually. – inutan
Inutan, you just asked this same question 3 times, in 3 different threads. That isn't necessary. Stick to one question. – Cheeso
If you're really engineering a process where users will be hand-coding its input XML in Microsoft Word, you might as well just shoot yourself in the head now. – Robert Rossney

4 Answers

2
votes

You need to use the Word APIs. This is more complicated than you think.

Word 2003 files (.doc, .dot) are stored in a proprietary binary format. Parsing this format from the specification alone is nearly impossible, so it is well worth investing in an SDK for this, or connecting directly to Word through COM to do the processing.

Word 2007 files (.docx, .dotx) are indeed XML, but a .docx file is actually a zipped hierarchy of folders and files that together make up the document. The OpenXML SDK can handle .docx files, and I assume it can also handle the equivalent templates (.dotx).

An alternative for the 2007 format is to create your template in Word, learn the hierarchy of files, and handle them appropriately: change the .docx or .dotx extension to .zip, unzip it, and find where your find-and-replace tags are located. You may be able to simply replace the tags, rezip the hierarchy, and rename the extension back.
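A minimal C# sketch of the unzip/replace/rezip idea, using System.IO.Compression to edit the package in place instead of renaming it to .zip by hand. It assumes the tags live in `word/document.xml` (headers and footers are separate parts) and that each tag is stored contiguously. Note that a tag typed as `<phone>` appears in the raw XML as `&lt;phone&gt;`, and Word may split typed text across several runs, in which case a plain string replace will miss it. The replacement is inserted verbatim, so it must already be valid XML. `DocxPatcher` and `ReplaceInMainPart` are hypothetical names.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration, not part of any Word SDK.
public static class DocxPatcher
{
    // The main document part inside a .docx/.dotx package.
    private const string MainPart = "word/document.xml";

    // Replaces every occurrence of `find` with `replacement` in word/document.xml,
    // rewriting the package in place. Returns the number of occurrences replaced.
    public static int ReplaceInMainPart(string path, string find, string replacement)
    {
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("Search text must not be empty.", nameof(find));

        using ZipArchive archive = ZipFile.Open(path, ZipArchiveMode.Update);
        ZipArchiveEntry entry = archive.GetEntry(MainPart)
            ?? throw new InvalidDataException($"{MainPart} not found in {path}");

        string xml;
        using (var reader = new StreamReader(entry.Open(), Encoding.UTF8))
            xml = reader.ReadToEnd();

        int count = CountOrdinal(xml, find);
        if (count == 0)
            return 0; // nothing to replace

        // Recreate the entry rather than overwriting it, so no stale bytes remain.
        entry.Delete();
        ZipArchiveEntry updated = archive.CreateEntry(MainPart);
        using (var writer = new StreamWriter(updated.Open(), new UTF8Encoding(false)))
            writer.Write(xml.Replace(find, replacement)); // ordinal replace

        return count;
    }

    // Counts non-overlapping ordinal occurrences of `value` in `text`.
    private static int CountOrdinal(string text, string value)
    {
        int count = 0;
        for (int i = text.IndexOf(value, StringComparison.Ordinal);
             i >= 0;
             i = text.IndexOf(value, i + value.Length, StringComparison.Ordinal))
            count++;
        return count;
    }
}
```

For example, `DocxPatcher.ReplaceInMainPart("letter.dotx", "&lt;phone&gt;", "555-0100")` would replace the escaped tag wherever Word stored it in one piece.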

1
votes

Why don't you use the Word APIs to do this? I can't imagine any way to do this safely without using the APIs that were designed for the purpose.

0
votes

Yes, you can use the System.Xml.XmlDocument class to read your XML source. You'll also need to declare all the namespaces required to work with that XML content.
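To make the namespace point concrete for .docx/.dotx files: once `word/document.xml` has been extracted from the package, its text lives in `w:t` elements in the WordprocessingML namespace. The sketch below (the `WordXmlReader.GetRunTexts` name is hypothetical) binds that namespace and collects the text. It also illustrates the point raised in the comments: a tag the user typed as `<phone>` is just text, so the parser returns it as the string "<phone>" inside a `w:t`, not as an element.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper for illustration.
public static class WordXmlReader
{
    // Namespace URI of WordprocessingML, used by word/document.xml in .docx/.dotx files.
    private const string WNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of every <w:t> element in the given document.xml content.
    public static List<string> GetRunTexts(string documentXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(documentXml);

        // XPath prefixes must be bound explicitly, even if the document declares them.
        var nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("w", WNs);

        var texts = new List<string>();
        foreach (XmlNode t in doc.SelectNodes("//w:t", nsm))
            texts.Add(t.InnerText); // entities are decoded: "&lt;phone&gt;" becomes "<phone>"
        return texts;
    }
}
```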

0
votes

First of all, I think Regex should be just fine.
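A minimal sketch of the Regex route, assuming tags are simple word characters inside angle brackets (e.g. `<phone>`) and that it runs over already-decoded text such as a run's text; in the raw document.xml the brackets are escaped as `&lt;` and `&gt;`. `TagRegex` and its methods are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration.
public static class TagRegex
{
    // Matches a hand-typed tag such as <phone> and captures the name "phone".
    private static readonly Regex Tag = new Regex(@"<(\w+)>", RegexOptions.Compiled);

    // Returns the tag names in the order they appear.
    public static List<string> FindTagNames(string text)
    {
        var names = new List<string>();
        foreach (Match m in Tag.Matches(text))
            names.Add(m.Groups[1].Value);
        return names;
    }

    // Replaces each tag with whatever `valueFor` returns for its name.
    public static string ReplaceTags(string text, Func<string, string> valueFor) =>
        Tag.Replace(text, m => valueFor(m.Groups[1].Value));
}
```

The delegate keeps the lookup (a field value, a placeholder, etc.) separate from the matching.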

But if you really want to use an XML parser, I love XmlDocument/XmlNode in .NET. The two methods SelectSingleNode and SelectNodes are extremely useful. Unfortunately, I do not have a Word XML example in front of me, so let's assume this XML:

<Document>
  <MergeField name="phone"></MergeField>
  <MergeField name="email"></MergeField>
</Document>

You would then use code as follows:

using System.Xml;

XmlDocument wordDoc = new XmlDocument();
wordDoc.Load(fileName);

XmlNodeList mergeNodes = wordDoc.SelectNodes("//MergeField");

foreach (XmlNode mergeNode in mergeNodes)
{
   string fieldName = mergeNode.Attributes["name"].Value;
   // Do something here based on the field name, e.g.
   // (GetFieldValue is a placeholder for your own lookup):
   mergeNode.InnerText = GetFieldValue(fieldName);
}

wordDoc.Save(fileName);

The tricky part is that Word XML uses XML namespaces all over the place, so you need to use the XmlNamespaceManager class in .NET to tell your XPath queries which prefix maps to which namespace, so it would be more like:

XmlDocument wordDoc = new XmlDocument();
wordDoc.Load(fileName);

// Map the prefix "o" to the namespace URI used in the document
// (the URI below is a placeholder; use your document's real one).
XmlNamespaceManager nsm = new XmlNamespaceManager(wordDoc.NameTable);
nsm.AddNamespace("o", "http://somenamepaceurl.com");
XmlNodeList mergeNodes = wordDoc.SelectNodes("//o:MergeField", nsm);

foreach (XmlNode mergeNode in mergeNodes)
{
   string fieldName = mergeNode.Attributes["name"].Value;
   // Do something here based on the field name, e.g.
   // (GetFieldValue is a placeholder for your own lookup):
   mergeNode.InnerText = GetFieldValue(fieldName);
}

wordDoc.Save(fileName);
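Neither snippet above produces an actual Word merge field. As a hedged sketch: in WordprocessingML a simple merge field can be written as a `w:fldSimple` element whose `w:instr` attribute holds ` MERGEFIELD name `. The hypothetical `MergeFieldConverter.ConvertTagRuns` below works on a loaded `word/document.xml` and wraps each run whose text is exactly one typed tag (such as `<phone>`) in such a field, showing `«phone»` as its displayed result. It assumes the tag sits alone in a single run; tags that Word has split across runs, or that share a run with other text, are left untouched.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;
using System.Xml;

// Hypothetical helper for illustration.
public static class MergeFieldConverter
{
    private const string WNs = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A run whose whole text is one typed tag, e.g. "<phone>".
    private static readonly Regex WholeTag = new Regex(@"^<(\w+)>$");

    // Wraps each matching <w:r> in <w:fldSimple w:instr=" MERGEFIELD name ">.
    // Returns the number of runs converted.
    public static int ConvertTagRuns(XmlDocument doc)
    {
        var nsm = new XmlNamespaceManager(doc.NameTable);
        nsm.AddNamespace("w", WNs);

        // Snapshot the runs first, because the tree is modified below.
        var runs = new List<XmlElement>();
        foreach (XmlNode node in doc.SelectNodes("//w:p/w:r[count(w:t)=1]", nsm))
            runs.Add((XmlElement)node);

        int converted = 0;
        foreach (XmlElement run in runs)
        {
            XmlNode text = run.SelectSingleNode("w:t", nsm);
            Match m = WholeTag.Match(text.InnerText);
            if (!m.Success)
                continue;
            string name = m.Groups[1].Value;

            // Build <w:fldSimple w:instr=" MERGEFIELD name "> and put it where the run was.
            XmlElement field = doc.CreateElement("w", "fldSimple", WNs);
            XmlAttribute instr = doc.CreateAttribute("w", "instr", WNs);
            instr.Value = $" MERGEFIELD {name} ";
            field.Attributes.Append(instr);
            run.ParentNode.ReplaceChild(field, run);

            // Keep the original run (and its formatting) as the field's displayed result.
            field.AppendChild(run);
            text.InnerText = $"«{name}»";
            converted++;
        }
        return converted;
    }
}
```

After converting, the document would be saved back into the package (for example with the zip approach from the first answer).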