Word OpenXML replace token text

Question

I'm using OpenXML to amend Word templates. These templates contain simple tokens that are identified by certain characters (currently the double chevrons « and », character codes 171 and 187 in Latin-1/Unicode).

I would like to replace these tokens with my text, which could be multiline (e.g. from a database).

cjb110 · Accepted Answer · 2013-07-29T08:49:19

First, read the template into a memory stream, so that the template file itself is never modified:

        //read file into memory
        byte[] docByteArray = File.ReadAllBytes(templateName);
        using (MemoryStream ms = new MemoryStream())
        {
            //write file to memory stream
            ms.Write(docByteArray, 0, docByteArray.Length);

            //replace the tokens in the in-memory copy (see the approaches below)
            ReplaceText(ms);

            //reset stream
            ms.Seek(0L, SeekOrigin.Begin);

            //save output
            using (FileStream outputStream = File.Create(docName))
                ms.CopyTo(outputStream);
        }

The simplest approach is to read the XML of the main document part as one string and do a plain string replace. It is the quickest way, but it can't insert multiline text and gives you no basis for more complicated changes:

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(ms, true))
        {
            string docText = null;
            //read the entire main document part into a string
            using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
                docText = sr.ReadToEnd();

            //replace the text (strings are immutable, so keep the returned value)
            docText = docText.Replace(oldString, myNewString);

            //write the text back
            using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
                sw.Write(docText);
        }

Instead you need to work with the elements and structure:

        using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(ms, true))
        {
            //get all the text elements
            IEnumerable<Text> texts = wordDoc.MainDocumentPart.Document.Body.Descendants<Text>();
            //filter them to the ones that contain the token; ToList() takes a
            //snapshot so the tree can be modified safely inside the loop
            var tokenTexts = texts.Where(t => t.Text.Contains(oldString)).ToList();

            foreach (var token in tokenTexts)
            {
                //get the parent element
                var parent = token.Parent;
                //deep clone this Text element
                var newToken = token.CloneNode(true);

                //split the text into an array using a regex of all line terminators
                var lines = Regex.Split(myNewString, "\r\n|\r|\n");

                //set the cloned text element to the first line
                ((Text) newToken).Text = lines[0];
                //if more than one line
                for (int i = 1; i < lines.Length; i++)
                {
                    //append a break to the parent
                    parent.AppendChild<Break>(new Break());
                    //then append the next line
                    parent.AppendChild<Text>(new Text(lines[i]));
                }

                //insert it after the token element
                token.InsertAfterSelf(newToken);
                //remove the token element
                token.Remove();
            }

            wordDoc.MainDocumentPart.Document.Save();
        }

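In short: Word content is built from Paragraphs, which contain Runs, which contain Text elements. You find the Text element that holds the token, clone it, set the clone to the first line of the new text (appending Break and Text elements to the parent Run for any further lines), insert the clone after the original token Text element, and finally remove the original. Note that this replaces the entire content of the matching Text element, not just the token substring, so the token should sit in a Text element of its own.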
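To make the resulting structure easier to picture, here is a minimal sketch that builds the same Run shape (one `w:t` per line, separated by `w:br`) as raw XML. It uses only `System.Xml.Linq`, not the Open XML SDK. The class name `RunMarkupSketch` and the helper `BuildRun` are hypothetical names made up for this illustration, and the code is not part of the answer's method.

```csharp
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class RunMarkupSketch
{
    // WordprocessingML main namespace, conventionally bound to the "w" prefix.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper (illustration only): builds a <w:r> that holds
    // one <w:t> per line of the replacement, with a <w:br/> between lines.
    public static XElement BuildRun(string replacement)
    {
        // Same line-terminator split as in the answer's loop.
        string[] lines = Regex.Split(replacement, "\r\n|\r|\n");

        var run = new XElement(W + "r",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName));

        for (int i = 0; i < lines.Length; i++)
        {
            // A break precedes every line except the first.
            if (i > 0)
                run.Add(new XElement(W + "br"));
            run.Add(new XElement(W + "t", lines[i]));
        }
        return run;
    }
}
```

Calling `RunMarkupSketch.BuildRun("Line one\r\nLine two")` gives a run containing a `w:t`, a `w:br` and a second `w:t`, which is the same shape the loop above produces with the SDK's `Text` and `Break` classes.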
