12
votes

I am using Word and OpenXml to provide mail merge functionality in a C# ASP.NET web application:

1) A document is uploaded with a number of pre-defined strings for substitution.

2) Using the OpenXML SDK 2.0 I open the Word document, get the mainDocumentPart as a string and perform the substitution using Regex.

3) I then create a new document using OpenXML, add a new mainDocumentPart and insert the string resulting from the substitution into this mainDocumentPart.

However, all formatting/styles etc. are lost in the new document.

I'm guessing I can copy and add the Style, Definitions, Comment parts, etc. individually to mimic the original document.

However, is there a way in Open XML to duplicate a document so that I can perform the substitutions on the new copy?

Thanks.

6
Why not File.Copy(docName, newName);? - Kiwiman
Have a look at my answer below for an update on the options you have with the Open XML SDK since 2014/15. - Thomas Barnekow

6 Answers

16
votes

This piece of code should copy all parts from an existing document to a new one.

using (var mainDoc = WordprocessingDocument.Open(@"c:\sourcedoc.docx", false))
using (var resultDoc = WordprocessingDocument.Create(@"c:\newdoc.docx",
  WordprocessingDocumentType.Document))
{
  // copy parts from source document to new document
  foreach (var part in mainDoc.Parts)
    resultDoc.AddPart(part.OpenXmlPart, part.RelationshipId);
  // perform replacements in resultDoc.MainDocumentPart
  // ...
}
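
The replacement step left open in the comment above could look roughly like the following sketch. It assumes placeholders of the form {{Name}}, which is a hypothetical syntax (the question does not say what its pre-defined strings look like). MergeSketch, ReplaceTokens and ReplaceInMainPart are illustrative names, not SDK members. Values are XML-escaped because they are written into raw markup.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Security;
using System.Text.RegularExpressions;
using DocumentFormat.OpenXml.Packaging;

public static class MergeSketch
{
    // Replaces {{Token}} placeholders (assumed syntax) in raw part XML.
    // Unknown tokens are left untouched; values are XML-escaped.
    public static string ReplaceTokens(string xml, IDictionary<string, string> values)
    {
        return Regex.Replace(xml, @"\{\{(\w+)\}\}", m =>
            values.TryGetValue(m.Groups[1].Value, out var value)
                ? SecurityElement.Escape(value)
                : m.Value);
    }

    // Reads the main document part as a string, substitutes, and writes it back.
    public static void ReplaceInMainPart(WordprocessingDocument doc,
                                         IDictionary<string, string> values)
    {
        MainDocumentPart part = doc.MainDocumentPart;

        string xml;
        using (var reader = new StreamReader(part.GetStream()))
            xml = reader.ReadToEnd();

        xml = ReplaceTokens(xml, values);

        // FileMode.Create truncates the part before the new XML is written.
        using (var writer = new StreamWriter(part.GetStream(FileMode.Create)))
            writer.Write(xml);
    }
}
```

Call MergeSketch.ReplaceInMainPart(resultDoc, values) inside the using block. Word often splits typed text across several runs, so a placeholder may not appear as one contiguous string in the XML. That is one reason the other answers recommend content controls.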
7
votes

I second the recommendation to use content controls. Marking up the areas of your document where you want to perform substitution with them is by far the easiest way to do it.

As for duplicating the document (and retaining the entire document contents, styles and all) it's relatively easy:

string documentPath = "full path to your document";
byte[] docAsArray = File.ReadAllBytes(documentPath);

using (MemoryStream stream = new MemoryStream())
{
    stream.Write(docAsArray, 0, docAsArray.Length);    // THIS performs the doc copy
    using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
    {
        // perform content control substitution here, making sure to call .Save()
        // on any document parts you changed.
    }
    File.WriteAllBytes("full path of your new doc to save, including .docx", stream.ToArray());
}

Actually finding the content controls is a piece of cake using LINQ. The following example finds all the Simple Text content controls (which are typed as SdtRun):

using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
{                    
    var mainDocument = doc.MainDocumentPart.Document;
    var contentControls = from sdt in mainDocument.Descendants<SdtRun>() select sdt;

    foreach (var cc in contentControls)
    {
        // drill down through the containment hierarchy to get to 
        // the contained <Text> object
        cc.SdtContentRun.GetFirstChild<Run>().GetFirstChild<Text>().Text = "my replacement string";
    }
}

The <Run> and <Text> elements may not already exist (in which case the line above throws a NullReferenceException), but creating them is as simple as:

cc.SdtContentRun.Append(new Run(new Text("my replacement string")));
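
Putting the two snippets together, a null-safe helper might look like the sketch below. ContentControlSketch and SetText are hypothetical names, and the helper only touches the first Run and first Text it finds, which is an assumption that fits simple text controls but not ones holding several runs.

```csharp
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlSketch
{
    // Sets the text of a run-level content control, creating the
    // SdtContentRun, Run and Text elements when they are missing.
    public static void SetText(SdtRun cc, string value)
    {
        SdtContentRun content = cc.SdtContentRun;
        if (content == null)
        {
            content = new SdtContentRun();
            cc.Append(content);
        }

        Run run = content.GetFirstChild<Run>();
        if (run == null)
        {
            run = new Run();
            content.Append(run);
        }

        Text text = run.GetFirstChild<Text>();
        if (text == null)
        {
            text = new Text();
            run.Append(text);
        }

        text.Text = value;
    }
}
```

Inside the foreach loop above, the body would then become ContentControlSketch.SetText(cc, "my replacement string");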

Hope that helps someone. :D

2
votes

I have done some very similar things, but instead of using text substitution strings, I use Word content controls. I have documented some of the details in the following blog post, SharePoint and Open XML. The technique is not specific to SharePoint. You could reuse the pattern in pure ASP.NET or other applications.

Also, I would STRONGLY encourage you to review Eric White's blog for tips, tricks and techniques regarding Open XML. Specifically, check out the post on in-memory manipulation of Open XML and the posts on Word content controls. I think you'll find these much more helpful in the long run.

Hope this helps.

2
votes

As an addendum to the above: what's perhaps more useful is finding content controls that have been tagged (using the Word GUI). I recently wrote some software that populated document templates containing content controls with tags attached. Finding them is just an extension of the above LINQ query:

// WordprocessingML namespace URI, needed to look up the w:val attribute
const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

var mainDocument = doc.MainDocumentPart.Document;
var taggedContentControls = from sdt in mainDocument.Descendants<SdtElement>()
                            let sdtPr = sdt.GetFirstChild<SdtProperties>()
                            let tag = (sdtPr == null ? null : sdtPr.GetFirstChild<Tag>())
                            where (tag != null)
                            select new
                            {
                                SdtElem = sdt,
                                TagName = tag.GetAttribute("val", W).Value
                            };

I got this code from elsewhere but cannot remember where at the moment; full credit goes to them.

The query just creates an IEnumerable of an anonymous type that contains the content control and its associated tag as properties. Handy!
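
To show how the tags might then drive the substitution, here is a rough sketch that fills each tagged control from a dictionary keyed by tag name. TaggedFillSketch and Fill are hypothetical names, and the sketch assumes the controls are not nested and already contain at least one Text element. It reads the tag through the typed Tag.Val property instead of GetAttribute.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class TaggedFillSketch
{
    // Writes values[tag] into every tagged content control under root.
    // Returns the number of controls that were filled.
    public static int Fill(OpenXmlElement root, IDictionary<string, string> values)
    {
        int filled = 0;
        foreach (SdtElement sdt in root.Descendants<SdtElement>().ToList())
        {
            SdtProperties sdtPr = sdt.GetFirstChild<SdtProperties>();
            Tag tag = sdtPr == null ? null : sdtPr.GetFirstChild<Tag>();
            if (tag == null || tag.Val == null) continue;
            if (!values.TryGetValue(tag.Val.Value, out var value)) continue;

            List<Text> texts = sdt.Descendants<Text>().ToList();
            if (texts.Count == 0) continue; // nothing to write into

            // Put the value in the first Text and blank out any others.
            texts[0].Text = value;
            foreach (Text extra in texts.Skip(1)) extra.Text = string.Empty;
            filled++;
        }
        return filled;
    }
}
```

A call such as TaggedFillSketch.Fill(doc.MainDocumentPart.Document, values) would cover the document body; headers and footers live in separate parts and would need their own calls.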

2
votes

The original question was asked before a number of helpful features were added to the Open XML SDK. Nowadays, if you already have an opened WordprocessingDocument, you can simply clone the original document and perform whatever transformation you need on that clone.

// Say you have done this somewhere before you want to duplicate your document.
using WordprocessingDocument originalDoc = WordprocessingDocument.Open("original.docx", false);

// Then this is how you can clone the opened WordprocessingDocument.
using var newDoc = (WordprocessingDocument) originalDoc.Clone("copy.docx", true);

// Perform whatever transformation you want to do.
PerformTransformation(newDoc);

You can also clone to a Stream or Package. Overall, you have the following options:

OpenXmlPackage Clone()

OpenXmlPackage Clone(Stream stream)
OpenXmlPackage Clone(Stream stream, bool isEditable)
OpenXmlPackage Clone(Stream stream, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(string path)
OpenXmlPackage Clone(string path, bool isEditable)
OpenXmlPackage Clone(string path, bool isEditable, OpenSettings openSettings)

OpenXmlPackage Clone(Package package)
OpenXmlPackage Clone(Package package, OpenSettings openSettings)

Have a look at the Open XML SDK documentation for details on those methods.

Having said that, if you have not yet opened the WordprocessingDocument, there are faster ways to duplicate, or clone, the document. I've demonstrated this in my answer on the most efficient way to clone Office Open XML documents.
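
One such approach, echoing the File.Copy suggestion in the comments on the question, is to copy the file on disk and only then open the copy for editing. The sketch below works under that assumption and is not necessarily the method from the linked answer; CopySketch and CopyAndOpen are hypothetical names.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class CopySketch
{
    // Copies the .docx at the file-system level, then opens the copy
    // in editable mode. The caller is responsible for disposing the result.
    public static WordprocessingDocument CopyAndOpen(string sourcePath, string targetPath)
    {
        File.Copy(sourcePath, targetPath, overwrite: true);
        return WordprocessingDocument.Open(targetPath, true);
    }
}
```

Because the package is copied byte for byte, styles, numbering, comments and all other parts come along without being parsed first.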

0
votes

When you look at an Open XML document by changing its extension to .zip and opening it, you see that the word subfolder contains a _rels folder where all the relationships are listed. These relationships point to the parts you mentioned (styles, etc.). You need these parts because they contain the definition of the formatting. If you do not copy them, the new document will use the formatting defined in the normal.dot file and not the formatting defined in the original document. So I think you have to copy them.