I'm trying to duplicate the contents of a .docx file and save them within the same file, using Open XML in C#.

Here is the code:

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(wordFileNamePath, true))
{
    foreach(OpenXmlElement element in wordDoc.MainDocumentPart.Document.ChildElements)
    {
        OpenXmlElement cloneElement = (OpenXmlElement)element.Clone();
        wordDoc.MainDocumentPart.Document.Append(cloneElement);
    }
    wordDoc.MainDocumentPart.Document.Save();
}

The code runs and does what I need. My problem is that the resulting .docx file is partially corrupted: when I open it, Word shows two error messages.

Clicking 'OK' and then 'Yes' opens the file normally. However, the file stays corrupted until I use 'Save As' (with the same or a different name); only the newly saved file is fixed.

Using the Open XML SDK 2.5 Productivity Tool for Microsoft Office, I can validate the file and view the reflected code. Validation reports 5 errors.

So I think the Clone() method I use copies each element as-is, so some IDs get duplicated when the clone is appended to the document.

Any idea how to get a properly working .docx file after duplicating its contents? Alternative code is also welcome.

2 Answers

The problem with your method is that it creates invalid Open XML markup. Here is why.

Let's say you have a very simple Word document that is represented by the following markup:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

In your foreach loop, wordDoc.MainDocumentPart.Document.ChildElements will be a single-element list that only contains the w:body element. Thus, you create a deep clone of the w:body element and append that to the w:document. The resulting Open XML markup looks like this:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
  <w:body>
    <w:p>
      <w:r>
        <w:t>First paragraph</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Second paragraph</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>

The above is a w:document with two w:body child elements, which is invalid Open XML markup because w:document may contain at most one w:body element. That is why Word reports the file as corrupted.
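To see this without Word or the Productivity Tool, here is a minimal sketch that rebuilds the two-paragraph document above in memory, repeats the loop from the question (cloning the children of w:document and appending the clones to w:document), and runs the SDK's OpenXmlValidator over the package. It assumes the DocumentFormat.OpenXml NuGet package; `DoubleBodyDemo` and `Check` are hypothetical names. The clones are materialized with ToList() first so the sketch does not depend on how the SDK behaves when a child list is modified during enumeration.

```csharp
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;
using DocumentFormat.OpenXml.Wordprocessing;

public static class DoubleBodyDemo
{
    // Hypothetical helper: builds the two-paragraph document in memory, optionally
    // repeats the question's mistake, and returns the number of w:body elements
    // together with the descriptions of all validation errors.
    public static (int BodyCount, string[] Errors) Check(bool appendToDocument)
    {
        using var stream = new MemoryStream();
        using var wordDoc = WordprocessingDocument.Create(stream, WordprocessingDocumentType.Document);
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
        mainPart.Document = new Document(
            new Body(
                new Paragraph(new Run(new Text("First paragraph"))),
                new Paragraph(new Run(new Text("Second paragraph")))));

        if (appendToDocument)
        {
            // The children of w:document are just w:body, so this clones the whole body
            // and appends the copy next to it, as the question's loop does.
            var clones = mainPart.Document.ChildElements
                .Select(e => e.CloneNode(true))
                .ToList();
            mainPart.Document.Append(clones);
        }

        string[] errors = new OpenXmlValidator()
            .Validate(wordDoc)
            .Select(e => e.Description)
            .ToArray();
        return (mainPart.Document.Elements<Body>().Count(), errors);
    }

    public static void Main()
    {
        var (bodies, errors) = Check(appendToDocument: true);
        Console.WriteLine($"w:body elements: {bodies}, validation errors: {errors.Length}");
        foreach (string description in errors)
            Console.WriteLine(description);
    }
}
```

With `appendToDocument: true` the document ends up with two w:body elements, and the validator should flag the second one; the exact wording of the error description depends on the SDK version.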

To fix this, work with Document.Body wherever you currently use Document. The following streamlined example shows how:

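using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(wordFileNamePath, true))
{
    // Work with w:body rather than w:document, so the clones end up inside the one body.
    Body body = wordDoc.MainDocumentPart.Document.Body;

    // Deep-clone the body's children; ToList() materializes the clones before the body changes.
    IEnumerable<OpenXmlElement> clonedElements = body
        .Elements()
        .Select(e => e.CloneNode(true))
        .ToList();

    body.Append(clonedElements);
}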

Note that I did not save the Document explicitly. That is unnecessary because the using statement disposes the WordprocessingDocument, and documents opened this way are auto-saved on dispose by default. Also, I used ToList() to materialize the cloned elements before appending them, which avoids modifying the body's child list while it is still being enumerated.
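One thing the streamlined example leaves open: in a document saved by Word, the last child of w:body is normally a w:sectPr element (the body-level section properties), and the schema expects it to be the final child of w:body. Cloning every child therefore also copies w:sectPr and leaves paragraphs after it. Below is a minimal sketch, assuming you want the duplicated content to share the existing section properties: it skips SectionProperties when cloning and inserts the clones before the existing w:sectPr. `BodyDuplicator` and `DuplicateContent` are hypothetical names. The sketch does not renumber IDs (for example bookmark IDs or drawing `wp:docPr` IDs), so if the original content contains such elements the validator may still report duplicates.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class BodyDuplicator
{
    // Hypothetical helper: appends a deep copy of the body's block-level content
    // (paragraphs, tables, ...) to the same body, keeping the body-level w:sectPr,
    // if there is one, as the single last child.
    public static void DuplicateContent(Body body)
    {
        // Clone everything except w:sectPr; ToList() snapshots the clones before the body changes.
        List<OpenXmlElement> clones = body.Elements()
            .Where(e => !(e is SectionProperties))
            .Select(e => e.CloneNode(true))
            .ToList();

        SectionProperties sectPr = body.Elements<SectionProperties>().LastOrDefault();
        foreach (OpenXmlElement clone in clones)
        {
            if (sectPr != null)
                body.InsertBefore(clone, sectPr); // preserves order; w:sectPr stays last
            else
                body.Append(clone);
        }
    }

    public static void Main()
    {
        var body = new Body(
            new Paragraph(new Run(new Text("First paragraph"))),
            new Paragraph(new Run(new Text("Second paragraph"))),
            new SectionProperties());

        DuplicateContent(body);

        // Lists the children of w:body in document order: prints "p, p, p, p, sectPr"
        Console.WriteLine(string.Join(", ", body.ChildElements.Select(e => e.LocalName)));
    }
}
```

With a real file, the call would go inside the using block shown earlier, e.g. `BodyDuplicator.DuplicateContent(wordDoc.MainDocumentPart.Document.Body);`.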

Why wouldn't it be corrupted? You are opening a document, getting all of its child elements, and writing them back into the same document. I am not sure what that is supposed to do.