Word found unreadable content in xxx.docx after splitting a docx using Open XML

Question

I have a full.docx that contains two math questions. The document embeds some pictures and MathType equations (OLE objects). I split the document according to this and get two files, first.docx and second.docx. first.docx works fine, but second.docx pops up a warning dialog when I try to open it:

"Word found unreadable content in second.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."

After clicking "Yes", the document opens and its content is correct. What is wrong with second.docx? I have checked it with the Open XML SDK 2.5 Productivity Tool but could not find the cause. Any help is much appreciated. Thanks.

The three files have been uploaded here.

Here is the relevant code:

        byte[] templateBytes = System.IO.File.ReadAllBytes(TEMPLATE_YANG_FILE);
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(templateBytes, 0, templateBytes.Length);

            string guidStr = Guid.NewGuid().ToString();

            using (WordprocessingDocument document = WordprocessingDocument.Open(templateStream, true))
            {
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);

                MainDocumentPart mainPart = document.MainDocumentPart;

                mainPart.Document = new Document();
                Body bd = new Body();

                foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph clonedParagrph in lst)
                {
                    bd.AppendChild<DocumentFormat.OpenXml.Wordprocessing.Paragraph>(clonedParagrph);

                    clonedParagrph.Descendants<Blip>().ToList().ForEach(blip =>
                    {
                        var newRelation = document.CopyImage(blip.Embed, this.wordDocument);
                        blip.Embed = newRelation;
                    });

                    clonedParagrph.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
                    {
                        var newRelation = document.CopyImage(imageData.RelationshipId, this.wordDocument);
                        imageData.RelationshipId = newRelation;
                    });
                }

                mainPart.Document.Body = bd;
                mainPart.Document.Save();
            }

            string subDocFile = System.IO.Path.Combine(this.outDir, guidStr + ".docx");
            this.subWordFileLst.Add(subDocFile);

            File.WriteAllBytes(subDocFile, templateStream.ToArray());
        }

lst contains paragraphs cloned from the original docx using:

(DocumentFormat.OpenXml.Wordprocessing.Paragraph)p.Clone();

You don't mention how you used the Productivity Tool. Did you save the repaired document to a new name, close it, then open the original (problem) document in the Tool and use the Compare feature to see what was changed? — Cindy Meister
@Cindy Meister, thank you. I compared second.docx with a newly repaired copy and found a difference between /word/_rels/document2.xml.rels and /word/_rels/document.xml.rels: the repaired docx contains some embeddings/oleObjectx.bin parts (x is 1, 2, 3, 4) that are missing from second.docx (the broken file). I don't know how to copy those OLE objects when splitting. — James Hao
@Cindy Meister, during splitting, the copy is done paragraph by paragraph, and Blip and ImageData are handled as well, but the OLE object gets no special handling. I think the OLE object is included in the Paragraph. — James Hao
I can't answer this for you, at least not in this format. First: look at the code the Prod.Tool generates for creating the repaired version from the first - that should give you some clues. If that doesn't help, I recommend you use the edit link below the question to change the question to what the real problem is (your code not copying the OLE object)... — Cindy Meister
... Keep the background info (splitting doc based on "link"). Then that the problem second doc is not valid due to xyz (provide some detail) not being copied across correctly. Include relevant Word Open XML for the "bad" and "repaired" document. Also include your attempt, based on the code the Productivity Tool generated, for taking care of the problem and describe how it's not producing the correct result. — Cindy Meister
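
The mismatch described in the comments above (relationship IDs used in the body with no matching entry in the .rels file) can sometimes be confirmed without the Productivity Tool. The following is a minimal sketch, not taken from the thread, that uses only the .NET base library. `DocxRelationshipCheck` and `FindDanglingIds` are hypothetical names, and the default part path `word/document.xml` is an assumption (the comments mention a `document2.xml` in one of the files, so the path may need adjusting). It lists every attribute in the relationships namespace (`r:id`, `r:embed`, and so on) whose value is not declared in the part's .rels file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class DocxRelationshipCheck
{
    // Namespace of r:id / r:embed attributes inside the document XML.
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    // Namespace of <Relationship> elements inside a .rels file.
    static readonly XNamespace Rel =
        "http://schemas.openxmlformats.org/package/2006/relationships";

    // Returns relationship IDs referenced by the part but not declared in its .rels file.
    // partPath is an assumption; pass the real main part name if it differs.
    public static List<string> FindDanglingIds(Stream docx, string partPath = "word/document.xml")
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            int slash = partPath.LastIndexOf('/');
            string relsPath = partPath.Substring(0, slash + 1) + "_rels/"
                            + partPath.Substring(slash + 1) + ".rels";

            XDocument body = Load(zip, partPath, required: true);
            XDocument rels = Load(zip, relsPath, required: false);

            var declared = new HashSet<string>(
                rels.Descendants(Rel + "Relationship").Select(e => (string)e.Attribute("Id")));

            return body.Descendants()
                       .SelectMany(e => e.Attributes())
                       .Where(a => a.Name.Namespace == R)
                       .Select(a => a.Value)
                       .Where(id => !declared.Contains(id))
                       .Distinct()
                       .ToList();
        }
    }

    static XDocument Load(ZipArchive zip, string name, bool required)
    {
        ZipArchiveEntry entry = zip.GetEntry(name);
        if (entry == null)
        {
            if (required) throw new FileNotFoundException("Part not found in package", name);
            return new XDocument(); // no .rels file means no declared relationships
        }
        using (Stream s = entry.Open())
        {
            return XDocument.Load(s);
        }
    }
}
```

Calling `DocxRelationshipCheck.FindDanglingIds(File.OpenRead("second.docx"))` on a file with uncopied OLE parts would be expected to return the IDs carried by the `o:OLEObject` elements, although a broken file can also fail in other ways (for example an ID that resolves to a part of the wrong type), which this check does not detect.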

James Hao · Accepted Answer · 2020-02-13T14:23:26

Using the Productivity Tool, I found that the oleObjectx.bin parts were not copied, so I added the code below after copying Blip and ImageData:

clonedParagrph.Descendants<OleObject>().ToList().ForEach(ole =>
{
    var newRelation = document.CopyOleObject(ole.Id, this.wordDocument);
    ole.Id = newRelation;
});

Solved the issue.
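
Note that `CopyOleObject` (like `CopyImage` in the question) is not a method of the Open XML SDK; it appears to be the poster's own extension method, and its body is not shown in the thread. A minimal sketch of what such a helper might look like, assuming the relationship points to an `EmbeddedObjectPart` in the source main document part (as is typical for an oleObject.bin; other embedded types such as packaged Office files would need `AddEmbeddedPackagePart` instead):

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class OleCopyExtensions
{
    // Hypothetical helper, not the poster's actual code: copies the embedded
    // OLE part that relId refers to in the source document into the target
    // document and returns the relationship ID assigned in the target.
    public static string CopyOleObject(this WordprocessingDocument target,
                                       string relId,
                                       WordprocessingDocument source)
    {
        // Resolve the part behind the relationship ID in the source document.
        OpenXmlPart sourcePart = source.MainDocumentPart.GetPartById(relId);

        // Create an empty embedded object part with the same content type.
        EmbeddedObjectPart newPart =
            target.MainDocumentPart.AddEmbeddedObjectPart(sourcePart.ContentType);

        // Copy the binary payload across.
        using (Stream data = sourcePart.GetStream(FileMode.Open, FileAccess.Read))
        {
            newPart.FeedData(data);
        }

        // The caller writes this ID back into the cloned o:OLEObject element.
        return target.MainDocumentPart.GetIdOfPart(newPart);
    }
}
```

The key point is the same as for images: cloning a paragraph copies only the XML, so every relationship ID inside it still refers to the source package until the referenced part is added to the new package and the ID is rewritten.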
