I will try to keep this short and simple.

I wrote a program that modifies text inside a .docx file's document.xml. It does this through XML parsing, and that part works great. Right now the program outputs a new document.xml that is identical to the original apart from the altered text. My problem starts when I try to re-zip the .docx contents.

Just for testing purposes, I've been manually deleting the original document.xml file in the word folder and adding the new one. Eventually I want my program to do that, but I'm not at that point yet.
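
For reference, here is a minimal sketch of how that manual replacement step could be automated in Python with the standard zipfile module. It copies every entry of the original package into a new archive and swaps in the new word/document.xml. The function name replace_document_xml and its parameters are hypothetical, chosen only for this illustration, and the sketch assumes the package keeps the main part at word/document.xml.

```python
import zipfile


def replace_document_xml(src_docx, dst_docx, new_xml):
    """Copy src_docx to dst_docx, replacing word/document.xml with new_xml (bytes).

    Hypothetical helper for illustration; src_docx and dst_docx may be
    file paths or file-like objects.
    """
    with zipfile.ZipFile(src_docx) as src, \
            zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == "word/document.xml":
                data = new_xml  # swap in the modified part
            else:
                data = src.read(info.filename)  # copy everything else unchanged
            # Keep the original entry name so paths inside the package are preserved.
            dst.writestr(info.filename, data)
```

Writing the archive this way keeps the entry names exactly as they were (for example word/document.xml rather than an extra top-level folder), which is one thing that can go wrong when re-zipping by hand.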

What happens is that after re-zipping all the contents, Microsoft Word says the file is corrupt. If I don't alter the document.xml file and just re-zip everything, it works fine, so I don't think there is anything wrong with the zipping itself.

But if I delete the original document.xml file and then put it back, Microsoft Word says it's corrupt. It's really weird.

Here is my original document.xml file

https://www.dropbox.com/s/ghe1m176rdqtng7/document.xml?dl=0

and the updated one.

https://www.dropbox.com/s/8n9llagozbvb2mz/document_output.xml?dl=0

I hope someone can shed some light on what's going on.

Thanks!

1 Answer

When I use your original document.xml, Word also reports the file as corrupt.

As far as I can see, there are three references pointing nowhere. If you comment out the three w:headerReference elements (right at the bottom, as children of the w:sectPr element), I can open the file without Word complaining.
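
To check for this kind of dangling reference yourself, something like the following rough Python sketch might help. It collects every attribute in the relationships namespace (such as r:id) used in word/document.xml and compares the values against the Id attributes declared in word/_rels/document.xml.rels. The function name dangling_relationship_ids is made up for this example, and the sketch assumes the usual transitional namespace URIs and the standard part locations.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumed namespace URIs (transitional OOXML); strict documents use different ones.
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def dangling_relationship_ids(docx):
    """Return sorted r:* values used in word/document.xml that have no
    matching Relationship Id in word/_rels/document.xml.rels.

    Hypothetical helper; docx may be a path or a file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        document = ET.fromstring(zf.read("word/document.xml"))
        try:
            rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
        except KeyError:
            rels = None  # no relationships part at all

    defined = set()
    if rels is not None:
        defined = {
            rel.get("Id")
            for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship")
        }

    used = set()
    for element in document.iter():
        for name, value in element.attrib.items():
            # ElementTree expands r:id to "{namespace-uri}id".
            if name.startswith(f"{{{R_NS}}}"):
                used.add(value)

    return sorted(used - defined)
```

If this returns ids such as rId8, rId9 or rId10 for your package, the w:headerReference elements are pointing at relationships that the package does not define.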

The modified w:sectPr section as a whole:

<w:sectPr w:rsidR="00EC0B63" w:rsidSect="00EC0B63">
    <!--<w:headerReference w:type="even" r:id="rId8"/>
    <w:headerReference w:type="default" r:id="rId9"/>
    <w:headerReference w:type="first" r:id="rId10"/>-->
    <w:pgSz w:w="12240" w:h="15840"/>
    <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440" w:header="720"
        w:footer="720" w:gutter="0"/>
    <w:cols w:space="720"/>
    <w:titlePg/>
    <w:docGrid w:linePitch="360"/>
</w:sectPr>