Using python-docx, how can I associate an XML namespace prefix?

Question

I am currently trying to add a checkbox to a word document using the python-docx library. I've narrowed down the checkbox XML to two possible options, the first from evaluating the word/document.xml file from an existing doc and the second from the XML Schema. When trying to insert the XML element into a simple document I receive the error; "lxml.etree.XMLSyntaxError: Namespace prefix xsd on complexType is not defined".

This is what I'm currently trying (using XML from the Schema):

def word_docs(emails):
    cbox = parse_xml('<xsd:complexType name="CT_FFCheckBox"><xsd:sequence>  \
                <xsd:choice><xsd:element name="size"type="CT_HpsMeasure"/>  \
                <xsd:element name="sizeAuto" type="CT_OnOff"/></xsd:choice> \
                <xsd:element name="default" type="CT_OnOff" minOccurs="0"/> \
                <xsd:element name="checked" type="CT_OnOff" minOccurs="0"/> \
                </xsd:sequence></xsd:complexType>')

    doc = Document()
    title = doc.add_heading("Document", 0)
    table = doc.add_table(rows = 1, cols = 4)
    table.style = 'TableGrid'

    row = table.rows[0]
    row.cells[0].text = "Test"

    merged = (row.cells[1].merge(row.cells[2]))
    merged._tc._add_p()
    ....

The following is the XML pulled from an existing document:

<w:tc>
<w:tcPr>
    <w:tcW w:w="4788" w:type="dxa"/>
</w:tcPr>
<w:p wsp:rsidR="00834643" wsp:rsidRPr="00834643" wsp:rsidRDefault="00F12FD5" wsp:rsidP="00834643">
    <w:pPr>
        <w:spacing w:after="0" w:line="240" w:line-rule="auto"/>
    </w:pPr>
    <w:r>
        <w:fldChar w:fldCharType="begin">
            <w:fldData xml:space="preserve">/////2UAAAAUAAYAQwBoAGUAYwBrADEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA</w:fldData>
        </w:fldChar>
    </w:r>
    <aml:annotation aml:id="1" w:type="Word.Bookmark.Start" w:name="Check2"/>
        <w:r>
            <w:instrText> FORMCHECKBOX </w:instrText>
        </w:r>
        <w:r>
            <w:fldChar w:fldCharType="end"/>
        </w:r>
    <aml:annotation aml:id="1" w:type="Word.Bookmark.End"/>
</w:p>

I've been able to manually add the namespace xmlns:xsd="http://www.w3.org/2001/XMLSchema" manually to a document and it seems to open correctly, I am just unsure of how to do this in a pythonic way to automate the process. The XML object manipulation through python-docx may be incorrect, but it is what makes sense to me after comparing the XML format and the python-docx objects and the way they are handled - I haven't been able to test it with this error.

Any help is appreciated! Thanks!

What document did you pull that example XML from? Is it a LibreOffice or perhaps recent MSOffice version? The aml: namespace prefix is not historically part of the WordprocessingXML namespace as far as I know. Usually this would be done with a w:bookmarkStart and w:bookmarkEnd element as I've seen it. See this GitHub issue for example: github.com/python-openxml/python-docx/issues/403. Also look at top of example XML and show which namespace (URL) aml is mapped to. Use @scanny in response so I'm notified. — scanny
@scanny The example xml is from a MS Office Word 2007 document that was saved as a Word 2003 XML Document. I'm running a Win7 VM and was unsure of other ways to inspect the xml without other tools. The aml: namespace is defined here: xmlns:aml="http://schemas.microsoft.com/aml/2001/core" Regarding the GitHub issue, does the overall problem of adding any xml elements come down to creating each individually then appending them to the appropriate parent? — Crudough
@scanny, please put your comment pointing to the GitHub issue into an answe and I would be more than happy to accept it! Thanks! — Crudough

scanny scanny · Accepted Answer · 2017-10-05T19:54:49

Ah, okay, your comment explains it. The MS Word 2003 XML format is not the same as the MS Word 2007 format (which, by the way, is inherently XML and requires no conversion).

To view the XML of a Word 2007 or later .docx file, you simply unzip it (it is a Zip archive). You may need to add a .zip extension first, depending on what tools you use for the unzipping. You'll be interested in the XML in the document.xml file in the resulting tree. I think you'll find that the bookmark appears as a <w:bookmarkStart> and <w:bookmarkEnd> element pair, which will not require any additions to the built-in namespaces of python-docx.

See this GitHub issue for more details and an example: github.com/python-openxml/python-docx/issues/403.

Using python-docx, how can I associate an XML namespace prefix?

1 Answers