0
votes

I have a docx file with just text. I would like to create a new docx file containing just part of a page in the original docx. I am using python-docx for this. So far I have been able to transverse the original docx document and copy each desired paragraph/run in the original to the new document as follows (this example should make an exact copy, I believe):

Doc = docx.Document('/tmp/input.docx')
OutDoc = docx.Document()

for para in Doc.paragraphs:
    currentParagraph = OutDoc.add_paragraph(style=para.style)
    for run in para.runs:
        currentParagraph.add_run(run.text, style=run.style)
OutDoc.save('/tmp/output.docx')

Even though I am copying all style information, it seems that I am missing something, as the output lacks some of the formatting.

1
What formatting? Would you please post screenshots showing the original paragraph and the copied paragraph? --- I haven't used python-docx, but I can say that, in general, if you are not also copying the styles and section formatting info, you may not replicate the paragraph exactly. - cxw

1 Answers

1
votes

In Word, the style name applied to a paragraph or run (or any other content) is ignored if that style is not explicitly defined in the new document.

You can either parse through the styles in the source document and recreate each one in the new document, or create a blank "template" document for the new document that already contains the styles you want.

The "default" python-docx document template includes many of the built-in styles, but if your document uses any customized styles, that would explain the symptom you're seeing.

See these pages in the documentation for more: http://python-docx.readthedocs.org/en/latest/user/styles-understanding.html http://python-docx.readthedocs.org/en/latest/user/styles-using.html http://python-docx.readthedocs.org/en/latest/api/document.html#docx.document.Document.styles http://python-docx.readthedocs.org/en/latest/api/style.html