I have been given a series of folders with large amounts of Word documents in .xml formatting. They each contain some VBA code, but the code on all of them has already been run, so I don't need to keep this.
I need to print all of the files in each folder, but due to constraints on XML files on the network, I can't simply mass-print them from Windows Explorer, so I need to convert them to .docx (or .doc) first.
How can I go about doing this? I tried a simple python script using python-docx:
import os
from docx import Document
folderPath=<folderpath>
fileNamesList=os.listdir(folderPath)
for xmlFileName in fileNamesList:
currentDoc=Document(os.path.join(folderPath,xmlFileName))
docxFileName=xmlFileName.replace('.xml','.docx')
currentDoc.save(os.path.join(folderPath,docxFileName))
currentDoc.close()
This gives:
docx.opc.exceptions.PackageNotFoundError: Package not found at <first file name>.xml
I'm guessing this is because python-docx isn't meant to open .xml files, but that's a pretty uneducated guess. Searching around for this error, all I can find are problems with it not being installed properly (which it is, as far as I can tell) or using .doc files instead of .docx.
Am I simply using python-docx incorrectly? If not, is there are more suitable package or technique I should be using?