19
votes

Is it possible to read and write Word (2003 and 2007) files in Python without using a COM object?
I know that I can:

f = open('c:\file.doc', "w")
f.write(text)
f.close()

but Word will read it as an HTML file not a native .doc file.

4

4 Answers

7
votes

I'd look into IronPython which intrinsically has access to windows/office APIs because it runs on .NET runtime.

39
votes

See python-docx, its official documentation is available here.

This has worked very well for me.

11
votes

If you only what to read, it is simplest to use the linux soffice command to convert it to text, and then load the text into python:

3
votes

doc (Word 2003 in this case) and docx (Word 2007) are different formats, where the latter is usually just an archive of xml and image files. I would imagine that it is very possible to write to docx files by manipulating the contents of those xml files. However I don't see how you could read and write to a doc file without some type of COM component interface.