My .docx file contains tables, headers, and other content besides plain paragraphs, and I'd like to extract all of its text. The only example code I could find iterates over the document's paragraphs, and it fails on my file.
Here is the code:
import docx

doc = docx.Document(self.filename)
fullText = []
for para in doc.paragraphs:
    # Drop non-ASCII characters from each paragraph
    txt = para.text.encode('ascii', 'ignore')
    fullText.append(txt)
self.text = '\n'.join(fullText)
When I run this code, I get this error:
  File "annotatorConnections.py", line 75, in openFile
    self.text = '\n'.join(fullText)
TypeError: sequence item 0: expected str instance, bytes found
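This error comes from Python 3 string handling, not from the tables. `str.encode('ascii', 'ignore')` returns a `bytes` object, and `'\n'.join()` with a `str` separator accepts only `str` items. One way to keep the ASCII filtering is to decode the result straight back to `str`. The helper name `ascii_only` and the sample strings below are hypothetical.

```python
def ascii_only(text):
    # encode() drops non-ASCII characters but returns bytes;
    # decode() converts the result back to a str that join() accepts.
    return text.encode("ascii", "ignore").decode("ascii")


paragraphs = ["Caf\u00e9 menu", "Line two"]
full_text = "\n".join(ascii_only(p) for p in paragraphs)
print(full_text)
```

If the ASCII filtering isn't needed, appending `para.text` directly also avoids the `TypeError`, because it is already a `str`.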