I am using python-docx to process Word files. With larger files (50+ pages), the paragraph.text property returns a string that does not match the corresponding paragraph in my file.
from docx import Document

document = Document(f)  # f is the path to (or a file object for) the .docx file
paratext = []
for paragraph in document.paragraphs:
    paratext.append(paragraph.text)
print(paratext[30])  # index 30 is the 31st paragraph (0-based)
I expect this to print the paragraph at index 30. In some cases, however, the output is truncated: the first few characters are missing and the printed text starts somewhere in the middle of the actual paragraph. It works fine if I copy a few adjacent paragraphs into a fresh MS Word document (1 page only) and run the same code, changing only the index into paratext. For example, I copied 3 adjacent paragraphs into a new document and used print(paratext[2]), and the output was exactly right. How do I get rid of this inconsistency? I have to work with larger documents.
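One way to narrow this down might be to check whether the missing characters are present in the file but sit in elements that paragraph.text does not read. As far as I understand, paragraph.text joins the text of the paragraph's runs, so text wrapped in a tracked insertion (w:ins) or similar container could be skipped; in some python-docx versions this may also apply to hyperlinks. The following is only a diagnostic sketch under that assumption. It uses the standard library instead of python-docx, and full_paragraph_texts is a hypothetical helper name, not part of any library.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def full_paragraph_texts(path_or_file):
    """Return the text of each body-level paragraph in a .docx file.

    Hypothetical helper: collects every w:t element nested anywhere inside
    a paragraph, including text inside hyperlinks or tracked insertions.
    """
    with zipfile.ZipFile(path_or_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    body = root.find(f"{{{W_NS}}}body")
    texts = []
    # Only direct children of w:body, so paragraphs inside tables are
    # not counted (assumed to match how document.paragraphs indexes).
    for p in body.findall(f"{{{W_NS}}}p"):
        pieces = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        texts.append("".join(pieces))
    return texts
```

Comparing full_paragraph_texts(f)[30] with paratext[30] for the same file would show whether the beginning of the paragraph is stored somewhere that paragraph.text ignores. If the two strings match, the cause is probably elsewhere, for example the index pointing at a different paragraph than expected.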