2
votes

I am trying to read a docx file and add its text to a list. I need each element of the list to be one line from the docx file.

example:

docx file:

"Hello, my name is blabla,
I am 30 years old.
I have two kids."

result:

['Hello, my name is blabla', 'I am 30 years old', 'I have two kids']

I can't get it to work.

I am using the docx2txt module from here: github link

It has only one function, process, and it returns all the text from the docx file at once.

I would also like to keep special characters such as ":", "-", "." and ",".

1 Answer

7
votes

The docx2txt module reads a docx file and converts it to plain text.

Split that output with splitlines() and store the resulting lines in a list.

Code (comments inline):

import docx2txt

# Extract all text from the document as a single string.
text = docx2txt.process("a.docx")

# Print the text after conversion.
print("After converting text is ", text)

content = []
for line in text.splitlines():
    # Skip empty lines.
    if line != '':
        # Append the line to the list.
        content.append(line)

print(" List is ", content)

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
After converting text is
 Hello, my name is blabla.

I am 30 years old.

I have two kids.

 List is  ['Hello, my name is blabla.', 'I am 30 years old. ', 'I have two kids.']

C:\Users\dinesh_pundkar\Desktop>
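
The same idea can be written more compactly with a list comprehension. This is only a sketch: text_to_lines is a hypothetical helper name, not part of docx2txt, and unlike the loop above it also strips leading and trailing whitespace from each line and drops whitespace-only lines. Punctuation such as ":", "-", "." and "," is left untouched. The sample string stands in for the value returned by docx2txt.process.

```python
def text_to_lines(text):
    """Return the non-blank lines of text, with surrounding whitespace removed."""
    # str.splitlines() splits on line boundaries; line.strip() is an empty
    # (falsy) string for empty or whitespace-only lines, so those are skipped.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Stand-in for: text = docx2txt.process("a.docx")
sample = "Hello, my name is blabla.\n\nI am 30 years old. \n\nI have two kids.\n"

lines = text_to_lines(sample)
print(lines)
```

With a real document, passing the string returned by docx2txt.process("a.docx") to text_to_lines should give the list directly.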