I am wondering if there is a way in python (tool or function etc.) to convert my pdf file to doc or docx?
I am aware of online converters but I need this in Python code.
I am wondering if there is a way in python (tool or function etc.) to convert my pdf file to doc or docx?
I am aware of online converters but I need this in Python code.
If you happen to have MS Word, there is a really simple way to do this using COM. Here is a script I wrote that can convert pdf to docx by calling the Word application.
import glob
import win32com.client
import os
word = win32com.client.Dispatch("Word.Application")
word.visible = 0
pdfs_path = "" # folder where the .pdf files are stored
for i, doc in enumerate(glob.iglob(pdfs_path+"*.pdf")):
print(doc)
filename = doc.split('\\')[-1]
in_file = os.path.abspath(doc)
print(in_file)
wb = word.Documents.Open(in_file)
out_file = os.path.abspath(reqs_path +filename[0:-4]+ ".docx".format(i))
print("outfile\n",out_file)
wb.SaveAs2(out_file, FileFormat=16) # file format for docx
print("success...")
wb.Close()
word.Quit()