I am using pdftotext to convert PDF files to text files.
I tested the code on a few files and it worked fine, but when I run it on all of my PDF files (about 2000), it raises the error "poppler error creating document".
Here is the code:
import pdftotext
import os

directory = "/testfiles"  # path where PDF files are saved
for filename in os.listdir(directory):
    if filename.endswith(".pdf"):
        pathname = os.path.join(directory, filename)
        with open(pathname, 'rb') as f:
            pdf = pdftotext.PDF(f)  # ERROR : poppler error creating document
        txtname = pathname.replace('.pdf', '.txt')
        with open(txtname, 'w', encoding='utf-8') as text_file:  # edit: encoding utf-8 added
            for page in pdf:
                text_file.write(page)
        continue
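Since the loop stops at the first failure, it is hard to tell whether one file or many are affected. A minimal sketch of a variant that records failures and keeps going is shown here. The helper name `convert_all` and its `convert` parameter are hypothetical, introduced only for illustration; `convert` stands for any function that takes a path and raises an exception on failure (for example, a small wrapper around the conversion code above).

```python
import os


def convert_all(directory, convert):
    """Call convert(path) for every .pdf file in directory.

    Returns (ok, failed): ok is a list of paths that converted,
    failed is a list of (path, error message) pairs.
    """
    ok, failed = [], []
    for filename in sorted(os.listdir(directory)):
        if not filename.lower().endswith(".pdf"):
            continue
        pathname = os.path.join(directory, filename)
        try:
            convert(pathname)
            ok.append(pathname)
        except Exception as exc:  # broad on purpose: log the file and move on
            failed.append((pathname, str(exc)))
    return ok, failed
```

Printing the `failed` list after the run should show which of the 2000 files trigger the error and whether they have anything in common.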
What is the problem?
I googled this error, and the only solution I found was to update poppler to the latest version. I installed poppler only yesterday, so I assume it is already up to date.
I also tried pdfplumber, but it returned “No /Root object! - Is this really a PDF?”. Are both errors caused by the PDF files themselves?
I was able to open the file without any error, so I assume the files are not corrupted.
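Opening one file in a viewer may not settle this, since some viewers are lenient about malformed files. As a rough way to test the guess across all files, here is a sketch of a heuristic check using only the standard library. It assumes that a well-formed PDF has the `%PDF-` marker near the start and the `%%EOF` marker near the end; the function name `looks_like_pdf` and the 1024-byte window are illustrative choices, and passing this check does not prove a file is valid.

```python
def looks_like_pdf(path, window=1024):
    """Heuristic: True if the file has a PDF header near the start
    and an end-of-file marker near the end."""
    with open(path, "rb") as f:
        head = f.read(window)
        f.seek(0, 2)                     # jump to the end to get the size
        size = f.tell()
        f.seek(max(0, size - window))    # read at most the last `window` bytes
        tail = f.read()
    return b"%PDF-" in head and b"%%EOF" in tail
```

Files for which this returns `False` (for example, empty files or HTML error pages saved with a `.pdf` extension) would be the first candidates to inspect.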