3
votes

I am trying to extract tables from a PDF using camelot and I get this attribute error. Could you please help?

import camelot
import pandas as pd
pdf = camelot.read_pdf("Gordian.pdf")

AttributeError                            Traceback (most recent call last)
----> 1 pdf = camelot.read_pdf("Gordian.pdf")

AttributeError: module 'camelot' has no attribute 'read_pdf'

5
Please read github.com/atlanhq/camelot/issues/118 and github.com/atlanhq/camelot/issues/145. If you don't find a solution, post the output of the command dir(camelot) - Stefano Fiorucci - anakin87
You have probably installed camelot instead of camelot-py - ExtractTable.com

5 Answers

8
votes

NOTE: If you are using a virtual environment, activate it before following these steps.

I have faced this error before. There is no bug in your code; the problem is with the camelot installation.

1 Remove the installed camelot version.

2 Install it again. There are multiple ways to install camelot; try these commands one at a time:

  • pip install camelot-py
  • pip install camelot-py[cv]
  • pip install camelot-py[all]

3 Run your code. Here is sample code:

import camelot

data = camelot.read_pdf("test_file.pdf", pages='all')
print(data)
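
Before reinstalling, it can help to confirm which distribution pip actually installed. The following is a minimal sketch, not part of the original answer: it assumes Python 3.8+ (for `importlib.metadata`) and assumes, as the comment under the question suggests, that the PyPI name `camelot` is a different project from `camelot-py`. The helper names `installed_version` and `diagnose_camelot` are hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def diagnose_camelot():
    """Suggest a next step based on which Camelot-related distributions exist."""
    wrong = installed_version("camelot")     # assumed: not the PDF table extractor
    right = installed_version("camelot-py")  # the package that provides read_pdf
    if wrong is not None:
        return "uninstall 'camelot', then (re)install 'camelot-py'"
    if right is None:
        return "install 'camelot-py'"
    return f"camelot-py {right} is installed"


print(diagnose_camelot())
```

If this reports that camelot-py is installed and the error persists, the import is probably being resolved by a different interpreter or environment than the one pip installed into.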
1
votes

Check whether Java is installed on your machine: open a terminal and run "java -version". tabula-py depends on Java, so without it you won't be able to read PDFs with tabula.

Once Java is installed, install tabula-py with the command pip install tabula-py.

from tabula.io import read_pdf
tables = read_pdf('file.pdf')  # substitute your file name
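
The Java check can also be done from Python, which is handy inside a notebook. This is a rough sketch using only the standard library; the function name `java_version_banner` is hypothetical.

```python
import shutil
import subprocess


def java_version_banner():
    """Return the output of `java -version`, or None if java is not on PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # `java -version` traditionally writes to stderr, so check both streams.
    return (result.stderr or result.stdout).strip()


banner = java_version_banner()
print(banner if banner else "Java not found on PATH; tabula-py needs it")
```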
0
votes

Try this:

import camelot.io as camelot

That worked for me.

0
votes

I gave up trying to get camelot to read tables in Jupyter Notebooks and instead installed the following:

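import sys
from pathlib import Path

# Notebook cell: install into the same interpreter the kernel is running
!{sys.executable} -m pip install tabula-py tabulate

from tabula import read_pdf
from tabulate import tabulate

# Path to the PDF in the home directory
pdf_path = Path.home() / "my_pdf.pdf"

# read_pdf returns a list of DataFrames, one per table found on the page
df = read_pdf(str(pdf_path), pages=1)
df[0]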
-1
votes

When installing the library, pay attention to where it is installed: it may have been installed for a different Python version than the one running your code.
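
One way to check this is to print which interpreter is running and where it would import the module from. This is a minimal sketch using the standard library only; `locate_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def locate_module(name):
    """Return the file the current interpreter would import `name` from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("camelot found at:", locate_module("camelot"))
```

If the second line prints None, the package is not visible to this interpreter. Running the install as `<that interpreter> -m pip install camelot-py` targets the same Python that runs your code.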