I have been running a Python Dataflow job that uses the pandas library. It suddenly started failing with the following error:
File "/usr/local/lib/python2.7/dist-packages/pandas_gbq/auth.py", line 305, in _try_credentials
    client = bigquery.Client(project=project_id, credentials=credentials)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/client.py", line 161, in __init__
    self._connection = Connection(self, client_info=client_info)
File "/usr/local/lib/python2.7/dist-packages/google/cloud/bigquery/_http.py", line 33, in __init__
    super(Connection, self).__init__(client, client_info)
TypeError: __init__() takes exactly 2 arguments (3 given)
It is failing on this step:
import pandas as pd
data = pd.read_gbq(query=query, project_id=project, dialect='standard', private_key=credentials)
My setup file looks like this:
install_requires=[
    'google-cloud-storage==1.11.0',
    'requests==2.19.1',
    'urllib3==1.23',
    'pandas-gbq==0.6.1',
    'pandas==0.23.4',
    'protobuf==3.6.0',
]
These are the same versions I have installed locally, where the code works. No changes were made to the job before it started failing. It runs successfully locally, but fails when I run it with the DataflowRunner. I suspect a dependency issue. Are there documented issues with any of the package versions I'm using? Or are there specific package versions I need to add to my setup file?
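One way to test the dependency hypothesis is to compare the package versions that actually end up installed on the Dataflow workers with the local ones. The sketch below is a hypothetical helper (not part of pandas-gbq, Beam, or Dataflow). The package list is an assumption: it covers the modules named in the traceback, plus google-cloud-core, which is assumed here to provide the base `Connection` class that `google/cloud/bigquery/_http.py` calls via `super()`.

```python
# Hypothetical diagnostic helper (not from any library): report the installed
# versions of the packages involved in the traceback above.
# Assumption: these distribution names are the relevant ones to compare.
PACKAGES = [
    "pandas",
    "pandas-gbq",
    "google-cloud-bigquery",
    "google-cloud-core",
    "google-cloud-storage",
]


def installed_versions(names):
    """Return {distribution_name: version string, or None if not installed}."""
    try:
        # Python 3.8+: standard-library metadata lookup.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters (e.g. the Python 2.7 workers in the traceback):
        # fall back to setuptools' pkg_resources.
        import pkg_resources

        def version(name):
            return pkg_resources.get_distribution(name).version

        PackageNotFoundError = pkg_resources.DistributionNotFound

    result = {}
    for name in names:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in sorted(installed_versions(PACKAGES).items()):
        print("%s: %s" % (name, ver or "not installed"))
```

Running this locally and logging the same output from inside a pipeline step would show whether the workers resolve a different google-cloud-bigquery or google-cloud-core than the local machine. The `TypeError` raised by the `super().__init__` call is consistent with a base-class constructor whose signature differs between versions, but that is an inference from the error message, not a confirmed diagnosis.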