I've been researching how to export BigQuery data into Pandas. There are two methods:

  1. Export the data to a CSV file and load it - https://cloud.google.com/bigquery/exporting-data-from-bigquery

  2. Pull the data directly into a pandas DataFrame. This doesn't seem to work for me, but here is the method: pandas.io.gbq.read_gbq(query, project_id=None, index_col=None, col_order=None, reauth=False). Has gbq been discontinued?

Could someone please suggest the best and most efficient way to go about this?

Thank you.

What version of pandas are you using? It looks like it still exists in 0.15.0: pandas.pydata.org/pandas-docs/stable/generated/… - Jordan Tigani
Yes, that's precisely what I'm using. Still no success. Any other suggestions? - BlackHat
I've developed a python package (with 100% test coverage): google-pandas-load.readthedocs.io/en/latest that follows the first method. - augustin-barillec

1 Answer

The gbq.read_gbq method definitely works in pandas 0.15.0-1; I just upgraded from 0.14.0-1 to check (Windows 7). If you are using Python, I would definitely recommend it for getting data from Google BigQuery into a DataFrame. I use it for almost all of my analysis work.
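For reference, a minimal sketch of how such a call might look. It assumes the pandas.io.gbq.read_gbq signature quoted in the question. The wrapper name load_bigquery_frame, its reader parameter, the project id, and the sample query are all hypothetical and only there for illustration:

```python
def load_bigquery_frame(query, project_id, reader=None):
    """Run `query` against BigQuery and return whatever the reader returns.

    `reader` is a hypothetical hook so the function can be tried out
    without network access; by default it is pandas.io.gbq.read_gbq.
    """
    if reader is None:
        # Imported lazily, so pandas is only needed when actually querying.
        from pandas.io import gbq
        reader = gbq.read_gbq
    # With the default reader, the first call starts the browser-based
    # authentication flow for your Google account.
    return reader(query, project_id=project_id)


# Example call (assumed project id and a public sample table; adjust both):
# df = load_bigquery_frame(
#     "SELECT word, word_count FROM [publicdata:samples.shakespeare] LIMIT 10",
#     project_id="my-project-id",
# )
# print(df.head())
```

If the real call fails, the traceback from it is usually the most useful thing to post when asking for help.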

It is hard to say how to overcome your issue without more information, but I would start by checking whether the authentication flow completes in a browser that is logged into your Google account, and then troubleshoot from there. There is a deprecation warning during the first authentication flow (oauth2client.tools.run), but everything still works.

Other than that, I would try following the examples here: http://pandas-docs.github.io/pandas-docs-travis/io.html#io-bigquery

FYI, in the current dev branch, an option for gcloud authentication is being added to make headless authentication more convenient.