I'm working in Google Cloud Datalab and I want to export a Pandas dataframe as a new BigQuery table. I've been trying to follow the online help notebooks that come with Cloud Datalab, but as far as I can see there are no examples of exporting to BigQuery, only to Google Cloud Storage.

Anyway, I can work out how to create the table in BigQuery with the correct schema, but I can't work out how to get the actual data into the table!

This is what I've got now:

import datalab.bigquery as bq  # older Datalab releases used: import gcp.bigquery as bq

dataset = bq.DataSet('calculations')
dataset.create(friendly_name='blah',
               description='blah blah')
print 'Dataset exists', dataset.exists()

# Create the schema for the table we're about to create.
schema = bq.Schema.from_dataframe(measures[0]['data'])
print schema
print len(measures[0]['data'])

# Create a table for our results.
temptable = bq.Table('calculations.test').create(schema=schema, 
                                                 overwrite=True)

# How to export the actual data to the table?

So this outputs:

True
[{'type': 'STRING', 'name': u'id'}, {'type': 'STRING', 'name': ...
8173

showing that my dataframe has 8173 rows.

If I go to BigQuery, I see that the table has been created with the correct schema, but it has no data in it.

How do I actually export the data there?

If that's impossible, I could export to Cloud Storage instead, though I've tried that and hit the same problem (the sort of thing I was attempting is sketched below). I would prefer to export to BigQuery.
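
A rough sketch of that Cloud Storage route, using one common Datalab pattern (the bucket and object names here are placeholders, and older Datalab releases used import gcp.storage as storage):

import datalab.storage as storage

# Serialize the dataframe to CSV and write it as an object in a bucket.
bucket = storage.Bucket('my-bucket')           # placeholder bucket name
item = bucket.item('calculations/test.csv')    # placeholder object path
item.write_to(measures[0]['data'].to_csv(index=False), 'text/csv')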

1 Answer

You need to call:

temptable.insert_data(df)

where df is your Pandas dataframe.
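
Putting it together with the question's code, a minimal end-to-end version would be something like this (a sketch assuming Datalab's datalab.bigquery module, with the dataframe and table name taken from the question):

import datalab.bigquery as bq

df = measures[0]['data']  # the Pandas dataframe from the question

# Build a schema from the dataframe and (re)create the destination table.
schema = bq.Schema.from_dataframe(df)
temptable = bq.Table('calculations.test').create(schema=schema, overwrite=True)

# Stream the dataframe's rows into the new table.
temptable.insert_data(df)

Note that insert_data streams rows into the table, so they can take a short while to appear in the table preview even after the call succeeds.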