2
votes

I have been able to follow the instructions listed here to load google sheets into big query.

https://cloud.google.com/blog/big-data/2016/05/bigquery-integrates-with-google-drive

The only thing that I'm stumped by is that I have to add fields to the schema. So, lets suppose I have 100 columns in Google Sheets, all of which need to be loaded into BQ. Does that mean I have to manually type in 100 schema names one by one and select their field types? I've done one but I would think there's a way to load everything in Google Sheets into BQ automatically.

Could someone point me in the right direction?

Thanks!

3

3 Answers

1
votes

You have to provide schema!

So how about using regular Spreadsheet functionality and just generate schema off of columns' headers in your data in Sheets. you can use most frequent data type first and than go and manually "fix" those fields with different types. This can be done as a script that does work for you and then you use it to supply schema

On the other hand - you can check Table definitions based on Google Sheets source files. it might help

0
votes

You can create table definition for your Google Sheets file (you have to use command line interface though). By default it will make best effort to auto detect your table's schema from file.

0
votes
import pygsheets
from pandas_gbq import gbq


def load_data_from_google_sheet_to_BQ():
        spread_sheet_name = "spread_sheet_name"
        worksheet_name = "worksheet_name"
        table_id = "{0}.{1}".format('Dataset_name', 'table_name')
        gs = pygsheets.authorize(service_file=service_file_name)
        sh = gs.open(spread_sheet_name)
        wks = sh.worksheet_by_title(worksheet_name)
        gs_data = wks.get_as_df()
        gs_data.COLUMN1 = gs_data.COLUMN1.astype(str)
        gs_data.COLUMN2 = gs_data.COLUMN2.astype(str).str.encode('utf-8', errors='strict')
        gs_data.COLUMN3 = gs_data.COLUMN3.astype(str).str.encode('utf-8', errors='strict')
        gs_data.COLUMN4 = gs_data.COLUMN4.astype(str)
        gs_data.COLUMN5 = gs_data.COLUMN5.astype(str)
        gs_data.COLUMN6 = gs_data.COLUMN6.astype(str)

        gs_data = gs_data[['COLUMN1', 'COLUMN2', 'COLUMN3', 'COLUMN4', 'COLUMN5', 'COLUMN6']]
        gbq.to_gbq(gs_data, table_id, project_id, if_exists='replace')



load_data_from_google_sheet_to_BQ()