
I have been using Pandas to format dataframes, which I have then converted to CSV and only then uploaded manually to BigQuery (depending on the size, I upload to Cloud Storage before).

I often use Google Colaboratory as my Notebooks UI.

Is it possible to streamline the process by uploading directly from Pandas to BQ/CS? How?
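For context, my current manual step is roughly the following (a minimal sketch with placeholder data; the column names are made up):

```python
import io
import pandas as pd

# Build a small example DataFrame (placeholder data).
df = pd.DataFrame({"name": ["alice", "bob"], "score": [0.9, 0.7]})

# Serialize it to CSV in memory -- this is the file I then upload by
# hand to Cloud Storage and/or BigQuery.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```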


3 Answers

You could leverage pandas_gbq.to_gbq:

import pandas
import pandas_gbq

df = pandas.DataFrame(
    {
        "my_string": ["a", "b", "c"],
        "my_int64": [1, 2, 3],
        "my_float64": [4.0, 5.0, 6.0],
        "my_bool1": [True, False, True],
        "my_bool2": [False, True, False],
        "my_dates": pandas.date_range("now", periods=3),
    }
)

# Note: the parameter is destination_table, not destination_id.
pandas_gbq.to_gbq(df, destination_table="dataset.table", project_id="project_id")

You can use load_table_from_dataframe to load data from Pandas into BigQuery:

from google.cloud import bigquery

bigquery_client = bigquery.Client()
# A fully qualified table ID string also works as the destination.
table_id = "project_id.bq-dataset.bq-table"
job = bigquery_client.load_table_from_dataframe(dataframe, table_id)
job.result()  # wait for the load job to complete

To define a BigQuery dataset, pass a tuple containing the project_id and dataset_id to bq.Dataset:

# define a BigQuery dataset (google.datalab.bigquery API)
import google.datalab.bigquery as bq

bigquery_dataset_name = ('project_id', 'dataset_id')
dataset = bq.Dataset(name=bigquery_dataset_name)

To define a BigQuery table, pass a tuple containing the project_id, dataset_id, and table name to bq.Table:

# define a BigQuery table    
bigquery_table_name = ('project_id', 'dataset_id', 'table_name')
table = bq.Table(bigquery_table_name)

Create the dataset and table if needed, then write the DataFrame to the table in BigQuery:

# Create BigQuery dataset
if not dataset.exists():
    dataset.create()

# Create or overwrite the existing table if it exists
table_schema = bq.Schema.from_data(dataFrame_name)
table.create(schema = table_schema, overwrite = True)

# Write the DataFrame to a BigQuery table
table.insert(dataFrame_name)