The following works for me
Let's define a dummy dataframe with two columns (id and nestedJson)
import pandas as pd
d = {'id': [1,2,3], 'nestedJson': [{'k1':1, 'k2':2}, {'k1':3, 'k2':4}, {'k1':5, 'k2':6}]}
df = pd.DataFrame(d)
So df looks like this
|
id |
nestedJson |
|---|
| 0 |
1 |
{'k1': 1, 'k2': 2} |
| 1 |
2 |
{'k1': 3, 'k2': 4} |
| 2 |
3 |
{'k1': 5, 'k2': 6} |
Now we create a bigquery Client, set the configuration for the load job (with schema autodetection) and load the data from a dataframe according to the BQ documentation (https://cloud.google.com/bigquery/docs/samples/bigquery-load-table-dataframe)
from google.cloud import bigquery
table_id = 'my_dataset.my_project.my_table' # Change this for your destination
client = bigquery.Client()
job_config = bigquery.LoadJobConfig(
autodetect=True,
write_disposition="WRITE_TRUNCATE"
)
job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
job.result() # Wait for the job to complete.
Finally you can check the job's result
table = client.get_table(table_id) # Make an API request.
print(
"Loaded {} rows and {} columns".format(
table.num_rows, len(table.schema)
)
)
Which should print
Loaded 3 rows and 2 columns
Your BQ table should look like this
Schema
| Field name |
Type |
Mode |
|---|
| id |
INTEGER |
NULLABLE |
| nestedJson |
RECORD |
NULLABLE |
| nestedJson.k1 |
INTEGER |
NULLABLE |
| nestedJson.k2 |
INTEGER |
NULLABLE |
Preview
| Row |
id |
nestedJson.k1 |
nestedJson.k2 |
|---|
| 1 |
1 |
1 |
2 |
| 2 |
2 |
3 |
4 |
| 3 |
3 |
5 |
6 |
Finally it's worth noting that while using schema autodetection works fine, if you try to explicitly set the schema, you'll probably run into this issue
https://github.com/googleapis/python-bigquery/issues/19