Im currently trying to write a Pandas Dataframe (Python 3.x) into Google Big Query. The table has a column with dtype object that contains an array of string values.
sample of pandas table I aim to create a BQ table that maintains a nested table structure as below: sample of Big Query table with following schema: schema of Big Query table
Im using the google-cloud-bigquery library as that allows the df to convert to the Parquet format that per documentation supports nested array values:
code used:
client = bigquery.Client()
table_id = 'dataset.table'
job_config = bigquery.LoadJobConfig(
schema = [
bigquery.SchemaField('route_id', 'INTEGER'),
bigquery.SchemaField('types', 'STRING', mode='REPEATED')
],
writeDisposition="WRITE_APPEND"
)
job = client.load_table_from_dataframe(
df,
table_id,
job_config=job_config,
)
# Wait for the load job to complete.
job.result()
but unfortunately Im getting the following error message returned:
BadRequest: 400 Error while reading data, error message: Provided schema is not compatible with the file 'prod-scotty-76a528bc-407d-4224-8951-c8ff0c71faa1'. Field 'types' is specified as REPEATED in provided schema which does not match NULLABLE as specified in the file.
What has been tried so far:
- used RECORD field type
but that caused the following error: https://github.com/googleapis/python-bigquery/issues/21
- not shipping any schema in python at all (and allow Python/BQ to sort it out themselves)
surprisingly this works for the first iteration (CREATE_IF_NEEDED)creating a table in BQ that maintained the nested structure with the following schema automatically applied: auto-applied schema of BQ table, but fails if you try to append even the exact table again returning the same error as under 1.
Any advise or tips?