I'm trying to load a JSON directly into a new Google BigQuery table, without converting it to a CSV.
The JSON I receive is in the following format:
{"columns": ["Month ", "Product ", "Customer", "sales_qty", "net_sales"],
"data": [["201701", "001117", "C00899", 72.0, 27400.0], ["201701", "001117", "C00954", 72.0, 32640.0]]}
I need to create the table programmatically, since the schema may vary, so I'm using the autodetect option in LoadJobConfig (which works fine with CSVs).
I wrote the following snippet of code:
import json
from google.cloud import bigquery

json_object = json.loads(my_json)
gbqclient = bigquery.Client.from_service_account_json(GBQ_JSON_KEY)
dataset_ref = gbqclient.dataset(GBQ_DATASET)
table_ref = dataset_ref.table(GBQ_TABLE)
# job config for JSON input
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.autodetect = True
# push data
job = gbqclient.load_table_from_json(json_object, table_ref, job_config=job_config)
job.result()  # wait for the load job to complete
The client connects to BigQuery correctly, but the load job then fails with:
Error while reading data, error message: Failed to parse JSON: No active field found.; ParsedString returned false
I understand that the JSON format I'm using is not newline-delimited JSON; I'm looking for help converting it into a format the BigQuery API can read.
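For reference, here is a minimal sketch of the direction I've been experimenting with: zipping the columns list against each data row to build one dict per row (sample payload inlined below; I strip whitespace from the column names since I assume BigQuery won't accept trailing spaces in field names):

```python
import json

# Sample payload in the same shape as the one I receive.
my_json = ('{"columns": ["Month ", "Product ", "Customer", "sales_qty", "net_sales"],'
           ' "data": [["201701", "001117", "C00899", 72.0, 27400.0],'
           ' ["201701", "001117", "C00954", 72.0, 32640.0]]}')

payload = json.loads(my_json)

# Strip stray whitespace from the column names, then pair each
# data row with the header to get one dict per row.
columns = [c.strip() for c in payload["columns"]]
rows = [dict(zip(columns, record)) for record in payload["data"]]

print(rows[0])
# {'Month': '201701', 'Product': '001117', 'Customer': 'C00899',
#  'sales_qty': 72.0, 'net_sales': 27400.0}
```

If this is the right direction, I assume the resulting rows list could be passed to load_table_from_json in place of json_object, but I haven't confirmed that.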
Thanks!