0
votes

I'm trying to load a JSON directly into a new Google BigQuery table, without converting it to a CSV.

The JSON I receive is in the following format:

{'columns': ['Month ', 'Product ', 'Customer', 'sales_qty', 'net_sales'],
 'data': [['201701', '001117', 'C00899', 72.0, 27400.0], ['201701', '001117', 'C00954', 72.0, 32640.0]]}

I need to be able to programmatically create the table as the schema might vary, so I'm using the autodetect option in JobConfig (works fine with CSVs)

I wrote the following snippet of code:

json_object = json.loads(my_json)
gbqclient = bigquery.Client.from_service_account_json(GBQ_JSON_KEY)
dataset_ref = gbqclient.dataset(GBQ_DATASET)
table_ref = dataset_ref.table(GBQ_TABLE)

# job config for layout JSON
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
#job_config.skip_leading_rows = 0
job_config.autodetect = True

# push data
job = gbqclient.load_table_from_json(json_object , table_ref, job_config = job_config)
pass

This connects properly to GBQ but the upload job then fails with an error saying:

Error while reading data, error message: Failed to parse JSON: No active field found.; ParsedString returned false

I understand the JSON format i'm using is not a NEWLINE_DELIMITED_JSON, I am looking for help as to how to convert the JSON in a format readable by the BigQuery APIs.

Thanks!

1
Please let me know if this works for you.Mahboob

1 Answers

1
votes

Your JSON object is not correct. You need to have an array of rows. The rows are key:value element.

Like this

data = [
  {
    'Month':'201701',
    'Product':'001117',
    'Customer':'C00899',
    'sales_qty':72.0, 
    'net_sales':27400.0
  },
  {
    'Month':'201701',
    'Product':'001117',
    'Customer':'C00954',
    'sales_qty':72.0, 
    'net_sales':32640.0
  },
]

Transform your current JSON to got one like this.


Here a sample code to translate your entry to the data expected

    entry = {'columns': ['Month ', 'Product ', 'Customer', 'sales_qty', 'net_sales'],
           'data': [['201701', '001117', 'C00899', 72.0, 27400.0], ['201701', '001117', 'C00954', 72.0, 32640.0]]}

    c = entry['columns']
    data=[]
    for d in entry['data']:
        row={}
        for i in range(len(c)):
            row[c[i]] = d[i]
        data.append(row)
    print(data)

Note: my python skill is low, but it works