I am just getting started with the Python BigQuery API (https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/bigquery) after briefly trying out pandas-gbq (https://github.com/pydata/pandas-gbq) and realizing that pandas-gbq does not support the RECORD type, i.e. no nested fields.
Now I am trying to upload nested data to BigQuery. I managed to create the table with the respective schema, but I am struggling with the upload of the JSON data.
from google.cloud import bigquery
from google.cloud.bigquery import SchemaField

client = bigquery.Client()  # assumes project/credentials are configured

dataset_id = 'TestApartments'
table_id = 'Test2'

SCHEMA = [
    SchemaField('full_name', 'STRING', mode='REQUIRED'),
    SchemaField('age', 'INTEGER', mode='REQUIRED'),
    SchemaField('address', 'RECORD', mode='REPEATED', fields=(
        SchemaField('test', 'STRING', mode='NULLABLE'),
        SchemaField('second', 'STRING', mode='NULLABLE'),
    )),
]

table_ref = client.dataset(dataset_id).table(table_id)
table = bigquery.Table(table_ref, schema=SCHEMA)
table = client.create_table(table)
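Table creation itself succeeds. To double-check that the nested schema registered correctly, I fetch the table back and print its fields (a quick sketch using the same client):

# Sanity check: fetch the created table and print the registered schema
created = client.get_table(table_ref)
for field in created.schema:
    print(field.name, field.field_type, field.mode, field.fields)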
When trying to upload a very simple JSON file to BigQuery, I get a rather ambiguous error:
400 Error while reading data, error message: JSON table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.
Besides the fact that it makes me kind of sad that it's giving up on me :), that error description obviously does not really help. Please find below how I try to upload the JSON, along with the sample data.
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON

with open('testjson.json', 'rb') as source_file:
    job = client.load_table_from_file(
        source_file,
        table_ref,
        location='US',  # Must match the destination dataset location.
        job_config=job_config)  # API request

job.result()  # Waits for table load to complete.
print('Loaded {} rows into {}:{}.'.format(
    job.output_rows, dataset_id, table_id))
This is my JSON object (the content of testjson.json):
"[{'full_name':'test','age':2,'address':[{'test':'hi','second':'hi2'}]}]"
A working JSON example would be wonderful, since this seems to be the only way to upload nested data, if I am not mistaken.