0
votes

I have a json record which looks like

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40",
 "line_item": [{"line":"1","sku":"10","amount":10},
               {"line":"2","sku":"20","amount":20}]}

I am trying to load record stated above into the table which has schema definition as,

"fields": [
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "order_id",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "line_item",
    "type": "STRING"
  }
]

I am getting following error "message":

JSON parsing error in row starting at position 0 at file: gs://gcs_bucket/file0. JSON object specified for non-record field: line_item

I want line_item json string which can have more than 1 row as array of json string in line item column in table.

Any suggestion?

1

1 Answers

3
votes

The first thing is that your input JSON should't have a "\n" character, so you should save it like:

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}

One example of how your JSON file should look like:

{"customer_id":"2349uslvn2q3","order_id":"9sufd23rdl40", "line_item": [{"line":"1","sku":"10","amount":10}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"2","order_id":"2", "line_item": [{"line":"2","sku":"20","amount":20}, {"line":"2","sku":"20","amount":20}]}
{"customer_id":"3","order_id":"3", "line_item": [{"line":"3","sku":"30","amount":30}, {"line":"3","sku":"30","amount":30}]}

And also your schema is not correct. It should be:

[
  {
    "mode": "NULLABLE",
    "name": "customer_id",
    "type": "STRING"
  },
  {
    "mode": "NULLABLE",
    "name": "order_id",
    "type": "STRING"
  },
  {
    "mode": "REPEATED",
    "name": "line_item",
    "type": "RECORD",
    "fields": [{"name": "line", "type": "STRING"}, {"name": "sku", "type": "STRING"}, {"name": "amount", "type": "INTEGER"}]
  }
]

For a better understanding of how schemas work, I've tried writing sort of a guide in this answer. Hopefully it can be of some value.

If your data content is saved for instance in a filed called gs://gcs_bucket/file0 and your schema in schema.json then this command should work for you:

bq load --source_format=NEWLINE_DELIMITED_JSON dataset.table gs://gcs_bucket/file0 schema.json

(supposing you are using the CLI tool as it seems to be the case in your question).