
Background

I am trying to load a JSON file, x.json, using the bq CLI.

cat x.json

{"name":"xyz","mobile":"xxx","location":"abc"}

{"name":"xyz","mobile":"xxx","age":"22"}

Command Used

bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON project:test_datasets.cust x.json

'cust' is a table with an empty schema.

I am using --autodetect so that BigQuery autodetects the schema.
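(For reference, once a load succeeds you can inspect the schema that autodetect actually inferred; the table reference below matches the one in the question:

bq show --schema --format=prettyjson project:test_datasets.cust
)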

Output

Upload complete.

Waiting on bqjob_r475558282b85c552_000001569cf1efd8_1 ... (1s) Current status: DONE
BigQuery error in load operation: Error processing job 'project:bqjob_r475558282b85c552_000001569cf1efd8_1': An internal error occurred and the request could not be completed.

Any thoughts on why the internal error occurs and how to resolve it?

You need to retry! Internal error is part of the SLA of 99% uptime :) – Pentium10

OK, right, that worked :) – Rohan

If an answer has helped you solve your problem and you accepted it, you should also consider voting it up. See more at stackoverflow.com/help/someone-answers and the Upvote section in meta.stackexchange.com/questions/5234/…. – Mikhail Berlyant

2 Answers


We've seen several problems:

  • the request randomly fails with a 'Backend error'
  • the request randomly fails with a 'Connection error'
  • the request randomly fails with a 'timeout' (watch out here, as only some rows fail, not the whole payload)
  • some other error messages are non-descriptive and so vague that they don't help you; just retry
  • we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health

For all of these we opened cases with paid Google Enterprise Support, but unfortunately they didn't resolve them. It seems the recommended approach is retry with exponential backoff; even Support told us to do so. Also, the failure rate fits within the 99.9% uptime we have in the SLA, so there is no ground for objection.
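For illustration, a minimal retry loop with exponential backoff around the question's load command could look like the following shell sketch (the attempt count and delays are arbitrary choices, not official guidance):

for attempt in 1 2 3 4 5; do
  if bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON \
      project:test_datasets.cust x.json; then
    break  # load succeeded; stop retrying
  fi
  sleep $((2 ** attempt))  # back off 2, 4, 8, 16, 32 seconds
done

Adding random jitter to the sleep would keep many parallel loads from retrying in lockstep.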

There's something to keep in mind in regard to the SLA: it's a very strictly defined structure; the details are in the BigQuery SLA document. The 99.9% is uptime, which does not translate directly into a failure rate. What this means is that if BigQuery has a 30-minute downtime one month, and you do 10,000 inserts within that window but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime, not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-and-retry setup. Technically, you should see on average about 1 failed insert per 1,000 if you do inserts throughout the month and have set up a proper retry mechanism.

You can check out this chart of your project's API health (replace YOUR-APP-ID with your project ID): https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D


Try this:

bq \
--project_id your_project_id \
--location=US \
load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
'your_dataset.your_table' \
x.json
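With the question's table filled in, the one-line equivalent would be (your_project_id remains a placeholder):

bq --project_id your_project_id --location=US load --autodetect --source_format=NEWLINE_DELIMITED_JSON test_datasets.cust x.json

Note that bq load expects the destination as dataset.table (or project:dataset.table); a bare table name won't resolve unless a default dataset is configured.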