
I ran this command to load 11 files into a BigQuery table:

bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part* /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt

I got this error:

Waiting on bqjob_r46f38146351d545_00000147ef890755_1 ... (11s) Current status: DONE
BigQuery error in load operation: Error processing job 'ardent-course-601:bqjob_r46f38146351d545_00000147ef890755_1': Too many errors encountered. Limit is: 0.
Failure details:
- File: 5: Unexpected. Please try again.

I tried many times after that and still got the same error.

To debug what went wrong, I instead loaded each file into the BigQuery table one at a time. For example:

/usr/local/bin/bq load --project_id=ardent-course-601 --source_format=NEWLINE_DELIMITED_JSON dw_test.rome_defaults_20140819_test gs://sm-uk-hadoop/queries/logsToBq_transformLogs/rome_defaults/20140819/23af7218-617d-42e8-884e-f213a583094a/part-m-00011.gz /opt/sm-analytics/projects/logsTobqMR/jsonschema/rome_defaultsSchema.txt

There are 11 files in total, and each one loaded fine on its own.

Could someone please help? Is this a bug on the BigQuery side?

Thank you.


1 Answer


There was an error reading one of the files: gs://...part-m-00005.gz

Looking at the import logs, it appears that the gzip reader encountered an error decompressing the file.

It looks like that file may not actually be compressed. BigQuery samples the header of the first file in the list to determine whether it is dealing with compressed or uncompressed files and to determine the compression type. When you import all of the files at once, it only samples the first file.

When you load the files individually, BigQuery reads the header of each file, determines that this one isn't actually compressed (despite having the suffix '.gz'), and so imports it as a normal flat file.

If you run a load that doesn't mix compressed and uncompressed files, it should work successfully.
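
One way to find out which files are really compressed is to look at their first two bytes: a gzip stream always starts with 0x1f 0x8b. The following is only a minimal sketch, not something BigQuery or bq provides. The helper names is_gzip and check_files are made up for illustration, and it assumes the part files have already been copied to a local directory (for example with gsutil cp).

```shell
# Succeeds if the data on stdin starts with the gzip magic bytes 0x1f 0x8b.
# (is_gzip is a hypothetical helper name, not part of bq or gsutil.)
is_gzip() {
  # Read the first two bytes and render them as lowercase hex, e.g. "1f8b".
  magic=$(head -c 2 | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "1f8b" ]
}

# Prints one line per local file given as an argument, saying whether its
# content is really gzip, regardless of the file name suffix.
check_files() {
  for f in "$@"; do
    if is_gzip < "$f"; then
      echo "$f: gzip"
    else
      echo "$f: NOT gzip"
    fi
  done
}
```

Running check_files on the local copies of the part files should show which ones are plain text despite the '.gz' suffix, so that compressed and uncompressed files can be loaded in separate jobs. If your gsutil version supports the -r range option of gsutil cat, piping the first two bytes of an object into is_gzip should work as well, without downloading the whole file.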

Please let me know if you think this is not the case and I'll dig in some more.