7 votes

Getting an error when loading a large CSV into BigQuery. Everywhere I read online says there is a 5 GB size limit on zipped files but no limit on plain CSV.

BigQuery error in load operation: Error processing job 'bqjob_r3016bbfad3037f_0000015cea1a1eff_1': Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 24686861596. Max allowed size is: 4294967296.

So 24686861596 bytes is roughly 23 GiB. This comment claims that gsutil can handle such a size. You can try first uploading the file to Google Cloud Storage, and then move it from Google Cloud Storage into Google BigQuery. You can transfer from Cloud Storage directly into BigQuery. You could probably do it even faster if you converted your data into gzip format. - user1367204
This is for a load from Google Cloud Storage - user3078500
So it's correct that this isn't a move from local to remote, but a move from remote (Cloud Storage) to remote (BigQuery)? - user1367204
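
For readers following along, a minimal sketch of the Cloud Storage route described in the first comment (the asker, per the follow-up, was already loading from Cloud Storage); the bucket, dataset, table, and schema file names here are hypothetical:

  # Hypothetical names throughout; adjust to your own project.
  # Stage the CSV in Cloud Storage first.
  gsutil cp big_export.csv gs://my-bucket/big_export.csv

  # Then load from Cloud Storage into BigQuery (CSV with one header row).
  bq load --source_format=CSV --skip_leading_rows=1 \
      mydataset.mytable gs://my-bucket/big_export.csv ./schema.json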

2 Answers

7 votes

The BigQuery documentation lists the various limits for load jobs here: https://cloud.google.com/bigquery/quota-policy#load_jobs. In particular, it notes that the limit for a compressed CSV file is 4 GB.

The error message about a "not splittable" CSV file can occur in two cases:

  1. The CSV file was compressed.
  2. There is a quoting-character mismatch in one of the fields, which makes that field look like one very long string and also makes the file not splittable (this is likely what happened in your case).
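
A rough way to check for case 2 before retrying the load (a sketch; the file name is hypothetical, and quoted fields that legitimately contain newlines will also be flagged):

  # Lines with an odd number of double-quote characters are suspects for a
  # quoting mismatch. awk splits each line on " and counts the pieces.
  awk -F'"' 'NF % 2 == 0 { print NR ": " $0 }' big_export.csv | head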
5 votes

Try this:

  • Turn off quoting.
  • Set the field delimiter to a character that does not occur in the data.

bq help load:

--quote: Quote character to use to enclose records. Default is ". To indicate no quote character at all, use an empty string.
-F,--field_delimiter: The character that indicates the boundary between columns in the input file. "\t" and "tab" are accepted names for tab.

This will import each CSV line into a one-column table. Split it afterwards within BigQuery (with REGEXP_EXTRACT(), SPLIT(), or a JavaScript UDF).
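
A sketch of that approach, assuming hypothetical dataset, table, and bucket names, with tab chosen as the delimiter on the assumption that no tabs occur in the data:

  # Load every CSV line as a single STRING column.
  # --quote='' turns off quoting; -F '\t' assumes tabs never appear in the data.
  bq load --source_format=CSV --quote='' -F '\t' \
      mydataset.raw_lines gs://my-bucket/big_export.csv line:STRING

  # Split afterwards inside BigQuery, e.g. with standard SQL SPLIT():
  bq query --use_legacy_sql=false \
      'SELECT SPLIT(line, ",")[OFFSET(0)] AS col0,
              SPLIT(line, ",")[OFFSET(1)] AS col1
       FROM mydataset.raw_lines'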