I have a data set stored as a local file (~100 GB uncompressed JSON, could still be compressed) that I would like to ingest into BigQuery (i.e. store it there).
Certain guides (for example, https://www.oreilly.com/library/view/google-bigquery-the/9781492044451/ch04.html) suggest to first upload this data to Google Cloud Storage before loading it from there into BigQuery.
Is there an advantage in doing this, over just loading it directly from the local source into BigQuery (using bq load
on a local file)? It's been suggested in a few places that this might speed up loading or make it more reliable (Google Bigquery load data with local file size limit, most reliable format for large bigquery load jobs), but I'm unsure whether that's still the case today. For example, according to its documentation, BigQuery supports resumable uploads to improve reliability (https://cloud.google.com/bigquery/docs/loading-data-local#resumable), although I don't know if those are used when using bq load
. The only limitation I could find that still holds true is that the size of a compressed JSON file is limited to 4 GB (https://cloud.google.com/bigquery/quotas#load_jobs).