I'm having trouble loading a huge amount of data into BigQuery.
In GCS, I have a large number of big files laid out like this:
gs://bucket/many_folders/yyyy/mm/dd/many_files.gz
I want to load them into BigQuery, so I first tried:
bq load --source_format=NEWLINE_DELIMITED_JSON \
  --ignore_unknown_values \
  --max_bad_records=2100000000 \
  --nosync \
  project:dataset.table \
  gs://bucket/* \
  schema.txt
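(With --nosync the command only submits the job and prints a job ID, so the failure only shows up when the job is inspected afterwards, for example with something like the line below, where JOB_ID is a placeholder for the printed ID:)

# Inspect the asynchronous load job to see why it failed.
bq show -j JOB_ID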
This failed because it exceeded the "max_bad_records" limit (each file is an aggregation of many types of logs, so it produces many errors).
Then I did the math and found that, to stay under the max_bad_records limit, I need to narrow the "*" wildcard down to one folder per load, like:
bq load --source_format=NEWLINE_DELIMITED_JSON \
  --ignore_unknown_values \
  --max_bad_records=2100000000 \
  --nosync \
  project:dataset.table \
  gs://bucket/many_folders/yyyy/mm/dd/* \
  schema.txt
because of the max_bad_records limitation.
But I found this is very slow (because of the limit on parallel load jobs in BigQuery), and it also exceeds the daily load job quota. I'd prefer not to go this route.
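For reference, the per-folder version ends up looking roughly like the sketch below (hypothetical; the gsutil loop and the destination table name are my assumptions). Each iteration is a separate load job, which is why it runs into the daily load-job quota:

# Minimal sketch: one load job per dd folder (assumes the same destination table as above).
for prefix in $(gsutil ls -d "gs://bucket/many_folders/*/*/*/"); do
  bq load --source_format=NEWLINE_DELIMITED_JSON \
    --ignore_unknown_values \
    --max_bad_records=2100000000 \
    --nosync \
    project:dataset.table \
    "${prefix}*" \
    schema.txt
done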
Any ideas for solving this? I want to load this data as fast as I can.
Thank you for reading.