
I'm using Google BigQuery, and it provides a few public sample tables. One of them is the Wikipedia revision history [publicdata:samples.wikipedia].

For testing purposes, I would like to export it and save it to Google Cloud Storage.

But when I run the export job in Google BigQuery, it runs for 5 hours and then fails :(

The only message returned is: Errors: Backend error. Job aborted.

It may be because the data size is around 35 GB. All the other provided samples are less than 25 GB, and I've successfully exported them to Google Cloud Storage.

Does anyone know what the problem is and a way to get around it?

Can you send the job id of the failed export? – Jordan Tigani

1 Answer


It looks like there is a timeout on export jobs that kills them after 2 hours (the job is then retried twice). Because we currently process exports sequentially (that is, we read and convert one row of data at a time and write it out to a single file), a large result can take a long time to export.

If you provide a file glob pattern (e.g. gs://foo/bar*) as your destination path, BigQuery can split up the export into pieces and perform them in parallel, thus spending less time on the extract.
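
For illustration, here is a minimal sketch of such an export using the google-cloud-bigquery Python client; the client library, bucket name, and file pattern are assumptions for the example and are not part of the original question or answer:

    # Minimal sketch: export a table to Cloud Storage using a wildcard
    # destination URI so BigQuery can shard the output into multiple files
    # and write them in parallel. The bucket name (my-bucket) is a placeholder.
    from google.cloud import bigquery

    client = bigquery.Client()

    extract_job = client.extract_table(
        "publicdata.samples.wikipedia",    # source table
        "gs://my-bucket/wikipedia-*.csv",  # wildcard => sharded export
    )
    extract_job.result()  # block until the export job completes

The same wildcard works with the bq command-line tool, e.g. bq extract 'publicdata:samples.wikipedia' 'gs://my-bucket/wikipedia-*.csv'.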

Recent changes will also make the export process faster.