1 vote

BigQuery is fast at processing large sets of data; however, retrieving large results from BigQuery is not fast at all.

For example, I ran a query that returned 211,136 rows over three HTTP requests, taking just over 12 seconds in total.
The query itself was returned from cache, so no time was spent executing the query. The host server is an Amazon m4.xlarge running in US-East (Virginia).

In production I've seen this process take ~90 seconds when returning ~1M rows. Obviously some of this could be down to network traffic... but it seems too slow for that to be the only cause (those 211,136 rows were only ~1.7 MB).
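
For context, the retrieval is the standard sequential paging loop. A minimal sketch of that pattern (Python, google-cloud-bigquery client; the client and query here are illustrative placeholders, not the exact production code):

```python
from google.cloud import bigquery

client = bigquery.Client()
job = client.query("SELECT * FROM `my_dataset.my_table`")  # placeholder query

# Each page of results is fetched with its own HTTP request, one after
# another, which is where the retrieval time goes for large result sets.
for page in job.result().pages:
    print(f"fetched {page.num_items} rows in this request")
```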

Has anyone else encountered such slow speeds when retrieving results, and found a resolution?


Update: Reran the test on a VM inside Google Cloud with very similar results, ruling out network issues between Google and AWS.

could you please provide the job id? - xuejian
@xuejian job_BAp8OdilQEzUV7x6HNeEzVh2lo8 - NPSF3000
Sorry, forgot to mention: project id is also needed. - xuejian
never mind, I figured it out. Will do some investigation then. - xuejian
@xuejian as per the update, I've ruled out Google <--> Amazon network issues by running the test inside Google Cloud with similar results. - NPSF3000

2 Answers

1 vote

Our SLO on this API is 32 seconds, and a call taking 12 seconds is normal. 90 seconds sounds too long; it must be hitting some of our system's tail latency.

I understand that it is embarrassingly slow. There are multiple reasons for it, and we are working on improving the latency of this API. By the end of Q1 next year, we should be able to roll out a change that will cut tabledata.list time in half (by upgrading the API frontend to our new One Platform technology). Given more resources, we would also make jobs.getQueryResults faster.

1 vote

Concurrent Requests using TableData.List

It's not great, but there is a resolution.

Make a query, and set the max rows to 1000. If there is no page token, simply return the results.

If there is a page token, then disregard the results*, and use the TableData.List API. However, rather than simply sending one request at a time, send a request for every 10,000 records* in the result. To do this one can use the 'MaxResults' and 'StartIndex' fields. (Note that even these smaller pages may be broken into multiple requests*, so paging logic is still needed.)
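
For illustration, here is a minimal sketch of this scheme using the Python google-cloud-bigquery client. The library choice, worker count, and function names are mine (the original approach used the raw API); the 1,000-row cutoff and 10,000-row chunks are as described above, with the first-page/page-token check approximated by a row count:

```python
from concurrent.futures import ThreadPoolExecutor
from google.cloud import bigquery

CHUNK = 10_000   # rows per TableData.List request, as described above
WORKERS = 8      # concurrency level; an assumption, tune for your quota

client = bigquery.Client()

def fetch_all(sql):
    job = client.query(sql)
    job.result()  # wait for the query to finish
    dest = client.get_table(job.destination)  # results land in a table

    if dest.num_rows <= 1000:
        # Small result: a single request is fine.
        return list(client.list_rows(dest))

    # Large result: issue one TableData.List call per CHUNK of rows, in
    # parallel, via start_index/max_results. list_rows() pages internally,
    # which covers a chunk being split across multiple responses.
    def fetch(start):
        return list(client.list_rows(dest, start_index=start, max_results=CHUNK))

    with ThreadPoolExecutor(max_workers=WORKERS) as pool:
        chunks = list(pool.map(fetch, range(0, dest.num_rows, CHUNK)))
    return [row for chunk in chunks for row in chunk]
```

Here `list_rows()` wraps TableData.List, so `start_index` and `max_results` correspond to the StartIndex and MaxResults fields mentioned above.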

This concurrency (and smaller pages) leads to significant reductions in retrieval times. It's not as good as BigQuery simply streaming all results, but it's enough to start realizing the gains from using BigQuery.


Potential Pitfalls: Keep an eye on the request count, as with larger result sets there could be 100 req/s throttling. It's also worth noting that there's no guarantee of ordering, so using the StartIndex field as pseudo-paging may not always return correct results*.
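
If the 100 req/s figure is a concern, one crude mitigation (a sketch only, not confirmed best practice, in keeping with the caveats here) is to pace how quickly requests are started:

```python
import time
from concurrent.futures import ThreadPoolExecutor

MAX_STARTS_PER_SEC = 50  # assumed safety margin under the 100 req/s figure

def run_paced(fn, args_list, workers=8):
    # Crude pacing: pause after each batch of submissions so that, with a
    # modest worker count, request starts stay well under the quota.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for i, args in enumerate(args_list):
            if i and i % MAX_STARTS_PER_SEC == 0:
                time.sleep(1)
            futures.append(pool.submit(fn, args))
        return [f.result() for f in futures]
```

For example, `run_paced(fetch, list(range(0, dest.num_rows, CHUNK)))` could stand in for the `pool.map` call in the sketch above.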

* Anything with a single asterisk is still an educated guess, not confirmed as true/best practice.