4 votes

We are using BigQuery as an event logging platform.

The problem we face is very slow insertAll POST requests (https://cloud.google.com/bigquery/docs/reference/v2/tabledata/insertAll). It does not matter where they are fired from, the server or the client side.

The minimum is 900 ms and the average is 1500 ms, of which nearly 1000 ms is connection time. This holds even at 1 request per second (so no throttling is involved).
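
For what it's worth, this is roughly how such a call can be timed (a minimal sketch: the endpoint is the documented tabledata.insertAll REST path, while the Python requests usage, the project/dataset/table names, and the token handling are placeholders of mine). Reusing a Session keeps the connection alive, which matters given that connection setup dominates the timings:

```python
import time

import requests

# Hypothetical placeholders for illustration.
PROJECT, DATASET, TABLE = "my-project", "my_dataset", "events"
ACCESS_TOKEN = "..."  # OAuth2 bearer token, obtained elsewhere

URL = ("https://www.googleapis.com/bigquery/v2/projects/%s"
       "/datasets/%s/tables/%s/insertAll" % (PROJECT, DATASET, TABLE))

session = requests.Session()  # keep-alive: TCP/TLS setup is paid only once

def timed_insert(row, insert_id):
    body = {"rows": [{"json": row, "insertId": insert_id}]}
    start = time.monotonic()
    resp = session.post(URL, json=body,
                        headers={"Authorization": "Bearer " + ACCESS_TOKEN})
    print("insertAll took %.0f ms, HTTP %d"
          % ((time.monotonic() - start) * 1000, resp.status_code))
    return resp
```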

We also use the Google Analytics Measurement Protocol, and timings from the same machines are 50-150 ms.

The solution described in "BigQuery streaming 'insertAll' performance with PHP" suggests using queues, but that seems like overkill because we send no more than 10 requests per second.

The question is whether 1500 ms is normal for streaming inserts and, if not, how to make them faster.

Additional information: if we send malformed JSON, the response arrives in 50-100 ms.


2 Answers

1 vote

In my experience, any request to BigQuery takes long. We tried using it as a database for performance data, but we are eventually moving away due to the slow response times. As far as I can see, BQ is built for handling big requests within a 1-10 second response time; these are the requests BQ categorizes as interactive. BQ doesn't get faster by doing less. We stream quite a lot of records to BQ, but we always make sure to batch them up (per table) and run all requests asynchronously (or, if you have to, in another thread).
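
A minimal sketch of that pattern, assuming the google-cloud-bigquery Python client (the table id, batch size, and plain threads are illustrative choices, not a prescription):

```python
import threading

from google.cloud import bigquery

client = bigquery.Client()
TABLE = "my-project.my_dataset.events"  # hypothetical table id
BATCH_SIZE = 500                        # flush threshold, tune to taste

_buffer = []
_lock = threading.Lock()

def _flush(rows):
    # insert_rows_json sends the whole batch as one insertAll request and
    # returns a list of per-row errors (empty on full success).
    errors = client.insert_rows_json(TABLE, rows)
    if errors:
        print("some rows failed:", errors)

def add_row(row):
    """Buffer a row; stream the batch asynchronously once it is full."""
    with _lock:
        _buffer.append(row)
        if len(_buffer) < BATCH_SIZE:
            return
        batch = _buffer[:]
        del _buffer[:]
    # run the network call off the hot path, in its own thread
    threading.Thread(target=_flush, args=(batch,)).start()
```

The point of batching is that one insertAll carrying hundreds of rows costs roughly the same round trip as one carrying a single row, so the per-request latency is amortized.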

PS. I can confirm what Pentium10 says about failures in BQ. Make sure you retry the rows that fail, and if they fail again, log them to a file for retrying at another time.
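
A sketch of that retry-then-log fallback (same assumed client; the file path and retry count are made up):

```python
import json

from google.cloud import bigquery

client = bigquery.Client()
FAILED_ROWS_LOG = "bq_failed_rows.jsonl"  # hypothetical path

def insert_with_fallback(table, rows, retries=2):
    for _ in range(retries + 1):
        errors = client.insert_rows_json(table, rows)
        if not errors:
            return
        # each error entry carries the index of the row that failed;
        # retry only those rows, not the whole payload
        failed = sorted(e["index"] for e in errors)
        rows = [rows[i] for i in failed]
    # still failing after the retries: persist for a later replay run
    with open(FAILED_ROWS_LOG, "a") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
```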

7 votes

Since streaming has a limited payload size (see the Quota policy), it's easier to talk about times, as the payload is limited in the same way for both of us, but I will mention the other side effects too.

We measure between 1200 and 2500 ms for each streaming request, and this has been consistent over the last month, as you can see in the chart.

[chart: streaming request times over the last month]

We have seen several side effects, though:

  • the request randomly fails with type 'Backend error'
  • the request randomly fails with type 'Connection error'
  • the request randomly fails with type 'timeout' (watch out here, as only some rows fail, not the whole payload; see the sketch after this list)
  • some other error messages are non-descriptive, and they are so vague that they don't help you; you can only retry.
  • we see hundreds of such failures each day, so they are pretty much constant and not related to cloud health.
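
To illustrate the timeout case, here is how a partial failure shows up, assuming the google-cloud-bigquery Python client (table id and rows are illustrative):

```python
from google.cloud import bigquery

client = bigquery.Client()
rows = [{"event": "a"}, {"event": "b"}, {"event": "c"}]  # illustrative rows

errors = client.insert_rows_json("my-project.my_dataset.events", rows)
# On a partial failure this is a list of per-row entries, e.g.
#   [{"index": 1, "errors": [{"reason": "timeout", ...}]}]
# meaning only row 1 failed; rows 0 and 2 were accepted and should not
# be blindly resent, or you end up with duplicates.
failed_rows = [rows[e["index"]] for e in errors]
```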

For all of these we opened cases with paid Google Enterprise Support, but unfortunately they did not resolve them. It seems the recommended way to handle them is exponential backoff with retry; even the support told us to do so, which personally does not make me happy.
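
A minimal sketch of such an exponential backoff with retry (the delays, cap, and jitter are my illustrative choices, not something support prescribed):

```python
import random
import time

def insert_with_backoff(do_insert, max_attempts=6):
    """do_insert() performs one streaming insert and raises on failure."""
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return do_insert()
        except Exception:  # in real code, catch the client's error types
            if attempt == max_attempts:
                raise
            # full jitter: sleep a random amount up to the current cap,
            # then double the cap for the next attempt
            time.sleep(random.uniform(0, delay))
            delay = min(delay * 2, 60.0)
```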

Also, the failure rate fits within the 99.9% uptime we have in the SLA, so there is no ground for objection.

There's something to keep in mind in regard to the SLA: it is a very strictly defined structure; the details are here. The 99.9% is uptime, and uptime does not translate directly into failure rate. For scale: a 30-day month has 43,200 minutes, so 99.9% uptime allows about 43 minutes of downtime. What this means is that if BQ has a 30-minute downtime one month, and you happen to do 10,000 inserts within that window but no inserts at other times of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime, not on error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-and-retry setup. Technically, you should then experience on average about 1 failed insert per 1,000 if you have set up a proper retry mechanism.

You can check out this chart about your project health: https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D

It happens that my response is on the other linked article, and I proposed the queues there because they made our exponential backoff with retry very easy, and working with queues is easy anyway. We use Beanstalkd.
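
A sketch of that queue-based flow, assuming the greenstalk Python client for Beanstalkd (the tube name, the fixed delay, and the insert_batch function are hypothetical):

```python
import json

import greenstalk  # one of several Beanstalkd client libraries

def produce(rows):
    """App side: enqueue the rows instead of calling BigQuery inline."""
    queue = greenstalk.Client(("127.0.0.1", 11300), use="bq-inserts")
    queue.put(json.dumps(rows))
    queue.close()

def consume(insert_batch):
    """Worker side: insert; on failure, release the job with a delay,
    which gives backoff-with-retry almost for free."""
    queue = greenstalk.Client(("127.0.0.1", 11300), watch="bq-inserts")
    while True:
        job = queue.reserve()
        try:
            insert_batch(json.loads(job.body))  # hypothetical insert function
            queue.delete(job)
        except Exception:
            # put the job back, delayed; a real worker would grow the
            # delay per attempt to make the backoff exponential
            queue.release(job, delay=60)
```

The nice property of this split is that the web request only pays the cost of a local queue put, while the worker absorbs BigQuery's latency and failures in the background.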