2 votes

I am firing a set of queries against tables in a BigQuery dataset.

There are three `select *` queries, as below:

"select * from table1"    // 1.3M records, 2.5 GB data
"select * from table2"    // 0.3M records, 15 GB data
"select * from table3"    // 2M records, 3 GB data

We are querying the above tables using the Spark connector. However, we intermittently see the following error:

403 Forbidden
{
  "domain" : "usageLimits",
  "message" : "Exceeded rate limits: Your project: exceeded quota for tabledata.list bytes per second per project."
}

Our assumption is that the tabledata.list call is failing because it returns more than 60 MB per second, which appears to be the default quota per https://cloud.google.com/bigquery/troubleshooting-errors.
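As a rough sanity check on that assumption, here is the back-of-the-envelope arithmetic (a sketch; the 60 MB/s figure is the default quota cited above, treated as MiB, and the read durations are illustrative):

```python
# Back-of-the-envelope check: reading a table through tabledata.list at a
# sustained rate above the per-project quota (~60 MB/s by default) would
# trigger the 403 "Exceeded rate limits" error.
QUOTA_BYTES_PER_SEC = 60 * 1024 * 1024  # default quota cited in the question

def exceeds_quota(table_bytes: float, seconds: float) -> bool:
    """True if pulling table_bytes over `seconds` exceeds the per-second quota."""
    return table_bytes / seconds > QUOTA_BYTES_PER_SEC

# The 2.5 GB table1 read in ~40 s averages ~64 MiB/s -> over quota.
print(exceeds_quota(2.5 * 1024**3, 40))  # True
# Spread over ~50 s it averages ~51 MiB/s -> under quota.
print(exceeds_quota(2.5 * 1024**3, 50))  # False
```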

That's not an assumption, that's a fact. Slow down so you do not reach the limit. – Pentium10

1 Answer

3 votes

tabledata.list isn't really optimized for high-throughput use cases such as Spark. You may want to check out other options for reading from BigQuery -- in particular, this use case is exactly what the BigQuery Storage API is designed for, and it has a native Spark connector.
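A minimal sketch of reading one of the tables through the Storage-API-based spark-bigquery connector (the project/dataset/table names are illustrative, and this assumes the connector jar is on the classpath and GCP credentials are configured):

```python
# Sketch: read via the spark-bigquery connector, which streams rows through
# the BigQuery Storage API instead of tabledata.list, so the per-project
# tabledata.list bytes-per-second quota no longer applies.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bq-storage-api-read")
    # Assumes the connector is supplied to the cluster, e.g. via
    # --packages com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:<version>
    .getOrCreate()
)

df = (
    spark.read.format("bigquery")
    .option("table", "my_project.my_dataset.table1")  # hypothetical table id
    .load()
)
df.show(5)
```

This is a configuration sketch rather than a drop-in fix: the connector package coordinates and table identifier need to match your own project.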