0
votes

We have two tables in bigquery: one is large (a couple billion rows) and on a 'table-per-day' basis, the other is date partitioned, has the exact same schema but a small subset of the rows (~100 million rows).

I wanted to run a (standard-sql) query with a subselect in form of a join (same when subselect is in the where clause) on the small partitioned dataset. I was not able to run it because tier 1 was exceeded. If I run the same query on the big dataset (that contains the data I need and a lot of other data) it runs fine. I do not understand the reason for this. Is it because:

  1. Partitioned tables need more resources to query
  2. Bigquery has some internal rules that the ratio of data processed to resources needed must meet a certain threshold, i.e. I was not paying enough when I queried the small dataset given the amount of resources I needed.

If 1. is true, we could simply make the small dataset also be on a 'table-per-day' basis in order to solve the issue. But before we do that though we would like to know if it is really going to solve our problem.

Details on the queries:

  1. Big datset

queries 11 GB, runs 50 secs, Job Id remilon-study:bquijob_2adced71_15bf9a59b0a

  1. Small dataset

Job Id remilon-study:bquijob_5b09f646_15bf9acd941

1

1 Answers

0
votes

I'm an engineer on BigQuery and I just took a look at your jobs but it looks like your second query has an additional filter with a nested clause that your first query does not. It is likely that that extra processing is making your query exceed your tier. I would recommend running the queries in the BigQuery UI and looking at the Explanation tab to see how the queries differ in the query plan.

If you try running the exact same query (modifying only the partition syntax) for both tables and still get the same error I would recommend filing a bug.