BigQuery wildcard tables - what are the limits?

Question

I'm thinking about organizing a BigQuery dataset with multiple tables that I can query using wildcards.

In this use case, there will potentially be 1000+ tables, each one representing a different customer. I need to be able to query them individually, and I also need to query them all at the same time.

I'm thinking of using multiple tables rather than a single table with clustering, because, when running queries only for specific customers, I would like to be able to scan only the data that I need.

The Quotas and limits page lists the following limits that look applicable:

Maximum number of tables referenced per query — 1,000
Maximum resolved legacy and standard SQL query length — 12 MB
The limit on resolved query length includes the length of all views and wildcard tables referenced by the query.

My questions are:

1) When running a query with wildcard (such as SELECT * FROM `project_id.dataset_id.table_*`), do all the individual tables that match the wildcard get counted against this 1000 tables limit?

2) What exactly is counted against the 12 MB limit? For example, suppose I have the following tables project_id.dataset_id.table_1, project_id.dataset_id.table_2, project_id.dataset_id.table_3.

If I run a SELECT on table_*, what will constitute the resolved query length? Does it include the fully qualified name of all tables (project_id.dataset_id.table_1, project_id.dataset_id.table_2, project_id.dataset_id.table_3)?
If I use the wildcard table_* but include a filter on _TABLE_SUFFIX (for example, WHERE _TABLE_SUFFIX = "1"), does this eliminate tables that don't match the filter from the resolved query?

3) Are there any other limitations that I should be aware of?

Thank you in advance.

Pentium10 Pentium10 · Accepted Answer · 2020-01-30T20:49:00

My advice is against. You will quickly reach 1000 tables and than you cannot query them all. The other is that having each table separately it will be unmaintainable.

You can use clustering to your help. If one of your columns is the tenant_id than you can use that in clustering.

BigQuery wildcard tables - what are the limits?

2 Answers