1
votes

I'm thinking about organizing a BigQuery dataset with multiple tables that I can query using wildcards.

In this use case, there will potentially be 1000+ tables, each one representing a different customer. I need to be able to query them individually, and I also need to query them all at the same time.

I'm thinking of using multiple tables rather than a single table with clustering, because, when running queries only for specific customers, I would like to be able to scan only the data that I need.

The Quotas and limits page lists the following limits that look applicable:

  • Maximum number of tables referenced per query — 1,000
  • Maximum resolved legacy and standard SQL query length — 12 MB
  • The limit on resolved query length includes the length of all views and wildcard tables referenced by the query.

My questions are:

1) When running a query with wildcard (such as SELECT * FROM `project_id.dataset_id.table_*`), do all the individual tables that match the wildcard get counted against this 1000 tables limit?

2) What exactly is counted against the 12 MB limit? For example, suppose I have the following tables project_id.dataset_id.table_1, project_id.dataset_id.table_2, project_id.dataset_id.table_3.

  • If I run a SELECT on table_*, what will constitute the resolved query length? Does it include the fully qualified name of all tables (project_id.dataset_id.table_1, project_id.dataset_id.table_2, project_id.dataset_id.table_3)?
  • If I use the wildcard table_* but include a filter on _TABLE_SUFFIX (for example, WHERE _TABLE_SUFFIX = "1"), does this eliminate tables that don't match the filter from the resolved query?

3) Are there any other limitations that I should be aware of?

Thank you in advance.

2

2 Answers

2
votes

My advice is against. You will quickly reach 1000 tables and than you cannot query them all. The other is that having each table separately it will be unmaintainable.

You can use clustering to your help. If one of your columns is the tenant_id than you can use that in clustering.

2
votes

When running a query with wildcard, do all the individual tables that match the wildcard get counted against this 1000 tables limit?

All tables that match wildcard and _TABLE_SUFFIX filter will count against 1000 limit

What exactly is counted against the 12 MB limit?

Length of fully qualified names of all tables counted against above explained 1000 tables limit will be counted against 12 MB limit

Are there any other limitations that I should be aware of?

Suggestion - combination of partitioning and clustering will serve you much better in use case you describe in your question. Especially that now you can partition by integer range
This will give your superior flexibility in maintaining and processing data to compare with sharded tables approach
Obviously, there are still limitations related to partitioned tables to consider