I'm thinking about organizing a BigQuery dataset with multiple tables that I can query using wildcards.
In this use case, there will potentially be 1000+ tables, each one representing a different customer. I need to be able to query them individually, and I also need to query them all at the same time.
I'm thinking of using multiple tables rather than a single table with clustering, because, when running queries only for specific customers, I would like to be able to scan only the data that I need.
The Quotas and limits page lists the following limits that look applicable:
- Maximum number of tables referenced per query — 1,000
- Maximum resolved legacy and standard SQL query length — 12 MB
- The limit on resolved query length includes the length of all views and wildcard tables referenced by the query.
My questions are:
1) When running a wildcard query (such as ``SELECT * FROM `project_id.dataset_id.table_*` ``), do all the individual tables that match the wildcard count against the 1,000-table limit?
2) What exactly is counted against the 12 MB limit? For example, suppose I have the tables `project_id.dataset_id.table_1`, `project_id.dataset_id.table_2`, and `project_id.dataset_id.table_3`.
- If I run a SELECT on `table_*`, what will constitute the resolved query length? Does it include the fully qualified names of all matched tables (`project_id.dataset_id.table_1`, `project_id.dataset_id.table_2`, `project_id.dataset_id.table_3`)?
- If I use the wildcard `table_*` but include a filter on `_TABLE_SUFFIX` (for example, `WHERE _TABLE_SUFFIX = "1"`), does this eliminate tables that don't match the filter from the resolved query?
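For concreteness, these are the two query shapes I'm asking about (project, dataset, and table names are placeholders from my example above):

```sql
-- Query all customer tables at once via the wildcard.
SELECT *
FROM `project_id.dataset_id.table_*`;

-- Query a single customer's table by filtering on _TABLE_SUFFIX;
-- I'm asking whether this prunes non-matching tables from the
-- resolved query and from the bytes scanned.
SELECT *
FROM `project_id.dataset_id.table_*`
WHERE _TABLE_SUFFIX = "1";
```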
3) Are there any other limitations that I should be aware of?
Thank you in advance.