How can I join INFORMATION_SCHEMA metadata with wildcard table syntax like _TABLE_SUFFIX, so I can get results by table_name in Google Big Query

Question

I have a Google Big Query dataset that contains tables using snapshots (for example: I see a table named .system_users, and there are actually multiple tables named in the system_users[YYYYMMDD] format).

I want to query all of the snapshots for unique values in the column "_schema_push_date", and display the full name of the table. The end result I'm looking for this:

Row    | table_name               | _schema_push_date 
1      | system_users20200101     | 2020-01-01 09:51:29.251 UTC
2      | system_users20200102     | 2020-01-02 08:53:04.017 UTC

I am using

'''WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202001([0-2])")'''

to limit table the tables queried, but I think I also need to join _TABLE_SUFFIX against INFORMATION_SCHEMA.TABLES.TABLE_NAME to get table names into the results.

it is possible to join _TABLE_SUFFIX with INFORMATIOIN_SCHEMA data?

It is possible to use the INFORMATION_SCHEMA.TABLES to get a table with metadata from dataset, then query against actual tables using _TABLE_SUFFIX/wildcards and finally join the two tables (and both of these tables should be inside a With() statement). Also, if you are joining by table name, table name should be a column in both tables. Could you please also check this thread which describe similar scenario? Let me know about the results. — aga
Thanks @muscat - this didn't answer my question, but did make me realize that I can ask it better: "Can I include the name of a table in query results when using wildcard syntax?" As you point out, to join I'll need a common column in both queries like 'table_name'. My problem is actually that when using _TABLE_SUFFIX as a filter, I only get the filtered portion of _TABLE_SUFFIX in results. For example: 'select _schema_push_date, count(1), _TABLE_SUFFIX as my_table_name FROM my_proj.my_dataset.* WHERE _TABLE_SUFFIX LIKE '%20200523' GROUP BY my_table_name, _schema_push_date' — Dave Reeck

Dave Reeck Dave Reeck · Accepted Answer · 2020-05-25T20:41:24

To include the name of a table when using Wildcard syntax in GBQ, you must add _TABLE_SUFFIX to your select statement and alias it. For example:

  select  _TABLE_SUFFIX as my_table_name, email, count(1)
  FROM `my_project.my_dataset.*`
  WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r"202005[0-3]*")
  GROUP BY my_table_name, email

Will produce results like

my_table_name     | email        | f0
my_first_table    | my@email.com | 2345

Here's an interesting link to other pseudo-columns: http://bigdata.freeideas.cz/subdom/bigdata/2017/05/30/bigquery-cheat-sheet-standard-sql-meta-tables-pseudo-columns/

How can I join INFORMATION_SCHEMA metadata with wildcard table syntax like _TABLE_SUFFIX, so I can get results by table_name in Google Big Query

1 Answers