0
votes

There is a lot of documentation on gcp about querying sharded/wildcard tables [1][2], but I can't seem to figure out how to create or insert data into such as table.

Here's a trivial and mostly nonsensical example:

-- STANDARD SQL

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_CA` AS
SELECT "CA" as COUNTRY_CODE

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_US` AS
SELECT "US" as COUNTRY_CODE

SELECT * FROM `mydataset.mytable_*` WHERE _TABLE_SUFFIX = "CA"

Wildcards in CREATE-staments

How do I use wildcards in the create statement?

-- STANDARD SQL
-- DOES NOT WORK
CREATE TABLE IF NOT EXISTS  `mydataset.mytable_*` AS
SELECT "US" as COUNTRY_CODE WHERE _TABLE_SUFFIX = "US"

Wildcards in INSERT-statements

Similarly, how do I do inserts?

Is it possible to do something along the lines of the following:

-- STANDARD SQL
-- DOES NOT WORK
INSERT INTO `mydataset.mytable_*` WHERE _TABLE_SUFFIX = COUNTRY_CODE
SELECT * FROM UNNEST(["US", "US", "CA"]) as COUNTRY_CODE

I know it is recommended to use partitioning and clustering, but I am asking specifically about wildcards, so please indulge me :)

Is this possible using standard SQL? If not, how about legacy SQL?

1
As far as I know, you can't use wildcards in create and insert statements. If you want to do it, you can use an external programming language such as Python to dynamically create the query and run it through.Sabri Karagönen

1 Answers

2
votes

As you can see here, DML has some limitations in BigQuery and one of them is:

Queries that contain Data Manipulation Language (DML) statements cannot use a wildcard table as the target of the query. For example, a wildcard table may be used in the FROM clause of an UPDATE query, but a wildcard table cannot be used as the target of the UPDATE operation.

In other words, you can not use a wildcard to specify a table to be created, updated or to have new records inserted. Despite that, you can use it to specify the source of your date.

Let's take a look in some examples of how to use wildcards with DML and DDL.

INSERTING

If you're inserting data into a wildcard table, you need to insert your data in the final table. In other words, you need to specify the coplete path including the suffix.

INSERT INTO `mydataset.mytable_US` <your query or your values>

If you're using the wildcard table as the source of data for inserting data in a normal table, you can use wildcards in the WHERE clause:

INSERT INTO `mydataset.mynormaltable` 
SELECT <fields> 
FROM `mydataset.mytable_*` 
WHERE _TABLE_SUFFIX = "US"


CREATING

Again, you can not use wildcard in the target, but you can use it to specify a source of data:

CREATE TABLE IF NOT EXISTS  `mydataset.normal_table` 
AS SELECT <fields> 
FROM `mydataset.mytable_*` 
WHERE _TABLE_SUFFIX = "US"

If you want to create a new wildcard table with the same prefix, you can do:

CREATE TABLE IF NOT EXISTS  `mydataset.mytable_GB` 
AS SELECT <fields> 
FROM `mydataset.table`

In this example you created a new suffix named GB. After this, you will be able to query your table with the GB suffix normaly.

Finally, I would like to attach here a reference to DML and DDL for BigQuery.

I hope it helps