1
votes

Is streaming data into a column-partitioned BigQuery table supported? I'm having trouble and getting this error:

BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/...$20180410/insertAll: Streaming to metadata partition of column based partitioning table ...$20180410 is disallowed.

The BigQuery streaming documentation says streaming to partitioned tables is allowed, but all the examples use ingestion-time partitioning; I don't see any reference to the newer column-based partitioning.

Is it supported and am I just doing it wrong? For example, the error occurs when I explicitly add the partition suffix ($YYYYMMDD). When I don't use the suffix the write succeeds, but it doesn't look like the data is actually partitioned.

Here's my sample code:

We have a table with a few columns, let's say this:

date: DATE (partitioned field)
name: STRING
count: INTEGER
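
For reference, here is roughly what the table resource for such a column-partitioned table looks like in the `tables.insert` REST representation. The project, dataset, and table names below are placeholders, not the real ones from my project:

```python
# Sketch of a BigQuery table resource (tables.insert REST body) for a
# table partitioned on the `date` column. All names are placeholders.
table_resource = {
    "tableReference": {
        "projectId": "my-project",
        "datasetId": "my_dataset",
        "tableId": "my_table",
    },
    "schema": {
        "fields": [
            {"name": "date", "type": "DATE", "mode": "NULLABLE"},
            {"name": "name", "type": "STRING", "mode": "NULLABLE"},
            {"name": "count", "type": "INTEGER", "mode": "NULLABLE"},
        ]
    },
    # Column-based (as opposed to ingestion-time) partitioning: the
    # `field` key names the DATE column that drives partition assignment.
    "timePartitioning": {"type": "DAY", "field": "date"},
}

print(table_resource["timePartitioning"])
```

The distinguishing part is the `field` key inside `timePartitioning`; an ingestion-time partitioned table has `timePartitioning` with no `field`.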

I'm trying to do a streaming insert via:

from datetime import date

from google.cloud import bigquery

data = [
    {'date': date(2018, 4, 10), 'name': 'x', 'count': 10},
    {'date': date(2018, 4, 10), 'name': 'y', 'count': 5},
]
client = bigquery.Client(...)
table_ref = client.dataset(dataset).table(tableid + '$20180410')
schema = client.get_table(table_ref).schema

# Raises the 400 Bad Request above.
result = client.insert_rows(table_ref, data, selected_fields=schema)
Please try to remove $20180410 - Lei Chen
When I remove the partition suffix "$20180410" the write succeeds, but as near as I can tell the data is not actually partitioned. If I then try to delete data from just that partition via bq rm ...$20180410, the rows are not removed. - Greg
Streaming to a particular partition of a column-partitioned table is disallowed. You can stream to the table without the suffix and the data will be distributed to partitions according to the partitioning field's value. - Lei Chen
It might still be in the streaming buffer while you are deleting the partition. Could you please try deleting again in a few minutes? - Lei Chen
Could you please share your full table name, with project ID and dataset ID? - Lei Chen

1 Answer

3
votes

Updated/Solved (Hat tip and thanks to Lei Chen; see the comments for discussion.)

To stream into a column-partitioned table, do not append the $YYYYMMDD suffix to the table ID. BigQuery will do the partitioning for you, but it appears to be a separate stage: my records appeared in the table from the streaming buffer within a few minutes, but took several hours to be properly partitioned.
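
A minimal sketch of the suffix-free insert. The `rows_for_insert` helper is something I'm introducing for illustration (not part of the client library); it just converts `datetime.date` values to ISO strings so each row is plain JSON, which is what the underlying `insertAll` payload carries:

```python
from datetime import date


def rows_for_insert(records):
    """Make each row JSON-serializable by converting datetime.date
    values to ISO-8601 strings (e.g. date(2018, 4, 10) -> '2018-04-10')."""
    return [
        {k: (v.isoformat() if isinstance(v, date) else v) for k, v in row.items()}
        for row in records
    ]


rows = rows_for_insert([
    {'date': date(2018, 4, 10), 'name': 'x', 'count': 10},
    {'date': date(2018, 4, 10), 'name': 'y', 'count': 5},
])

# With google-cloud-bigquery the call then looks roughly like this --
# note the table ID has NO '$20180410' suffix; BigQuery routes each row
# to a partition from the `date` column itself:
#
#   client = bigquery.Client()
#   table_ref = client.dataset(dataset).table(tableid)  # no partition decorator
#   schema = client.get_table(table_ref).schema
#   errors = client.insert_rows(table_ref, rows, selected_fields=schema)
```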

This is easier to verify with ingestion-time partitioning, where you can query the _PARTITIONTIME pseudocolumn. For column-based partitioning, I didn't find a good way to know whether the data had been partitioned other than bq rm-ing the partition and seeing if the records vanished.
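
If it helps, the legacy-SQL partitions meta-table can show this without deleting anything. A sketch, with placeholder dataset/table names:

```shell
# List the partitions BigQuery has materialized for the table
# (legacy SQL only; my_dataset/my_table are placeholders).
bq query --use_legacy_sql=true \
  'SELECT partition_id FROM [my_dataset.my_table$__PARTITIONS_SUMMARY__]'

# Rows still sitting in the streaming buffer appear under the special
# partition id __UNPARTITIONED__ until the background stage moves them
# into their date partitions.
```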