1
votes

Is streaming data into a column-partitioned BigQuery table supported? I'm having trouble, getting this error:

BadRequest: 400 POST https://www.googleapis.com/bigquery/v2/projects/...$20180410/insertAll: Streaming to metadata partition of column based partitioning table ...$20180410 is disallowed.

Reading the BigQuery streaming documentation it says streaming to partitioned tables is allowed, but all examples are for the ingest-time partitions. I don't see reference to the newer column-based partitioning.

Is it supported and I'm just doing it wrong? For example, the error occurs when I explicitly add the partition suffix ($YYYYMMDD). When I don't use the suffix the write succeeds, but it doesn't look like it's actually partitioned.

Here's my sample code:

We have a table with a few columns, let's say this:

date: DATE (partitioned field)
name: STRING
count: INTEGER

I'm trying to do a streaming insert, via:

from google.cloud import bigquery

data = [
    {'date': date('2018-04-10'), 'name': 'x', 'count': 10},
    {'date': date('2018-04-10'), 'name': 'y', 'count': 5},    
]
client = bigquery.Client(...)
table_ref = client.dataset(dataset).table(tableid + '$20180410')
schema = client.get_table(table_ref).schema

# Raises the 400 Bad Request above.
result = client.insert_rows(table_ref, data, selected_fields=schema)
1
pleasse try to remove $20180410Lei Chen
When I remove the partition "$20180410" the write succeeds, but near as I can tell the data is not actually partitioned. If I then try to delete data from just the partition via bq rm ...$20180410 the rows are not removed.Greg
streaming to a particular partition of a column partitioned table is disallowed. You can stream to the table without suffix and the data will be distributed to partitions according to partitioning field valueLei Chen
it might be still in streaming buffer while you are deleting the partition. Could you please try to delete in a few mins?Lei Chen
could you please share your full table name with projectid and datasetid?Lei Chen

1 Answers

3
votes

Updated/Solved (Hat-tip and thanks to Lei Chen. See comments for discussion)

To stream into the partitioned table, do not append the $YYYMMDD on the tableid. BigQuery will do the partitioning for you, but...it appears to be a separate stage. My records appeared in the table from the stream buffer within a few minutes, but took several hours to be properly partitioned.

This is easier to tell when you use the ingest time partition and can query the _PARTITION pseudocolumn. For column partitions, I didn't find a good way to know if it had been partitioned other than bq rm-ing the partition and seeing if the records vanished.