Our existing setup used to create a new table for each day, which worked fine with the "WRITE_TRUNCATE" option. However, when we updated our code to write to a partitioned table through our Dataflow job, it stopped working with WRITE_TRUNCATE.

It works perfectly fine with the write disposition set to "WRITE_APPEND". From what I understand, with WRITE_TRUNCATE Beam may try to delete the table and then recreate it; since I'm supplying the table decorator, it fails to create a new table.

Sample snippet in Python:

beam.io.Write('Write({})'.format(date), beam.io.BigQuerySink(output_table_name + '$' + date, create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER, write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE))

which gives the error:

Table IDs must be alphanumeric

presumably because it tries to recreate the table, and the table name we supply in the argument includes the partition decorator.
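To illustrate my guess about where the error comes from, here is a minimal sketch. The regex and the function names are my own assumptions for illustration; they are not taken from the Beam or BigQuery source code. It only shows that a name containing `$` would fail a plain alphanumeric check, while the bare table name would pass:

```python
import re

# Assumption: a plain table ID may contain only letters, digits and underscores.
# This pattern is illustrative, not copied from Beam or BigQuery.
TABLE_ID_PATTERN = re.compile(r'[A-Za-z0-9_]+')


def split_partition_decorator(table_name):
    """Split 'table$YYYYMMDD' into ('table', 'YYYYMMDD'); partition is None if absent."""
    base, sep, partition = table_name.partition('$')
    return base, (partition if sep else None)


def is_valid_table_id(table_id):
    """Return True if table_id passes the assumed alphanumeric check."""
    return TABLE_ID_PATTERN.fullmatch(table_id) is not None


decorated = 'my_table$20160101'  # hypothetical table name
base, partition = split_partition_decorator(decorated)

print(is_valid_table_id(decorated))  # the '$' fails the assumed check
print(is_valid_table_id(base))       # the bare table name passes
print(partition)
```

If the table is re-created from the decorated name as-is, a check of this kind would reject it, which would match the error message above.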

Here are some of the things that I've tried:

  1. Updating the write_disposition to WRITE_APPEND. Although it works, it defeats the purpose, since running the job for the same date again would duplicate data.
  2. Using

bq --apilog /tmp/log.txt load --replace --source_format=NEWLINE_DELIMITED_JSON 'table.$20160101' sample_json.json

command, to see if I can observe any logs on how truncate actually works, based on the link that I found.

  3. Tried some other links, but these use WRITE_APPEND as well.
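For reference, the partition-decorated name used in the bq command above can be generated per date. A minimal sketch that only builds the argument list and does not run anything; the helper name `build_bq_load_args` and the dataset and table names are hypothetical:

```python
import datetime


def build_bq_load_args(table, day, json_path):
    """Build the argument list for `bq load --replace` into one day partition."""
    partition = day.strftime('%Y%m%d')
    return [
        'bq', 'load', '--replace',
        '--source_format=NEWLINE_DELIMITED_JSON',
        '{}${}'.format(table, partition),  # e.g. my_dataset.my_table$20160101
        json_path,
    ]


# Hypothetical dataset/table names; the JSON file name is the one from the command above.
args = build_bq_load_args('my_dataset.my_table', datetime.date(2016, 1, 1), 'sample_json.json')
print(args)
```

Passing the list to something like subprocess.run(args) rather than a shell string would avoid the shell trying to expand the `$` in the table name.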

Is there a way to write to a partitioned table from a Dataflow job using the WRITE_TRUNCATE disposition?

Let me know if any additional details are required. Thanks

The failure to create the table with the partition decorator may be a bug. Let me check and get back to you. - Pablo
Can you provide a stack trace for your 'Table IDs must be alphanumeric'? - Pablo
I checked with the IO dev. It seems that this is not supported now. : / - Pablo
Thanks for replying Pablo :), I was only hoping it would not delete the table for TRUNCATE and would just clear all the rows for that partition, but I guess it doesn't work that way in Beam. - Sirius
@Sirius digging this back up (albeit being a very old question) as I've run into a very similar scenario. Did you end up submitting a Jira card to this page (issues.apache.org/jira/browse/…) or solving it with some other approach that wasn't discussed here? - anddt

1 Answer

Seems like this is not supported at this time. Credit goes to @Pablo for finding out from the IO dev.

According to the Beam documentation on the GitHub page, their JIRA page would be the appropriate place to request such a feature. I'd recommend filing a feature request there and posting a link in a comment here so that others in the community can follow it and show their support.