In Dataflow 1.x versions, we could use CloudBigtableIO.writeToTable(TABLE_ID) to create, update, and delete Bigtable rows. As long as a DoFn was configured to output a Mutation object, it could output either a Put or a Delete, and CloudBigtableIO.writeToTable() successfully created, updated, or deleted a row for the given RowID.
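A rough sketch of that pattern, for reference (MyRecord, records, CF, COL, and TABLE_ID are illustrative placeholders, and the exact writeToTable overload / configuration plumbing is elided):

    import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Mutation;
    import org.apache.hadoop.hbase.client.Put;
    // (plus the usual Dataflow 1.x DoFn / ParDo / PCollection imports)

    // Dataflow 1.x style: the DoFn emits HBase Mutations, so a row can be
    // created/updated with a Put or removed entirely with a Delete.
    PCollection<Mutation> mutations = records.apply(ParDo.of(new DoFn<MyRecord, Mutation>() {
      @Override
      public void processElement(ProcessContext c) {
        MyRecord r = c.element();
        if (r.isDeleted()) {
          c.output(new Delete(r.getRowKey()));   // whole-row delete
        } else {
          c.output(new Put(r.getRowKey()).addColumn(CF, COL, r.getValue()));
        }
      }
    }));

    mutations.apply(CloudBigtableIO.writeToTable(TABLE_ID));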
It seems that the new Beam 2.2.0 API uses the BigtableIO.write() function instead, which works with KV<RowID, Iterable<Mutation>>, where the Iterable contains the set of operations to apply to that row. I have found out how to use that to work on cell-level data, so it's fine for creating new rows and creating/deleting columns, but how do we delete a whole row now, given an existing RowID?
Any help appreciated!
** Some further clarification:
From this document: https://cloud.google.com/bigtable/docs/dataflow-hbase I understand that changing the dependency ArtifactID from bigtable-hbase-dataflow to bigtable-hbase-beam should be compatible with Beam version 2.2.0, and the article suggests doing Bigtable writes (and hence deletes) in the old way by using CloudBigtableIO.writeToTable(). However, that requires imports from the com.google.cloud.bigtable.dataflow family of dependencies, which the Release Notes suggest is deprecated and shouldn't be used (and indeed it seems incompatible with the new configuration classes, etc.).
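For reference, the dependency change described there amounts to swapping the ArtifactID in pom.xml, roughly like this (the version element is just a placeholder; pick whichever current release matches your Beam version):

    <!-- previously: <artifactId>bigtable-hbase-dataflow</artifactId> -->
    <dependency>
      <groupId>com.google.cloud.bigtable</groupId>
      <artifactId>bigtable-hbase-beam</artifactId>
      <version><!-- current release --></version>
    </dependency>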
** Further Update:
It looks like my pom.xml didn't refresh properly after the change from the bigtable-hbase-dataflow to the bigtable-hbase-beam ArtifactID. Once the project got updated, I am able to import from the com.google.cloud.bigtable.beam.* branch, which seems to be working, at least for a minimal test (sketched below).
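The minimal test amounts to roughly this (project/instance/table ids are placeholders):

    import com.google.cloud.bigtable.beam.CloudBigtableIO;
    import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
    import org.apache.hadoop.hbase.client.Mutation;

    // With bigtable-hbase-beam, the target table is named in a configuration
    // object instead of being passed to writeToTable() directly.
    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("my-project")
        .withInstanceId("my-instance")
        .withTableId("my-table")
        .build();

    // mutations is a PCollection<org.apache.hadoop.hbase.client.Mutation>,
    // i.e. the same HBase Puts/Deletes as in the old pipeline.
    mutations.apply(CloudBigtableIO.writeToTable(config));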
HOWEVER: it looks like there are now two different Mutation classes, com.google.bigtable.v2.Mutation and org.apache.hadoop.hbase.client.Mutation. In order to get everything working together, do I have to keep careful track of which Mutation class is used for which operation? Is there a better way to do this?