1 vote

In Dataflow 1.x, we could use CloudBigtableIO.writeToTable(TABLE_ID) to create, update, and delete Bigtable rows: as long as a DoFn was configured to output Mutation objects, it could emit either a Put or a Delete, and CloudBigtableIO.writeToTable() would create, update, or delete the row for the given RowID.
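For reference, here is a minimal sketch of that 1.x-era pattern (assuming config is the CloudBigtableTableConfiguration behind my TABLE_ID constant, and rowKeys is a PCollection&lt;String&gt; of keys to delete):

import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.util.Bytes;

// Dataflow 1.x style: one DoFn can emit Puts and Deletes as HBase Mutations
// (the pipeline is assumed to be set up with CloudBigtableIO.initializeForWrite)
PCollection<Mutation> mutations = rowKeys.apply(ParDo.of(new DoFn<String, Mutation>() {
  @Override
  public void processElement(ProcessContext c) {
    // An HBase Delete removes the whole row for the given row key
    c.output(new Delete(Bytes.toBytes(c.element())));
  }
}));
mutations.apply(CloudBigtableIO.writeToTable(config));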

It seems that the new Beam 2.2.0 API uses the BigtableIO.write() function, which works with KV<ByteString, Iterable<Mutation>>, where the ByteString is the row key and the Iterable contains a set of row-level operations. I have figured out how to use that for cell-level data, so creating new rows and creating/deleting columns is fine, but how do we delete an entire row now, given an existing RowID?
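For example, this is the kind of cell-level write I can build with the new API (a sketch; the family, qualifier, and row key are placeholders):

import com.google.bigtable.v2.Mutation;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.values.KV;

// A cell-level operation: set a single cell, using the Bigtable v2 protobuf Mutation
Mutation setCell = Mutation.newBuilder()
    .setSetCell(Mutation.SetCell.newBuilder()
        .setFamilyName("cf")
        .setColumnQualifier(ByteString.copyFromUtf8("col"))
        .setValue(ByteString.copyFromUtf8("value")))
    .build();

// BigtableIO.write() consumes one KV per row: the row key plus its mutations
KV<ByteString, Iterable<Mutation>> rowWrite =
    KV.<ByteString, Iterable<Mutation>>of(
        ByteString.copyFromUtf8("row-key"),
        java.util.Collections.singletonList(setCell));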

Any help appreciated!

** Some further clarification:

From this document: https://cloud.google.com/bigtable/docs/dataflow-hbase I understand that changing the dependency ArtifactID from bigtable-hbase-dataflow to bigtable-hbase-beam should be compatible with Beam version 2.2.0, and the article suggests doing Bigtable writes (and hence deletes) the old way, using CloudBigtableIO.writeToTable(). However, that requires imports from the com.google.cloud.bigtable.dataflow family of dependencies, which the Release Notes say is deprecated and shouldn't be used (and indeed it seems incompatible with the new configuration classes, etc.)
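That is, the swap in the pom would look something like this (the version here is only a placeholder; use whichever release the article points to for Beam 2.2.0):

<dependency>
  <groupId>com.google.cloud.bigtable</groupId>
  <artifactId>bigtable-hbase-beam</artifactId>
  <version>1.0.0</version> <!-- placeholder version -->
</dependency>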

** Further Update:

It looks like my pom.xml didn't refresh properly after the ArtifactID change from bigtable-hbase-dataflow to bigtable-hbase-beam. Once the project was updated, I was able to import from the com.google.cloud.bigtable.beam.* package, which seems to work, at least for a minimal test.
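For anyone else hitting this, the minimal bit that now compiles for me looks like this (a sketch; the project/instance/table ids are placeholders):

import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;

// Same configuration style as before, just imported from the .beam package
CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
    .withProjectId("my-project")
    .withInstanceId("my-instance")
    .withTableId("my-table")
    .build();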

HOWEVER: it looks like there are now two different Mutation classes: com.google.bigtable.v2.Mutation and org.apache.hadoop.hbase.client.Mutation?

And in order to get everything working together, do I have to keep careful track of which Mutation class is used for which operation?

Is there a better way to do this?


2 Answers

1 vote

I would suggest continuing to use CloudBigtableIO with bigtable-hbase-beam. It shouldn't be much different from CloudBigtableIO in bigtable-hbase-dataflow.

CloudBigtableIO uses HBase org.apache.hadoop.hbase.client.Mutation objects and translates them into the equivalent Bigtable values under the covers.
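For example, a whole-row delete with this setup could look something like the following sketch (the project/instance/table ids are placeholders, and rowKeys is assumed to be a PCollection&lt;String&gt; of row keys to delete):

import com.google.cloud.bigtable.beam.CloudBigtableIO;
import com.google.cloud.bigtable.beam.CloudBigtableTableConfiguration;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.util.Bytes;

CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
    .withProjectId("my-project")
    .withInstanceId("my-instance")
    .withTableId("my-table")
    .build();

rowKeys
    .apply("ToDeletes", ParDo.of(new DoFn<String, Mutation>() {
      @ProcessElement
      public void processElement(ProcessContext c) {
        // An HBase Delete removes the entire row, row key included
        c.output(new Delete(Bytes.toBytes(c.element())));
      }
    }))
    .apply(CloudBigtableIO.writeToTable(config));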

3 votes

Unfortunately, Apache Beam 2.2.0 doesn't provide a native interface for deleting an entire row (including the row key) in Bigtable. The only full solution would be to continue using the CloudBigtableIO class, as you already mentioned.

A different solution would be to just delete all the cells from the row. This way, you can fully move forward with using the BigtableIO class. However, this solution does NOT delete the row key itself, so the cost of storing the row key remains. If your application requires deleting many rows, this solution may not be ideal.

import com.google.bigtable.v2.Mutation;
import com.google.bigtable.v2.Mutation.DeleteFromRow;

// Mutation that deletes all cells from a row (the row key itself remains)
Mutation deleteAllCells = Mutation.newBuilder()
    .setDeleteFromRow(DeleteFromRow.getDefaultInstance())
    .build();
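Wired into a pipeline, that could look something like the following sketch (the ids are placeholders, rowKeys is assumed to be a PCollection&lt;String&gt;, and withBigtableOptions/withTableId are the Beam 2.2.0-era setters):

import com.google.bigtable.v2.Mutation;
import com.google.bigtable.v2.Mutation.DeleteFromRow;
import com.google.cloud.bigtable.config.BigtableOptions;
import com.google.protobuf.ByteString;
import org.apache.beam.sdk.io.gcp.bigtable.BigtableIO;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.KV;

rowKeys
    .apply("ToDeleteFromRow", ParDo.of(
        new DoFn<String, KV<ByteString, Iterable<Mutation>>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            // Deletes every cell in the row; the row key itself remains
            Mutation deleteAllCells = Mutation.newBuilder()
                .setDeleteFromRow(DeleteFromRow.getDefaultInstance())
                .build();
            c.output(KV.<ByteString, Iterable<Mutation>>of(
                ByteString.copyFromUtf8(c.element()),
                java.util.Collections.singletonList(deleteAllCells)));
          }
        }))
    .apply(BigtableIO.write()
        .withBigtableOptions(new BigtableOptions.Builder()
            .setProjectId("my-project")
            .setInstanceId("my-instance")
            .build())
        .withTableId("my-table"));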