
We are evaluating Google Bigtable as hot storage for IoT data. We have a row key based on DeviceID + Timestamp, e.g. 'ABC20201122093211', and the row data stored is a protobuf message.
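For context, a row key in that scheme might be built like this (a minimal sketch; the device ID and the YYYYMMDDHHMMSS timestamp format are assumptions taken from the example above):

```python
from datetime import datetime, timezone

def make_row_key(device_id: str, ts: datetime) -> bytes:
    # DeviceID followed by the timestamp as YYYYMMDDHHMMSS, e.g. 'ABC20201122093211'
    return (device_id + ts.strftime("%Y%m%d%H%M%S")).encode("utf-8")

row_key = make_row_key("ABC", datetime(2020, 11, 22, 9, 32, 11, tzinfo=timezone.utc))
```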

We are also moving this data into cold storage after a few days, as the amount of data is gigantic.

Now, my issue is, what is the proper way to delete the data from Bigtable?

If I use a TTL, the data may still hang around for up to a week before the table is compacted. Admin Client deletes seem to be faster, but there is no way to delete multiple ranges at once; I would have to delete the time range for each device sequentially (see the sketch below).
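A minimal sketch of that sequential approach with the Python client (project, instance, table, and key format are assumptions): scan the row keys in one device's time range and issue a DeleteFromRow mutation for each of them.

```python
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("iot-hot")

def delete_device_range(device_id: str, start_ts: str, end_ts: str):
    # Row keys are DeviceID + timestamp, so one device's time range is a key range.
    row_set = RowSet()
    row_set.add_row_range_from_keys(
        start_key=(device_id + start_ts).encode(),
        end_key=(device_id + end_ts).encode(),
    )
    # Fetch only the keys (strip cell values) to keep the scan cheap.
    keys_only = row_filters.RowFilterChain(filters=[
        row_filters.CellsColumnLimitFilter(1),
        row_filters.StripValueTransformerFilter(True),
    ])
    mutations = []
    for row_data in table.read_rows(row_set=row_set, filter_=keys_only):
        row = table.direct_row(row_data.row_key)
        row.delete()                  # DeleteFromRow mutation
        mutations.append(row)
    if mutations:
        table.mutate_rows(mutations)  # batched delete; split very large ranges into smaller batches

# One call per device; there is no single request that deletes ranges for many devices at once.
delete_device_range("ABC", "20201101000000", "20201108000000")
```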

What are my options here?


1 Answer


There are two things here: TTL with garbage collection, and the Admin Client library. TTL will mark your data for removal after a time you specify, and the Admin SDK will send a delete request for the data you specify. They are similar in that they only mark the data for removal, and just that. The data will still be there even after it is marked for removal, and for both of them it may take up to a week until compaction and garbage collection occur.
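For reference, the TTL side is just a garbage-collection policy on a column family. A minimal sketch with the Python client (project, instance, table, and column family names are assumptions):

```python
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
table = client.instance("my-instance").table("iot-hot")

# Cells older than 7 days become eligible for garbage collection;
# physical removal still only happens at the next compaction.
gc_rule = column_family.MaxAgeGCRule(datetime.timedelta(days=7))
cf = table.column_family("measurements", gc_rule=gc_rule)
cf.update()  # use cf.create() if the column family does not exist yet
```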

As I said, they are the same from the removal point of view: you still need to wait up to a week before the actual deletion occurs. Some key differences are:

1- Data marked for removal with the Admin Client will not show up in read requests for your data.

2- Data marked by the TTL for garbage collection will still show up, and you may need to use filters to exclude it (that is probably what gave you the impression that data is removed faster with the Admin Client library); see the filter sketch below.
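A sketch of such a read-time filter with the Python client, assuming a 7-day TTL and that you only want cells written within that window (project, instance, and table names are placeholders):

```python
import datetime
from google.cloud import bigtable
from google.cloud.bigtable import row_filters

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("iot-hot")

# Exclude cells older than the TTL that may still be awaiting garbage collection.
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=7)
fresh_only = row_filters.TimestampRangeFilter(
    row_filters.TimestampRange(start=cutoff)
)

for row in table.read_rows(filter_=fresh_only):
    pass  # process only cells newer than the cutoff
```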

To be honest, from my point of view the TTL uses a declarative approach while the Admin Client library uses an imperative one. Aside from that, you will still pay for the data even while it is marked for removal, until the actual compaction occurs, which again may take up to a week.

You can read more about this here: When data is removed