0 votes

I'm using Google Datastore to store many objects (millions). At some point I no longer want to keep old rows in the database. The deletion criterion: delete all rows older than 10 days.

I saw that Google provides two options for this job:

  1. Send delete commands in batches. Of course, you have to GET all the IDs first, which sounds very slow when you have to remove millions of rows. It's also expensive (see the sketch after this list).
  2. Use the Google Dataflow product, which provides an option to bulk-delete from Datastore. The problem here is again the price, which is high.
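
For concreteness, option 1 would look roughly like this with the Python `google-cloud-datastore` client (the `Event` kind and `created` property are made-up names):

```python
from datetime import datetime, timedelta, timezone

from google.cloud import datastore

client = datastore.Client()
cutoff = datetime.now(timezone.utc) - timedelta(days=10)

# Hypothetical kind and property names; adjust to your schema.
query = client.query(kind="Event")
query.add_filter("created", "<", cutoff)

# Every entity fetched here is billed as a full read operation,
# which is what makes this approach expensive for millions of rows.
keys = [entity.key for entity in query.fetch()]

# delete_multi is limited to 500 mutations per call, so chunk the keys.
for i in range(0, len(keys), 500):
    client.delete_multi(keys[i:i + 500])
```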

The problem with both of these options is the pricing. I calculated that deleting 16M rows per month would cost $480 (Datastore read operations + delete operations), which is too much money for such a small task. On top of that, you have to add the Dataflow operation costs.

It seems there is no cheap option to delete data from Datastore. Am I wrong?


1 Answer

2 votes

You don't have to read entities in order to delete them. Deletes are based on keys, so all you need to do is identify the keys. For that you can run a keys-only query, which is much cheaper (billed as just one read operation per batch of keys returned, although there may be a limit on how many keys a projection query can fetch at a time).
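
A minimal sketch of that approach with the Python `google-cloud-datastore` client (the `Event` kind and `created` property are placeholder names):

```python
from datetime import datetime, timedelta, timezone

from google.cloud import datastore

client = datastore.Client()
cutoff = datetime.now(timezone.utc) - timedelta(days=10)

query = client.query(kind="Event")        # placeholder kind name
query.add_filter("created", "<", cutoff)  # placeholder property name
query.keys_only()  # fetch keys only; no entity payloads are read

# Commits are capped at 500 mutations, so delete in batches of 500.
batch = []
for entity in query.fetch():
    batch.append(entity.key)
    if len(batch) == 500:
        client.delete_multi(batch)
        batch = []
if batch:
    client.delete_multi(batch)
```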

Also, how did you compute $480? As per

https://cloud.google.com/datastore/pricing

for a multi-region location, it costs $0.06 per 100,000 reads and $0.02 per 100,000 deletes. Using these numbers, I get the following for 16M entities:

16*10^6 * ( (1/1000) * 0.06/10^5 + 0.02 / 10^5) = $3.2096

Here the 1/1000 factor reflects that a keys-only query is billed as a single read operation per 1,000 keys fetched.
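
The same arithmetic in Python, for reference:

```python
entities = 16_000_000
read_price, delete_price = 0.06, 0.02  # $ per 100,000 ops, multi-region

read_ops = entities / 1000  # one read operation per 1,000 keys (keys-only)
cost = read_ops * read_price / 100_000 + entities * delete_price / 100_000
print(f"${cost:.4f}")  # -> $3.2096
```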