1
votes

I see that Bigtable is designed to only have 1 primary index, the row key. However I now realize I want to scan Bigtable by a time range in which a row is added. How should I implement this? Is it possible or should I consider something else like Cloud Datastore?

Currently my row key is just a hash, like this: ABC1234567. Can I add a timestamp to it (e.g. ABC1234567#20180524) then use regex filter on row key to scan by time range? This scan doesn't have to be very fast. But then when I look for a hash (which needs to be fast), I need to use prefix scan instead of exact match. I'm not sure of the impact to performance this may incur.

Another solution is maybe to add another table to store timestamp info.

One more thing to consider is that I'm using the Node.js client which is in alpha right now, so it may not have all the tools necessary.

2

2 Answers

3
votes

We have a lot of documentation here regarding correct time series schema design: https://cloud.google.com/bigtable/docs/schema-design-time-series.

In your specific case, appending the timestamp to the end of the hash would work. If you want to search for rows in March 2018 across all hashes, then you would need to do a full scan of the entire table every time you use a regex filter.

When you need to look up by hash, that will still be fast, as you would search by prefix and bigtable will be able to find it efficiently since it relies on lexicographical sorting.

The last option you alluded to, having a separate table, is known as denormalization, and is an option if you need both types of queries to be fast.

1
votes

I don't know if this question could be flagged as duplicated but I think this other question-answer might answer yours:

Row timestamps in Bigtable - when are they updated?

Also, you can check about the TimestampRange here. and about the composition of the key here:

Each row is essentially a collection of key/value entries, where the key is a combination of the column family, column qualifier and timestamp. If a row does not include a value for a specific key, the key/value entry is simply not present.