
I have a problem for which I haven't found an adequate solution. I have a row key such as {project}#{location}#raw#{timestamp}.

I would like to find the row with the latest timestamp for a given prefix. For example, I want to find the row with the latest timestamp for a specified project and location: Project1#Location1#raw#{??}

Is there any way to do that?

I guess the naive way would be to query a long time range and then sort the results in Python to find the latest timestamp, but I feel that is rather wasteful.


2 Answers


Since the timestamp is embedded in the row key itself, you can filter the records with a row key regex that matches the pattern from your question, Project1#Location1#raw#{??} (for example, Project1#Location1#raw#.*). As for sorting, the documentation states:

When Cloud Bigtable stores rows, it sorts them by row key in lexicographic order

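So you don't have to sort the results yourself: the last row returned by the query is the record you want, provided the timestamp is encoded so that lexicographic order matches chronological order (for example, zero-padded epoch seconds or a fixed-width ISO 8601 string).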
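A quick local check of that ordering, using plain Python string sorting (which compares ASCII keys the same way Bigtable compares row key bytes). It assumes timestamps are stored as epoch seconds zero-padded to 10 digits; the helper `make_row_key` and the sample values are hypothetical, not taken from the question.

```python
def make_row_key(project: str, location: str, epoch_seconds: int, width: int = 10) -> str:
    """Hypothetical helper: build {project}#{location}#raw#{timestamp}.

    Zero-padding the timestamp to a fixed width makes lexicographic
    order match numeric (chronological) order.
    """
    return f"{project}#{location}#raw#{epoch_seconds:0{width}d}"


timestamps = [999_999_999, 1_700_000_000, 1_600_000_000]

padded = sorted(make_row_key("Project1", "Location1", ts) for ts in timestamps)
unpadded = sorted(f"Project1#Location1#raw#{ts}" for ts in timestamps)

# With padding, the last key in sorted order carries the latest timestamp.
print(padded[-1])    # Project1#Location1#raw#1700000000
# Without padding, "999999999" sorts after "1700000000" because '9' > '1'.
print(unpadded[-1])  # Project1#Location1#raw#999999999
```

If existing keys are not fixed-width, the "take the last row" shortcut can silently return the wrong row, so the key format matters more than the query here.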

Since you mentioned you are considering Python, you can check the row key regex example in the documentation to see how to read the data you want; after that, all you have to do is print the last of the rows returned in that example. To do that, as discussed in the comments, you can use the following code:

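# rows: the PartialRowsData returned by table.read_rows(filter_=...) in the documentation example
last_row = None
for row in rows:  # rows stream in lexicographic row-key order
    last_row = row  # keep only the most recent one seen
if last_row is not None:
    print(last_row.row_key)  # row key with the latest timestamp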
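To see why keeping only the last row is enough, here is an in-memory stand-in for the regex-filtered read. A sorted list of keys plays the role of the table (Bigtable returns rows in row-key order), and Python's `re` module stands in for the RE2 row key regex filter that Bigtable applies server-side. The function name `latest_matching_key` and the sample keys are hypothetical.

```python
import re
from typing import Iterable, Optional


def latest_matching_key(sorted_keys: Iterable[str], pattern: str) -> Optional[str]:
    """Return the last key (in row-key order) that fully matches `pattern`.

    `sorted_keys` simulates a Bigtable scan, which yields rows in
    lexicographic row-key order. Python's `re` is a stand-in for RE2.
    """
    regex = re.compile(pattern)
    last = None
    for key in sorted_keys:  # rows arrive in order; remember the latest match
        if regex.fullmatch(key):
            last = key
    return last


table_keys = sorted([
    "Project1#Location1#raw#1600000000",
    "Project1#Location1#raw#1700000000",
    "Project1#Location2#raw#1800000000",
    "Project2#Location1#raw#1900000000",
])

print(latest_matching_key(table_keys, r"Project1#Location1#raw#.*"))
# Project1#Location1#raw#1700000000
```

Note that in this form the scan still visits every row in the requested range and discards the non-matching ones; that is the cost the prefix suggestion in the next paragraph avoids.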

Also, as discussed in the comments, if performance is a concern, consider reading by row prefix instead of applying a filter to your search, as described in the documentation. The documentation says that "Reads that use filters are slower than reads without filters" and lists "Restrict the rowset as much as possible" as the first step toward better performance, so this might be a better approach than the one I suggested above.
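A prefix read is just a row range: every key that starts with the prefix lies in [prefix, end), where end is the prefix with its last byte incremented (after dropping any trailing 0xFF bytes). The sketch below computes that end key and simulates the range read with `bisect` over a sorted key list; in a real read request you would pass the two keys as the start and end of the row range instead. The helper names are hypothetical, and this is a local illustration rather than the client library's own implementation.

```python
import bisect
from typing import List, Optional


def prefix_end_key(prefix: bytes) -> Optional[bytes]:
    """Exclusive end key: greater than every key that starts with `prefix`.

    Strip trailing 0xFF bytes, then increment the last remaining byte.
    Returns None if nothing is left (the range is unbounded above).
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return None
    return trimmed[:-1] + bytes([trimmed[-1] + 1])


def latest_with_prefix(sorted_keys: List[bytes], prefix: bytes) -> Optional[bytes]:
    """Simulate a range read over [prefix, end) and return the last key in it."""
    start = bisect.bisect_left(sorted_keys, prefix)
    end_key = prefix_end_key(prefix)
    stop = len(sorted_keys) if end_key is None else bisect.bisect_left(sorted_keys, end_key)
    return sorted_keys[stop - 1] if stop > start else None


keys = sorted([
    b"Project1#Location1#raw#1600000000",
    b"Project1#Location1#raw#1700000000",
    b"Project1#Location2#raw#1800000000",
])

print(prefix_end_key(b"Project1#Location1#raw#"))  # b'Project1#Location1#raw$'
print(latest_with_prefix(keys, b"Project1#Location1#raw#"))
# b'Project1#Location1#raw#1700000000'
```

Because the bounds come from the prefix itself, rows outside the range are never read at all, unlike a regex filter that examines each row in the scan.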


As an alternative approach, consider creating a side table to index the timestamps, with row keys something like {project}#{location}#{timestamp}. This will allow you to easily find the latest timestamp per project and location, at the cost of having to maintain two tables (two writes, additional data, etc.).
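A minimal in-memory sketch of that dual-write pattern, assuming one dict per table and zero-padded epoch-second timestamps. The class `TwoTableStore` and its methods are hypothetical; with real Bigtable tables, each `put()` would become two separate writes, which is the extra cost mentioned above.

```python
from typing import Dict, Optional


class TwoTableStore:
    """Hypothetical stand-in for a main table plus a timestamp index side table."""

    def __init__(self) -> None:
        self.main: Dict[str, bytes] = {}   # {project}#{location}#raw#{timestamp} -> payload
        self.index: Dict[str, bytes] = {}  # {project}#{location}#{timestamp} -> empty marker

    def put(self, project: str, location: str, ts: int, payload: bytes) -> None:
        stamp = f"{ts:010d}"  # fixed width so lexicographic order == time order
        self.main[f"{project}#{location}#raw#{stamp}"] = payload  # write 1
        self.index[f"{project}#{location}#{stamp}"] = b""          # write 2

    def latest_timestamp(self, project: str, location: str) -> Optional[int]:
        prefix = f"{project}#{location}#"
        matches = sorted(k for k in self.index if k.startswith(prefix))
        return int(matches[-1][len(prefix):]) if matches else None


store = TwoTableStore()
store.put("Project1", "Location1", 1_600_000_000, b"a")
store.put("Project1", "Location1", 1_700_000_000, b"b")
store.put("Project1", "Location2", 1_800_000_000, b"c")

latest = store.latest_timestamp("Project1", "Location1")
print(latest)  # 1700000000
print(store.main[f"Project1#Location1#raw#{latest:010d}"])  # b'b'
```

The trade-off is visible in `put()`: both tables must be kept in sync, and a failure between the two writes would leave the index out of date.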