I need to save error logs in Google Bigtable. Every log belongs to a project, which can represent any kind of service. The following query needs to be supported:
Get all unique errors between a given start and end date, together with the number of times each of them occurred. The order should be descending, so that the latest errors come first. For example, suppose there are three errors, two of which are identical but occurred at different times. The query should then return two errors, one of them with a count of 2 to indicate that this error occurred twice.
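To make the expected result concrete, here is a minimal sketch of the shape I expect back (the type and field names are purely illustrative, not an existing schema):

```go
package main

import (
	"fmt"
	"time"
)

// ErrorCount is a hypothetical result entry: one per unique fingerprint
// within the time range, with the number of occurrences.
type ErrorCount struct {
	Fingerprint string
	Count       int64
	LastSeen    time.Time // used to order results descending (latest first)
}

func main() {
	// Three raw occurrences, two of them sharing a fingerprint,
	// collapse into two result entries.
	results := []ErrorCount{
		{Fingerprint: "abc123", Count: 2, LastSeen: time.Now()},
		{Fingerprint: "def456", Count: 1, LastSeen: time.Now().Add(-time.Hour)},
	}
	for _, r := range results {
		fmt.Printf("%s occurred %d times, last at %s\n", r.Fingerprint, r.Count, r.LastSeen)
	}
}
```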
Each error has a fingerprint, which is a hash of the error message and stack trace.
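For reference, I compute the fingerprint roughly like this (a sketch assuming SHA-256 over the message and stack trace; the exact hash function doesn't matter for the question):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fingerprint hashes the error message together with the stack trace so that
// identical errors occurring at different times map to the same key.
func fingerprint(message, stackTrace string) string {
	h := sha256.Sum256([]byte(message + "\n" + stackTrace))
	return hex.EncodeToString(h[:])
}

func main() {
	fmt.Println(fingerprint("nil pointer dereference", "main.go:42\nhandler.go:17"))
}
```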
I'm now wondering how to design the row key to support the query mentioned above.
Alternatives:
a) Using project ID and fingerprint.
reversed_projectID#fingerprint
If a new error occurs, the application would check whether a similar error already exists and, if so, increment its count; otherwise it would create a new row with a count of 1. However, I don't see how this layout could provide the descending order.
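A sketch of the write path for option a), using the Go client's ReadModifyWrite increment so that "create with count 1 or increment" is a single atomic operation (table, column family and ID values below are placeholders I made up):

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/bigtable"
)

func recordError(ctx context.Context, tbl *bigtable.Table, reversedProjectID, fp string) error {
	rowKey := reversedProjectID + "#" + fp

	// Increment creates the cell if it does not exist yet, so a first
	// occurrence ends up with count 1 and later occurrences add to it.
	rmw := bigtable.NewReadModifyWrite()
	rmw.Increment("stats", "count", 1)
	_, err := tbl.ApplyReadModifyWrite(ctx, rowKey, rmw)
	return err
}

func main() {
	ctx := context.Background()
	// GCP project, instance and table IDs are placeholders.
	client, err := bigtable.NewClient(ctx, "my-gcp-project", "my-instance")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	tbl := client.Open("error-logs")
	if err := recordError(ctx, tbl, "tcejorp-ym", "abc123"); err != nil {
		log.Fatal(err)
	}
}
```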
b) Storing each error in a separate row.
reversed_projectID#reversed_timestamp
This would solve the problem with the descending order but doesn't aggregate similar errors. If someone wants to see all unique errors for a given time range, and the range contains, say, four unique errors that each occurred 50,000 times, I would need to scan 200,000 rows and aggregate them on the server.
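For completeness, a sketch of what that scan would look like for option b) with the Go client, aggregating counts per fingerprint in memory. It assumes the fingerprint is stored in a "meta:fingerprint" cell and that the reversed timestamp is MaxInt64 minus the Unix timestamp, zero-padded so it sorts lexicographically; all names are placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math"
	"time"

	"cloud.google.com/go/bigtable"
)

// reverseTS turns a timestamp into a fixed-width string that sorts newest-first.
func reverseTS(t time.Time) string {
	return fmt.Sprintf("%020d", math.MaxInt64-t.UnixNano())
}

func countErrors(ctx context.Context, tbl *bigtable.Table, reversedProjectID string, start, end time.Time) (map[string]int64, error) {
	// Because timestamps are reversed, the later time bounds the start of the range.
	begin := reversedProjectID + "#" + reverseTS(end)
	limit := reversedProjectID + "#" + reverseTS(start)

	counts := make(map[string]int64)
	err := tbl.ReadRows(ctx, bigtable.NewRange(begin, limit), func(row bigtable.Row) bool {
		// Each row is one occurrence; the fingerprint is assumed to live in
		// the "meta:fingerprint" cell.
		for _, cell := range row["meta"] {
			if cell.Column == "meta:fingerprint" {
				counts[string(cell.Value)]++
			}
		}
		return true // keep scanning
	})
	return counts, err
}

func main() {
	ctx := context.Background()
	client, err := bigtable.NewClient(ctx, "my-gcp-project", "my-instance")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	tbl := client.Open("error-logs")
	counts, err := countErrors(ctx, tbl, "tcejorp-ym", time.Now().Add(-24*time.Hour), time.Now())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(counts)
}
```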
Does anyone have an idea for a possible solution? Thanks a lot.