I am working on a project where I have to store user-activity events per user on a daily basis for later analysis. I will be receiving a stream of timestamped events and will later run Dataflow jobs on this data to compute per-user stats. I am exploring Bigtable to store this data, with the timestamp acting as the row key, so that I can later run a range query to fetch a single day's data and process it. But after going through a couple of resources I learned that with timestamp-based row keys Bigtable can run into a hotspotting problem, and I can't promote the user ID in the row key to avoid this. Is there an alternative approach to solve this, or any other storage engine that would help in this use case? A rough sketch of what I currently have in mind is below.
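To make the current design concrete, here is a minimal sketch of the timestamp-keyed layout and the daily range scan I described, assuming the Python google-cloud-bigtable client; the project/instance/table names, the "events" column family, and the helper functions are made up for illustration:

```python
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

# Hypothetical project/instance/table names; the "events" column family
# is assumed to already exist on the table.
client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("raw-events")

def write_event(event):
    # Row key is just the event timestamp in milliseconds. Sequential keys
    # like this land on adjacent tablets, which is the hotspotting concern.
    row_key = str(event["timestamp_ms"]).encode()
    row = table.direct_row(row_key)
    row.set_cell("events", event["user_id"].encode(), event["payload"].encode())
    row.commit()

def read_day(day_start_ms, day_end_ms):
    # Daily range scan: all rows whose key falls between the two timestamps.
    row_set = RowSet()
    row_set.add_row_range_from_keys(
        start_key=str(day_start_ms).encode(),
        end_key=str(day_end_ms).encode(),
    )
    return table.read_rows(row_set=row_set)
```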
Use case: I have user-activity data (impressions and clicks) arriving in streams. Based on rules, I have to aggregate data from these streams over a certain duration, store it, and serve it to an upstream service as soon as possible. The data will be processed in tumbling windows, currently 24 hours, but the window may grow or shrink. The choices I have to make are: how to store the raw events (Bigtable, BigQuery, or direct analysis on the streams), which compute engine to use (Beam vs. aggregation queries), and the final storage (keyed by user ID). The relation between a user and the aggregated data is one-to-many. A rough sketch of the Beam option I am picturing is below.
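For the aggregation side, this is roughly what I picture the Beam option looking like: a minimal sketch with made-up sample events, keying per user and event type and counting over a 24-hour tumbling window (all names and tuples are hypothetical):

```python
import apache_beam as beam
from apache_beam.transforms.window import FixedWindows, TimestampedValue

# Made-up sample events: (user_id, event_type, unix_seconds).
events = [
    ("user-1", "impression", 1_700_000_000),
    ("user-1", "click", 1_700_000_120),
    ("user-2", "impression", 1_700_003_600),
]

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.Create(events)
        # Attach the event time so windowing uses it instead of processing time.
        | "Stamp" >> beam.Map(lambda e: TimestampedValue(e, e[2]))
        # Tumbling window; 24 h today, but the size is a single constant.
        | "Window" >> beam.WindowInto(FixedWindows(24 * 60 * 60))
        # One count per (user, event type) per window.
        | "KeyByUser" >> beam.Map(lambda e: ((e[0], e[1]), 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Print" >> beam.Map(print)
    )
```

In a real pipeline the `beam.Create` step would be replaced by the streaming source, and the final step would write the per-user aggregates to whatever final storage is chosen.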