0
votes

I have a stream of user data being ingested by [Azure Event Hubs] (https://docs.microsoft.com/en-us/azure/event-hubs/).

I am aggregating that data using [Azure Stream Analytics] (https://docs.microsoft.com/en-us/azure/stream-analytics/) by the COUNT(*) of views for a given productId.

I then update the productId's view count in CosmosDB.

In a prototype, I ran a query in a Stream Analytics job like this:

SELECT
    productId, COUNT(*) AS views
INTO
    cosmosDbOutput
FROM
    eventHubInput
TIMESTAMP BY moment
GROUP BY productId, System.Timestamp()

This query upserted the views of a product for all time.

I ran a query in Cosmos to retrieve the most popular views:

SELECT TOP 10 * FROM c
ORDER BY c.views DESC

What I'd like to do is only maintain the views that a product has seen in the last 30 days.

I see that Stream Analytics jobs have [windowing functions] (https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions) that have a max time of 7 days.

Given that limit on windowing functions, is there a way to write a query that aggregates over 7+ days?

If not, is there a different approach in terms of tooling or storage I should consider?

1

1 Answers

1
votes

I assume your past data won`t change, so there's no need to use Stream Analytics for it. Once you calculate the count, you can persist it on a SQL Database or Table Services (Cosmos / Storage) and create a materialized view.

enter image description here

more info about materialized view: https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view