Is there an example somewhere, or can someone explain how to use Kinesis Analytics to construct real-time sessions (i.e., sessionization)?
This post mentions that it is possible in its discussion of custom windows, but it does not give an example: https://aws.amazon.com/blogs/aws/amazon-kinesis-analytics-process-streaming-data-in-real-time-with-sql/
Typically this is done in SQL using the LAG function so you can compute the time difference between consecutive rows. This post: https://blog.modeanalytics.com/finding-user-sessions-sql/ describes how to do it with conventional (non-streaming) SQL. However, I don't see support for the LAG function in Kinesis Analytics.
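To make the LAG idea concrete, here is a rough sketch in plain Python (not Kinesis Analytics SQL) of what I mean by comparing each event with the previous one from the same user. The function name `flag_new_sessions`, the `SESSION_GAP` constant, and the sample data are made up for illustration, and the sketch assumes events arrive already ordered by timestamp.

```python
from datetime import datetime, timedelta

# Assumed session timeout: a gap of 5 minutes or more starts a new session.
SESSION_GAP = timedelta(minutes=5)

def flag_new_sessions(events):
    """events: iterable of (user_id, timestamp), ordered by timestamp.

    Returns a list of (user_id, timestamp, is_new_session).
    """
    last_seen = {}  # user_id -> previous timestamp (plays the role of LAG)
    out = []
    for user_id, ts in events:
        prev = last_seen.get(user_id)
        # First event for a user, or a gap of at least SESSION_GAP, opens a session.
        is_new = prev is None or ts - prev >= SESSION_GAP
        out.append((user_id, ts, is_new))
        last_seen[user_id] = ts
    return out

# Hypothetical sample input.
t0 = datetime(2020, 1, 1, 12, 0)
sample = [
    ("a", t0),
    ("a", t0 + timedelta(minutes=2)),
    ("b", t0 + timedelta(minutes=3)),
    ("a", t0 + timedelta(minutes=8)),
]
flags = flag_new_sessions(sample)
```

The dictionary lookup stands in for `LAG(timestamp) OVER (PARTITION BY user_id ORDER BY timestamp)` in conventional SQL; what I am missing is the streaming equivalent.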
In particular, I would love two examples. Assume that both take as input a stream consisting of a user_id and a timestamp. Define a session as a sequence of events from the same user separated by less than 5 minutes.
1) The first outputs a stream that has the additional columns event_count and session_start_timestamp. Every time an event comes in, this should output an event with these two additional columns.
2) The second example would be a stream that outputs a single event per session once the session has ended (i.e., 5 minutes have passed with no data from a user). This event would have user_id, start_timestamp, end_timestamp, and event_count.
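To pin down the expected output of the two examples, here is a minimal reference sketch in plain Python (again, not Kinesis Analytics SQL). The names `enrich_events`, `closed_sessions`, `SESSION_GAP`, and the `now` parameter are hypothetical; the sketch assumes events are ordered by timestamp and that "session ended" is judged against an explicit clock value.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=5)  # assumed session timeout

def enrich_events(events):
    """Example 1: yield (user_id, timestamp, event_count, session_start_timestamp)
    for every incoming event."""
    state = {}  # user_id -> (session_start, event_count, last_timestamp)
    for user_id, ts in events:
        prev = state.get(user_id)
        if prev is None or ts - prev[2] >= SESSION_GAP:
            start, count = ts, 1                    # new session
        else:
            start, count = prev[0], prev[1] + 1     # same session continues
        state[user_id] = (start, count, ts)
        yield (user_id, ts, count, start)

def closed_sessions(events, now):
    """Example 2: return one (user_id, start_timestamp, end_timestamp, event_count)
    per session that has ended as of `now`."""
    state = {}  # user_id -> (session_start, event_count, last_timestamp)
    closed = []
    for user_id, ts in events:
        prev = state.get(user_id)
        if prev is not None and ts - prev[2] >= SESSION_GAP:
            # The gap closes the previous session for this user.
            closed.append((user_id, prev[0], prev[2], prev[1]))
            prev = None
        if prev is None:
            state[user_id] = (ts, 1, ts)
        else:
            state[user_id] = (prev[0], prev[1] + 1, ts)
    # Flush sessions that have been idle for at least SESSION_GAP.
    for user_id, (start, count, last) in state.items():
        if now - last >= SESSION_GAP:
            closed.append((user_id, start, last, count))
    return closed

# Hypothetical sample input.
start_time = datetime(2020, 1, 1, 12, 0)
stream = [
    ("a", start_time),
    ("a", start_time + timedelta(minutes=2)),
    ("b", start_time + timedelta(minutes=3)),
    ("a", start_time + timedelta(minutes=8)),
]
enriched = list(enrich_events(stream))
finished = closed_sessions(stream, now=start_time + timedelta(minutes=10))
```

In a real stream the second output would have to be driven by a timer or window closing rather than a `now` argument, which is exactly the part I do not know how to express in Kinesis Analytics.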
Is this possible with Kinesis Analytics?
Here is an example of doing this with Apache Spark: https://docs.cloud.databricks.com/docs/latest/databricks_guide/07%20Spark%20Streaming/Applications/01%20Sessionization.html
But I would love to do this with one (or two) Kinesis Analytics streams.