
I have two general questions about Stream Analytics behavior. I found nothing, or (for me) misleading information, in the documentation about them.

Both of my questions concern a Stream Analytics job with an Event Hub as the input source.

1. Stream position

When the analytics job starts, are only events processed that arrive after startup? Are older events that are still in the Event Hub ignored?

2. Long span time window

The documentation says:

"The output of the window will be a single event based on the aggregate function used with a timestamp equal to the window end time."

If I create a select statement with, for example, a 7-day tumbling window, is there any limitation on how many output elements the job can hold in memory before closing the window and sending out the result set? On my heavy-workload Event Hub, that can be millions of output results.
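To make the scenario concrete, here is a minimal Python sketch of the kind of tumbling-window aggregation I mean (hypothetical event tuples and field names, not the actual Stream Analytics runtime): events are bucketed into fixed 7-day windows, and one aggregate row per group is emitted with the window end time as its timestamp.

```python
from collections import defaultdict

WINDOW_SECONDS = 7 * 24 * 3600  # 7-day tumbling window

def tumbling_window_counts(events):
    """Group (timestamp_seconds, device_id) events into 7-day tumbling
    windows and emit one aggregate row per device per window, stamped
    with the window end time -- analogous to a COUNT(*) query over a
    7-day tumbling window."""
    windows = defaultdict(lambda: defaultdict(int))
    for ts, device in events:
        # Each event falls into exactly one non-overlapping window.
        window_end = (ts // WINDOW_SECONDS + 1) * WINDOW_SECONDS
        windows[window_end][device] += 1
    # One output row per (window end, device), however many input
    # events fell into the window.
    return [(end, dev, n)
            for end in sorted(windows)
            for dev, n in sorted(windows[end].items())]

events = [(10, "a"), (20, "a"), (30, "b"), (WINDOW_SECONDS + 5, "a")]
print(tumbling_window_counts(events))
# → [(604800, 'a', 2), (604800, 'b', 1), (1209600, 'a', 1)]
```

Note that in this sketch only one running counter per group is kept, not every raw event; my question is whether the real job behaves similarly or has a hard limit.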

1 Answer

For your first question, there is no evidence that Stream Analytics ignores older events from before the job startup. Actually, the event lifecycle depends on the Event Hubs message retention period (1–7 days), not on Stream Analytics. However, you can specify the eventStartTime & eventEndTime for an input to retrieve exactly the data you want; please see the first REST request properties of Stream Analytics Input.


On the Azure portal, they appear as fields on the input settings.

For your second question, according to the Azure limits & quotas for Stream Analytics and the reference for Windowing, no limit on memory usage is documented; the only documented limits are as below.

  1. For windowing, "The maximum size of the window in all cases is 7 days."
  2. For Stream Analytics, "Maximum throughput of a Streaming Unit" is 1 MB/s.
  3. For Event Hubs, the service-level quotas apply.

The limits above are what can cause output delay.
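Put together, those quotas give a rough upper bound on how much data a single window can cover. A back-of-envelope sketch in Python (assuming one streaming unit saturated at 1 MB/s for the full 7-day maximum window):

```python
# Back-of-envelope bound implied by the quotas above (assumption:
# a single streaming unit running at its 1 MB/s ceiling for the
# entire 7-day maximum window).
throughput_bytes_per_s = 1 * 1024 * 1024      # 1 MB/s per streaming unit
window_seconds = 7 * 24 * 3600                # 7-day maximum window
max_bytes = throughput_bytes_per_s * window_seconds
print(max_bytes / 1024 ** 3)                  # upper bound in GiB
# → 590.625
```

So a single streaming unit can only have pushed on the order of ~590 GiB through a 7-day window in the worst case; heavier workloads would need more streaming units, and the window itself emits only aggregate rows, not the raw events.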