Lately I've been playing around with Stream Analytics queries with Power BI as the output sink. I made a simple query that counts the HTTP response codes of our website requests over time, grouped by date and response code. The input data is read from a storage account (Blob storage). This is my query:
SELECT
    -- Truncate the event time to midnight so that rows are grouped per day
    DATETIMEFROMPARTS(
        DATEPART(year, R.context.data.eventTime),
        DATEPART(month, R.context.data.eventTime),
        DATEPART(day, R.context.data.eventTime), 0, 0, 0, 0) AS datum,
    request.ArrayValue.responseCode,
    COUNT(request.ArrayValue.responseCode)
INTO [requests-httpresponsecode]
FROM [cvweu-internet-pr-sa-requests] R TIMESTAMP BY R.context.data.eventTime
OUTER APPLY GetArrayElements(R.request) AS request
GROUP BY
    DATETIMEFROMPARTS(
        DATEPART(year, R.context.data.eventTime),
        DATEPART(month, R.context.data.eventTime),
        DATEPART(day, R.context.data.eventTime), 0, 0, 0, 0),
    request.ArrayValue.responseCode,
    System.TimeStamp
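In other words, the result I am after is one count per day and response code. Here is a minimal Python sketch of that intent, using made-up events; the `events` list, its values, and the `truncate_to_day` helper are hypothetical and only illustrate the grouping I have in mind, not what the job actually emits:

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample events: (eventTime, responseCode)
events = [
    (datetime(2018, 9, 3, 8, 15), 200),
    (datetime(2018, 9, 3, 17, 40), 200),
    (datetime(2018, 9, 3, 23, 59), 404),
    (datetime(2018, 9, 4, 0, 5), 200),
]

def truncate_to_day(ts):
    """Keep the date and zero the time, like DATETIMEFROMPARTS(y, m, d, 0, 0, 0, 0)."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)

# One counter entry per (day, responseCode) pair
counts = Counter((truncate_to_day(ts), code) for ts, code in events)

for (day, code), n in sorted(counts.items()):
    print(day, code, n)
```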
Since continuous export became active on 3 September 2018, I chose a job start time of 3 September 2018. Because I am interested in the statistics up to today, I did not specify a date interval, so I expect to see data from 3 September 2018 until now (20 December 2018). The job runs without errors, with Power BI as the output sink. Immediately I saw the chart being populated starting from 3 September, grouped by day and counting up. So far, so good. A few days later I noticed that the output dataset no longer started from 3 September but from 2 December. Apparently the older data is being overwritten.
The following link says:
https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-power-bi-dashboard
"defaultRetentionPolicy: BasicFIFO: Data is FIFO, with a maximum of 200,000 rows."
But my output table has nowhere near 200,000 rows:
datum,count,responsecode
2018-12-02 00:00:00,332348,527387
2018-12-03 00:00:00,3178250,3282791
2018-12-04 00:00:00,3170981,4236046
2018-12-05 00:00:00,2943513,3911390
2018-12-06 00:00:00,2966448,3914963
2018-12-07 00:00:00,2825741,3999027
2018-12-08 00:00:00,1621555,3353481
2018-12-09 00:00:00,2278784,3706966
2018-12-10 00:00:00,3160370,3911582
2018-12-11 00:00:00,3806272,3681742
2018-12-12 00:00:00,4402169,3751960
2018-12-13 00:00:00,2924212,3733805
2018-12-14 00:00:00,2815931,3618851
2018-12-15 00:00:00,1954330,3240276
2018-12-16 00:00:00,2327456,3375378
2018-12-17 00:00:00,3321780,3794147
2018-12-18 00:00:00,3229474,4335080
2018-12-19 00:00:00,3329212,4269236
2018-12-20 00:00:00,651642,1195501
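To make explicit how I read that retention policy, here is a minimal Python sketch of a FIFO buffer with a row cap. It is only my assumption about the behaviour, not Power BI's actual implementation; `MAX_ROWS` and `push_row` are hypothetical names, and the cap is shrunk from 200,000 to 3 so that the eviction is visible:

```python
from collections import deque

# Assumed cap: the documentation quotes 200,000 rows; 3 is used here for illustration.
MAX_ROWS = 3

# A deque with maxlen silently drops the oldest entry when it is full.
dataset = deque(maxlen=MAX_ROWS)

def push_row(row):
    """Append a row; once the cap is reached, the oldest row is evicted (FIFO)."""
    dataset.append(row)

for day in ["2018-09-03", "2018-09-04", "2018-09-05", "2018-09-06"]:
    push_row({"datum": day})

# Only the most recent MAX_ROWS rows remain; 2018-09-03 has been evicted.
print([row["datum"] for row in dataset])
```

If the dataset behaved like this, the earliest days would disappear once more rows than the cap had been pushed. That matches what I see, even though the table above shows far fewer rows than the limit.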
EDIT: I created the input as a STREAM input, following https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-quick-create-portal. I could create a REFERENCE input instead, but that invalidates my query, since APPLY and GROUP BY are then not supported. I also think a STREAM input is what I want, according to https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-add-inputs.
What am I missing? Is it my query?