We see an issue that on stream analytics when using a blob reference input. Upon restarting the stream, it prints double values for things joined to it. I assume this is an issue with having more than 1 blob active during the time it restarts. Currently we pull the files from a folder path in ADLS structured as Output/{date}/{time}/Output.json, which ends up being Output/2021/04/16/01/25/Output.json. These files have a key that the data matches on in the stream with:
IoTData
LEFT JOIN kauiotblobref kio
ON kio.ParentID = IoTData.ConnectionString
which I don't see any issue with, but those files are actually getting created every minute on the minute by an azure function. So it may be possible during the start of stream analytics, it grabs the last and the one that gets created following. (That would be my guess, but I'm not sure how we would fix that).
Here's a visual in powerBI of the issue:
This is easily explained when looking at the cosmosDB for that device it's capturing from, there are two entries with the same value, assetID, timestamp, different recordID(just means cosmosDB counted it as two separate events). This shouldn't be possible because we can't send duplicates with the same timestamp from a device.