Is there a way in Azure Stream Analytics to create an aggregation with custom state, like Spark's mapWithState does?

Here is my scenario:

I have data from IoT devices containing the following fields:

  • DeviceId
  • Position
  • Value

The data may arrive out of order.

Whenever a new packet arrives for a given DeviceId, I want to output the last n positions and values for that device, sorted by position. For example:

Input: { "DeviceId": "A", "Position": 10, "Value": 100}

Output: { "DeviceId": "A", "Positions": [10], "Value": [100]}


Next Input: { "DeviceId": "A", "Position": 11, "Value": 101}

Output: { "DeviceId": "A", "Positions": [10, 11], "Value": [100, 101]}


Next Input: { "DeviceId": "A", "Position": 9, "Value": 99}

Output: { "DeviceId": "A", "Positions": [9, 10, 11], "Value": [99, 100, 101]}

In Spark Structured Streaming I would implement this using groupBy and mapWithState. Is there a way to implement this in ASA?

1 Answer

In ASA, you can use one of the following methods:

  • If you have an additional column that can be used as a timestamp, you can use TIMESTAMP BY and ASA will reorder the events. Then you can use LAG to fetch the preceding events for that particular device.
  • Without a timestamp column, you can use the COLLECTTOP aggregate and order the events by your "Position" column.
  • Alternatively, you can implement your own stateful logic using User Defined Aggregates (UDA) as described here.
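
To give an idea of the third option, here is a minimal sketch of a JavaScript UDA that keeps the entries with the highest positions, sorted by position. It follows the main/init/accumulate/computeResult shape used for accumulate-only UDAs. The MAX_ITEMS constant, the internal items array, and the assumption that each accumulated value is a record with Position and Value fields are mine for illustration, so adapt them to your actual query and check the UDA documentation before relying on it.

```javascript
// Hypothetical capacity: how many of the highest positions to keep (the "n" in the question).
var MAX_ITEMS = 3;

// Sketch of an accumulate-only UDA.
function main() {
    // Called once per group/window to set up the custom state.
    this.init = function () {
        this.items = [];
    };

    // Called for every event; `value` is assumed to be a record
    // carrying the Position and Value fields from the question.
    this.accumulate = function (value, timestamp) {
        this.items.push({ position: value.Position, value: value.Value });
        // Keep the state ordered by position so late arrivals land in place.
        this.items.sort(function (a, b) { return a.position - b.position; });
        // Drop the lowest positions once the capacity is exceeded.
        if (this.items.length > MAX_ITEMS) {
            this.items = this.items.slice(this.items.length - MAX_ITEMS);
        }
    };

    // Called when a result is emitted; returns two parallel arrays.
    this.computeResult = function () {
        return {
            Positions: this.items.map(function (i) { return i.position; }),
            Values: this.items.map(function (i) { return i.value; })
        };
    };
}
```

Keep in mind that, as far as I know, a UDA is evaluated per GROUP BY group and window, so the state lives only as long as the window does. You would group by DeviceId and choose a window that covers the history you need.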

Let me know if you need help implementing one of these three methods. I'll be happy to provide further details.

Thanks,

JS