
I have a stream analytics job that is consuming an Event Hub of avro messages (we'll call this RawEvents), transforming/flattening the messages and firing them into a separate Event Hub (we'll call this FormattedEvents).

Each EventData instance in RawEvents consists of a single top level json object that has an array of more detailed events. This is a contrived example:

[{
  "Events": [{
    "dataOne": 123.0,
    "dataTwo": 234.0,
    "subEventCode": 3,
    "dateTimeLocal": 1482170771,
    "dateTimeUTC": 1482192371
  }, {
    "dataOne": 456.0,
    "dataTwo": 789.0,
    "subEventCode": 20,
    "dateTimeLocal": 1482170771,
    "dateTimeUTC": 1482192371
  }],
  "messageType": "myDeviceType-Events",
  "deviceID": "myDevice"
}]

The Stream Analytics job flattens the results and unpacks subEventCode, which is a bitmask. The results look something like this:

{"messagetype":"myDeviceType-Event","deviceid":"myDevice","eventid":1,"dataone":123,"datatwo":234,"subeventcode":6,"flag1":0,"flag2":1,"flag3":1,"flag4":0,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"}
{"messagetype":"myDeviceType-Event","deviceid":"myDevice","eventid":2,"dataone":456,"datatwo":789,"subeventcode":8,"flag1":0,"flag2":0,"flag3":0,"flag4":1,"flag5":0,"flag6":0,"flag7":0,"flag8":0,"flag9":0,"flag10":0,"flag11":0,"flag12":0,"flag13":0,"flag14":0,"flag15":0,"flag16":0,"eventepochlocal":"2016-12-06T17:33:11.0000000Z","eventepochutc":"2016-12-06T23:33:11.0000000Z"}
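For illustration, the flag columns appear to correspond to the bits of subeventcode, assuming flag1 maps to the least significant bit (which matches the sample output: 6 is binary 0110, setting flag2 and flag3; 8 is binary 1000, setting flag4). A minimal C# sketch of that unpacking:

```csharp
using System;

class BitmaskDemo
{
    static void Main()
    {
        int subEventCode = 6; // binary 0110, as in the first sample event

        // Assumption: flagN is bit N-1 of subEventCode (flag1 = least significant bit)
        for (int i = 1; i <= 16; i++)
        {
            int flag = (subEventCode >> (i - 1)) & 1;
            Console.WriteLine($"flag{i} = {flag}");
        }
        // For subEventCode = 6, flag2 and flag3 come out as 1, all others 0
    }
}
```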

I'm expecting to see two EventData instances when I pull messages from the FormattedEvents Event Hub. What I'm getting is a single EventData with both "flattened" events in the same message. This is expected behavior when targeting blob storage or Data Lake, but surprising when targeting an Event Hub. My expectation was for behavior similar to a Service Bus.

Is this expected behavior? Is there a configuration option to force the behavior if so?


2 Answers


Yes, this is currently the expected behavior. The intent is to improve throughput by batching as many events as possible into a single Event Hub message (EventData).

Unfortunately, there is no configuration option to override this behavior today. One workaround that may be worth trying is to set the output partition key to something unique per event: add this column to your query -- GetMetadataPropertyValue(ehInput, "EventId") as outputpk -- and then specify "outputpk" as the PartitionKey in your output Event Hub's ASA settings.
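As a sketch, the query change might look like the following (the input alias ehInput, the output alias FormattedEvents, and the projected columns are placeholders you would adapt to your own job):

```sql
-- Hypothetical ASA query: flatten the Events array and project a
-- unique partition key per event from the EventId metadata property
SELECT
    ehInput.deviceID AS deviceid,
    flatEvent.ArrayValue.dataOne AS dataone,
    flatEvent.ArrayValue.dataTwo AS datatwo,
    GetMetadataPropertyValue(ehInput, 'EventId') AS outputpk
INTO
    FormattedEvents
FROM
    ehInput
CROSS APPLY GetArrayElements(ehInput.Events) AS flatEvent
```

With "outputpk" configured as the PartitionKey on the Event Hub output, events with distinct keys should no longer be batched together into one EventData.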

Let me know if that helps.

Cheers, Chetan


I faced the same problem. Building on the answers suggesting manual formatting of the input message, a colleague and I solved it with a few lines of code: we removed line feeds and carriage returns, replaced "}{" with "},{", and wrapped the result in "[" and "]" to make it a valid JSON array. (Note this simple replacement assumes "}{" never occurs inside a string value of the payload.)

// Strip newlines, then turn the concatenated objects into a JSON array
string modifiedMessage = myEventHubMessage.Replace("\n", "").Replace("\r", "");
modifiedMessage = "[" + modifiedMessage.Replace("}{", "},{") + "]";

Then the message can be deserialized into a list of objects matching its data structure:

List<TelemetryDataPoint> newDataPoints = new List<TelemetryDataPoint>();
try
{
    newDataPoints = Newtonsoft.Json.JsonConvert.DeserializeObject<List<TelemetryDataPoint>>(modifiedMessage);
    // ... process newDataPoints ...
}
catch (Newtonsoft.Json.JsonException)
{
    // handle malformed input
}