2 votes

I can't find much documentation on this topic, but I'm serialising an object to Avro and then sending it to Azure Event Hub. I think the Avro payload needs to contain the schema too, because without it, how would a consumer (e.g. Azure Stream Analytics) know how to deserialise it?

The only example I can locate online uses the Microsoft.Hadoop.Avro.Container namespace. This seems to work fine, and I can read the events via Stream Analytics, but does this code 'automagically' include the schema in the payload? I sure can't see any reference to it here:

    // Serialise a collection of items into an Avro object container using
    // Microsoft.Hadoop.Avro; Codec.Null means no block compression.
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using Microsoft.Hadoop.Avro.Container;

    public static byte[] Serialise<T>(IEnumerable<T> items)
    {
        using (var memoryStream = new MemoryStream())
        using (var writer = AvroContainer.CreateWriter<T>(memoryStream, Codec.Null))
        using (var seqWriter = new SequentialWriter<T>(writer, items.Count()))
        {
            foreach (var e in items)
            {
                seqWriter.Write(e);
            }

            return memoryStream.ToArray();
        }
    }
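
For what it's worth, my plan to sanity-check this is to read the bytes back with the generic reader from the same library, i.e. without handing it a type or schema up front; if that works, the schema presumably has to be in the payload somewhere. This is only an untested sketch of that idea:

    // Untested sketch: read the payload back *without* supplying a type or a
    // schema. If the generic reader can enumerate records, it must be using
    // schema information carried in the payload itself.
    using System;
    using System.IO;
    using Microsoft.Hadoop.Avro.Container;

    public static void DumpPayload(byte[] payload)
    {
        using (var stream = new MemoryStream(payload))
        using (var reader = AvroContainer.CreateGenericReader(stream))
        using (var seqReader = new SequentialReader<object>(reader))
        {
            foreach (dynamic record in seqReader.Objects)
            {
                Console.WriteLine((object)record);
            }
        }
    }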

The landscape of Avro in .NET seems a bit confused. Why is there a Microsoft-specific NuGet package? It seems quite old; has it now been superseded by something? Is there any documentation on how to use the standard Apache.Avro NuGet package to build a payload that contains the schema?

The Azure Event Hubs documentation mentions Avro only fleetingly, and any Google search really only turns up Event Hubs Capture.

Anyway, in short: is there a better way? I don't think I can send the schemas separately for this.

Can you please share which Microsoft document you are referring to? – HimanshuSinha-msft
Well, this really: docs.microsoft.com/en-us/azure/event-hubs/… Like I say, it's mostly related to the Event Hubs Capture aspect; I can't see anything obvious about Avro-serialised payloads. The Stream Analytics documentation is also a bit light! – m1nkeh
My assumption from recent experience is that when you write Avro to Event Hubs, the data is embedded with the Avro schema and you don't need an explicit schema definition. However, I found it odd that Event Hubs doesn't show the ingested data in Avro format and can't deserialise it properly in the Azure portal, yet my consumer can parse the data and show it properly. – pauldx

1 Answer

4 votes

First, Azure Stream Analytics supports processing events in the Avro data format; you can see this in the official document Parse JSON and Avro data in Azure Stream Analytics.


Even if Azure Stream Analytics cannot deserialize an event in Avro format the way you wish, you can also write a custom .NET deserializer to make it work for you, as the official documents below describe (a rough skeleton follows the list).

  1. Tutorial: Custom .NET deserializers for Azure Stream Analytics
  2. Use .NET deserializers for Azure Stream Analytics jobs
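
For reference, such a custom deserializer is a .NET class library that implements the StreamDeserializer<T> base class described in those tutorials. The sketch below only shows the rough shape; the event type, its fields and the Avro decoding are placeholders, and the exact namespaces and signatures should be checked against the linked documents.

    // Rough skeleton of a custom deserializer for Azure Stream Analytics,
    // based on the linked tutorials. CustomAvroEvent and the Deserialize body
    // are placeholders; verify the base class and signatures in the docs.
    using System.Collections.Generic;
    using System.IO;
    using Microsoft.Azure.StreamAnalytics;
    using Microsoft.Azure.StreamAnalytics.Serialization;

    public class CustomAvroEvent
    {
        public string DeviceId { get; set; }   // placeholder fields
        public double Reading { get; set; }
    }

    public class CustomAvroDeserializer : StreamDeserializer<CustomAvroEvent>
    {
        private StreamingDiagnostics diagnostics;

        public override void Initialize(StreamingDiagnostics streamingDiagnostics)
        {
            diagnostics = streamingDiagnostics;
        }

        public override IEnumerable<CustomAvroEvent> Deserialize(Stream stream)
        {
            // Decode the Avro payload here (for example with Apache.Avro's
            // DataFileReader) and yield one CustomAvroEvent per record.
            yield break;
        }
    }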

Meanwhile, I don't think Microsoft.Hadoop.Avro2 is a suitable library for Avro in your scenario. Besides it, there are other choices:

  1. Apache.Avro: its C# API reference page is https://avro.apache.org/docs/current/api/csharp/html/namespaces.html, though you may need to refer to the Java or Python sample code to write your C# code. See the sketch after this list for building a payload that embeds the schema.
  2. Microsoft.Avro.Core, with its GitHub repo dougmsft/microsoft-avro containing some test code that can be referred to.
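
If you go the Apache.Avro route, a payload that carries its own schema is what that library calls an object container file. Below is a minimal sketch, assuming a made-up record schema and field names, of building such a payload in memory:

    // Minimal sketch with Apache.Avro: write generic records into an object
    // container, which stores the writer schema alongside the data.
    // The "Telemetry" schema and its fields are made up for illustration.
    using System.IO;
    using Avro;
    using Avro.File;
    using Avro.Generic;

    public static class AvroPayloadBuilder
    {
        public static byte[] Build()
        {
            var schema = (RecordSchema)Schema.Parse(
                @"{""type"":""record"",""name"":""Telemetry"",""fields"":[
                    {""name"":""deviceId"",""type"":""string""},
                    {""name"":""reading"",""type"":""double""}]}");

            using (var stream = new MemoryStream())
            {
                using (var writer = DataFileWriter<GenericRecord>.OpenWriter(
                           new GenericDatumWriter<GenericRecord>(schema), stream))
                {
                    var record = new GenericRecord(schema);
                    record.Add("deviceId", "device-1");
                    record.Add("reading", 21.5);
                    writer.Append(record);
                }   // disposing flushes the container, header (schema) included

                return stream.ToArray();
            }
        }
    }

On the consuming side, DataFileReader<GenericRecord>.OpenReader should be able to read those bytes back and recover the embedded schema, so nothing has to be sent separately.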