7
votes

I'm using a service which outputs to an Event Hub.

We want to store that output so it can be read once per day by a batch job running on Apache Spark. Basically, we figured we'd just dump all the messages to blobs.

What's the easiest way to capture messages from an Event Hub to Blob Storage?

Our first thought was a Streaming Analytics job, but that requires parsing the raw message as CSV, JSON, or Avro, and our current format is none of those.


Update: We solved this problem by changing our message format. I'd still like to know if there's any low-impact way to store messages to blobs. Did Event Hubs have a solution for this before Streaming Analytics arrived?

4
If your Event Hub serialization format isn't CSV/JSON/Avro then what is it? - GregGalloway
@GregGalloway - In fact it's JSON, but prefixed with a C# interface name. Our C# code sniffs that to know what type to deserialize it into. - Iain
Have you seen this link? I don't have all the answers on how to automate this to run daily or the best way to parse JSON in Spark, but this seems like a good starting point for research, and maybe others can comment: azure.microsoft.com/en-us/documentation/articles/… - GregGalloway
(Sorry, I keep hitting Enter accidentally :) Cheers. I think we need a long-term record of all the data coming in. ... We could have Spark Streaming receive it and immediately write it out, but that seems even more overkill than the Streaming Analytics version already is. - Iain

4 Answers

3
votes

You can use Event Hubs Capture to capture messages to a blob.

5
votes

You could write your own worker process to read the messages off the Event Hub and store them in Blob Storage. You do not need to do this in real time, because messages stay on the Event Hub for the configured retention period. The client that reads from the Event Hub is responsible for tracking which messages have been processed, by recording the partition ID and offset of each message. There is a C# library that makes this extremely easy and scales really well: https://azure.microsoft.com/en-us/documentation/articles/event-hubs-csharp-ephcs-getstarted/
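
To illustrate the offset bookkeeping described above, here is a minimal Python sketch. It is not the C# library from the link and does not call any Azure SDK: `Event`, `drain_to_blobs`, and the `write_blob` callback are hypothetical stand-ins, offsets are modelled as plain integers, and the blob naming scheme is only an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a message read from one Event Hub partition."""
    partition_id: str
    offset: int  # assumption: modelled as an integer for simplicity
    body: bytes


def drain_to_blobs(
    events: Iterable[Event],
    checkpoints: Dict[str, int],
    write_blob: Callable[[str, bytes], None],
) -> Dict[str, int]:
    """Store unprocessed events as one blob per partition; return new checkpoints.

    `checkpoints` maps partition id -> last offset already stored.
    `write_blob(name, data)` is a placeholder for a real Blob Storage upload.
    """
    pending: Dict[str, List[Event]] = {}
    for event in events:
        # Skip anything at or before the stored offset: it was already captured.
        if event.offset <= checkpoints.get(event.partition_id, -1):
            continue
        pending.setdefault(event.partition_id, []).append(event)

    updated = dict(checkpoints)
    for partition_id, batch in pending.items():
        batch.sort(key=lambda e: e.offset)
        first, last = batch[0].offset, batch[-1].offset
        name = f"partition-{partition_id}/{first}-{last}.bin"
        # Newline-delimited bodies; assumes bodies contain no raw newlines.
        write_blob(name, b"\n".join(e.body for e in batch))
        # Advance the checkpoint only after the blob write has returned.
        updated[partition_id] = last
    return updated
```

Because the checkpoint moves only after the write, a crash between the two steps would cause a batch to be written again on the next run rather than lost, so a daily job built this way should tolerate duplicates.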

1
votes

Azure now has this built-in: Event Hubs Archive (in preview)

1
votes

You can also do this via an Azure Function (serverless code) which fires from an Event Hub trigger.

Depending on your requirements, this can work better than the Event Hubs Capture feature if you need a capability that Capture doesn't have, such as saving as GZIP or writing to a custom blob virtual directory structure.

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs#trigger-usage
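
As a rough idea of the custom logic such a function could hold, the Python sketch below builds a date-based virtual directory path and GZIP-compresses a batch of message bodies using only the standard library. The function names, the path layout, and the newline-delimited format are assumptions for illustration; the Event Hub trigger binding and the actual blob upload are left out.

```python
import gzip
from datetime import datetime, timezone
from typing import Iterable


def blob_path(hub: str, partition_id: str, enqueued: datetime) -> str:
    """Build a date-partitioned blob name (the layout is an assumption).

    `enqueued` should be timezone-aware; it is normalised to UTC.
    """
    t = enqueued.astimezone(timezone.utc)
    folder = t.strftime("%Y/%m/%d/%H")
    stamp = t.strftime("%M%S")
    return f"{hub}/{folder}/partition-{partition_id}-{stamp}.gz"


def gzip_batch(bodies: Iterable[bytes]) -> bytes:
    """Join message bodies with newlines and GZIP-compress the result."""
    return gzip.compress(b"\n".join(bodies))


# Example: one batch from partition 3 of a hub named "telemetry" (made-up values).
when = datetime(2016, 5, 17, 9, 30, 15, tzinfo=timezone.utc)
name = blob_path("telemetry", "3", when)
data = gzip_batch([b'{"a": 1}', b'{"a": 2}'])
```

The hour-level folders keep each day's blobs under one prefix, which suits a once-per-day batch read.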