3
votes

I have read about a few different Azure services - Event Hubs Capture, Azure Data Factory, Event Hubs, and more. I am trying to find ways, using Azure services, to do the following:

  1. Write data to some "endpoint" or place from my application (preferably an Azure service) - see the sketch after this list

  2. The data should be batched and saved as files in Blob Storage

  3. Eventually, the files in Blob Storage should be in Parquet format
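
For context, step 1 could look roughly like this, assuming the Python `azure-eventhub` SDK (the connection string and hub name below are just placeholders):

```python
# Minimal sketch: send application data to an Event Hub as the "endpoint".
# Assumes the azure-eventhub package (v5); connection values are placeholders.
import json

from azure.eventhub import EventHubProducerClient, EventData

CONNECTION_STR = "<event-hubs-namespace-connection-string>"  # placeholder
EVENTHUB_NAME = "<event-hub-name>"                           # placeholder


def send_records(records):
    """Send a list of dicts to the Event Hub as one batch of JSON events."""
    producer = EventHubProducerClient.from_connection_string(
        conn_str=CONNECTION_STR, eventhub_name=EVENTHUB_NAME
    )
    with producer:
        batch = producer.create_batch()
        for record in records:
            batch.add(EventData(json.dumps(record)))
        producer.send_batch(batch)


if __name__ == "__main__":
    send_records([{"id": 1, "value": 42.0}, {"id": 2, "value": 17.5}])
```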

My questions are:

  1. I read that Event Hubs Capture only saves files as Avro, so I might also need a second pipeline that copies from the original Avro blobs to destination Parquet blobs (see the sketch after this list). Is there an Azure service that can listen to my blob container, convert all the files to Parquet, and save them again (I'm not sure from the documentation whether Data Factory can do this)?

  2. What other alternatives would you consider (besides Kafka, which I already know about) for saving a stream of data as batches of Parquet files in Blob Storage?
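
For question 1, this is roughly the kind of conversion I have in mind (something that could run, for example, in an Azure Function or a scheduled job). It is only a sketch, assuming the `azure-storage-blob`, `fastavro`, and `pyarrow` packages; container names and the connection string are placeholders:

```python
# Minimal sketch of the "second pipeline": download Avro blobs written by
# Event Hubs Capture, convert them to Parquet, and upload the results to a
# destination container. All names below are placeholders.
import io

import fastavro
import pyarrow as pa
import pyarrow.parquet as pq
from azure.storage.blob import BlobServiceClient

CONNECTION_STR = "<storage-account-connection-string>"  # placeholder
SOURCE_CONTAINER = "capture-avro"                       # placeholder
DEST_CONTAINER = "curated-parquet"                      # placeholder


def convert_container():
    service = BlobServiceClient.from_connection_string(CONNECTION_STR)
    source = service.get_container_client(SOURCE_CONTAINER)
    dest = service.get_container_client(DEST_CONTAINER)

    for blob in source.list_blobs():
        if not blob.name.endswith(".avro"):
            continue
        # Read the Avro records from the captured blob.
        avro_bytes = source.download_blob(blob.name).readall()
        records = list(fastavro.reader(io.BytesIO(avro_bytes)))
        if not records:
            continue
        # Note: Capture wraps the original payload in a "Body" field, so you
        # may want to decode that field before writing Parquet.
        table = pa.Table.from_pylist(records)
        buffer = io.BytesIO()
        pq.write_table(table, buffer)
        dest.upload_blob(
            blob.name.replace(".avro", ".parquet"),
            buffer.getvalue(),
            overwrite=True,
        )


if __name__ == "__main__":
    convert_container()
```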

Thank you!


1 Answer

0
votes

For the least amount of effort, you can look into a combination of an Event Hub as your endpoint with Azure Stream Analytics connected to it. Stream Analytics can natively write Parquet to blob storage: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-define-outputs#blob-storage-and-azure-data-lake-gen2
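
If you go that route, a quick way to verify the output is to read one of the Parquet blobs back. A minimal sketch, assuming the Python `azure-storage-blob` and `pyarrow` packages (the container name and connection string are placeholders):

```python
# Minimal sketch: read back one Parquet file that Stream Analytics wrote to
# blob storage to confirm the schema and row count look right.
import io

import pyarrow.parquet as pq
from azure.storage.blob import BlobServiceClient

CONNECTION_STR = "<storage-account-connection-string>"  # placeholder
CONTAINER = "asa-parquet-output"                        # placeholder

service = BlobServiceClient.from_connection_string(CONNECTION_STR)
container = service.get_container_client(CONTAINER)

# Inspect the first Parquet blob the job has produced.
for blob in container.list_blobs():
    if blob.name.endswith(".parquet"):
        data = container.download_blob(blob.name).readall()
        table = pq.read_table(io.BytesIO(data))
        print(blob.name, table.num_rows, "rows")
        print(table.schema)
        break
```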