0
votes

I'm using the capture function on an Event Hub to push data into a data lake. This is being saved into the lake in .avro files.

I would like to join this data to some existing data I have in a SQL data warehouse, so the most sensible approach seems to be to push the .avro files into SQL. How can I do this?

Frustratingly, the data from the API is originally in JSON format, but the web app on Azure returns it as .avro. If I could get the files as JSON, I would just use the OPENJSON function in T-SQL.
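For what it's worth, Event Hub Capture's Avro records carry the original event payload as bytes in a `Body` field, so the JSON isn't lost — it just needs to be extracted and decoded. Below is a minimal Python sketch of that step. It assumes the Avro records have already been deserialized into dicts (e.g. with a library such as `fastavro`); the sample record and its field values are illustrative only.

```python
import json

# Illustrative record, shaped like an Event Hub Capture Avro record after
# deserialization. The "Body" field holds the original event bytes.
record = {
    "SequenceNumber": 1,
    "EnqueuedTimeUtc": "2018-01-01T00:00:00Z",
    "Body": b'{"deviceId": "sensor-1", "temperature": 21.5}',
}

# Decode the raw body bytes and parse the embedded JSON payload.
payload = json.loads(record["Body"].decode("utf-8"))
print(payload["deviceId"])
```

Once each `Body` is decoded like this, the resulting JSON strings can be loaded into an NVARCHAR column and queried with OPENJSON as originally planned.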

Thank you

1
What have you tried so far? - LuckyLikey
@LuckyLikey I think I might be able to use U-SQL to manipulate the .avro files somehow, so I was going to start working through this "Stairway to U-SQL" when I get home - sqlservercentral.com/stairway/142480 - Omar Rose
U-SQL could work. Data Factory could also work. If you could convert your files to ORC, Parquet, or a delimited text file, you could import them to SQL DW with Polybase. Or you could just run federated queries against them in the data lake if you didn't necessarily need to import everything (but test performance before you go this route). - mmarie
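On the delimited-text route mmarie mentions: once the Avro payloads have been parsed into records, writing them out as CSV for a PolyBase external table is straightforward. A minimal Python sketch, assuming the records are already plain dicts (the field names here are hypothetical):

```python
import csv
import io

# Hypothetical records already extracted from the captured Avro files.
records = [
    {"deviceId": "sensor-1", "temperature": 21.5},
    {"deviceId": "sensor-2", "temperature": 19.0},
]

# Write a header row plus one CSV row per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["deviceId", "temperature"])
writer.writeheader()
writer.writerows(records)
print(buf.getvalue())
```

The resulting file can be landed back in the data lake and exposed to SQL DW via a PolyBase external table over a delimited-text file format.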

1 Answer

1
votes

I know I'm kind of late with this answer, but whenever I work with U-SQL I find it useful to check the GitHub repo for examples, as a baseline I can then adapt to my needs.

Coming to your question, have you checked this? https://github.com/Azure/usql/tree/master/Examples/AvroExamples It shows the basics of handling Avro files.

Hope this helps! Martin