I have simulated devices that send messages through IoT Hub to Blob Storage. From there, I copy the data (encoded as JSON) to Azure Data Lake Storage Gen2 using an Azure Data Factory pipeline.

How can I convert these JSON output files to CSV files so the Data Lake engine can process them? Or can I process all of the incoming JSON telemetry directly in Azure Data Lake?
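To show what "converting JSON to CSV" involves when telemetry messages are nested, here is a minimal Python sketch. It assumes each message is a JSON object. The field names (deviceId, temperature, location, humidity) are hypothetical sample data, not taken from the question. Nested objects become dotted column names. The header is the union of all keys, so messages with extra or missing fields still fit.

```python
import csv
import io
import json


def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level, e.g. {"a": {"b": 1}} -> {"a.b": 1}.
    Arrays are not expanded; they are written as a single cell."""
    items = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))
        else:
            items[new_key] = value
    return items


def json_records_to_csv(records):
    """Convert a list of (possibly nested) JSON objects into CSV text."""
    rows = [flatten(r) for r in records]
    # Union of all keys in first-seen order, so messages with extra fields still fit.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # keys missing from a row are written as empty cells
    return buffer.getvalue()


# Hypothetical telemetry messages from two simulated devices.
messages = [
    json.loads('{"deviceId": "sim-1", "temperature": 21.5, "location": {"lat": 47.6, "lon": -122.3}}'),
    json.loads('{"deviceId": "sim-2", "temperature": 19.0, "humidity": 40}'),
]
print(json_records_to_csv(messages))
```

Taking the union of keys trades a fixed schema for tolerance of heterogeneous messages. If every device sends the same shape, a fixed column list is simpler and keeps column order stable across files.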

By "data lake engine", do you mean Azure Data Lake Analytics (U-SQL scripts)? - Peter Bons
@PeterBons Yes, I mean Azure Data Lake Analytics. - Lalatendu Mohanty
@PeterBons Thanks for your answer. My requirements have since changed: could you suggest other ways to convert the JSON-encoded sensor data to CSV using an Azure Function? - Lalatendu Mohanty
How much data are we talking about? The good thing about Azure Data Lake Analytics is that it scales very well. You could create an Event Hub-triggered Azure Function and write the JSON-to-CSV conversion yourself. However, each invocation receives a single event or a batch of events, so if you want one large CSV file, you have to append the data to that file yourself. - Peter Bons
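The "append each batch to one CSV file" idea can be sketched in Python as follows. This is only an illustration under stated assumptions. The function name append_batch_to_csv and the fixed column list are hypothetical. The sketch writes to a local file. A real Azure Function would instead target Blob Storage or Data Lake storage (for example, an append blob), and this code does not show that. The header is written only when the file is new or empty, so repeated invocations build up a single CSV file. Concurrent invocations appending to the same file could interleave rows, and this sketch does not handle that.

```python
import csv
import json
import os
import tempfile

FIELDS = ["deviceId", "temperature", "humidity"]  # hypothetical telemetry schema


def append_batch_to_csv(event_bodies, path, fieldnames=FIELDS):
    """Append one batch of JSON event bodies (strings) to a CSV file.

    The header is written only if the file does not exist yet or is empty,
    so calling this once per triggered batch builds a single CSV file.
    """
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        # extrasaction="ignore": properties outside the schema are dropped.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        for body in event_bodies:
            writer.writerow(json.loads(body))


# Usage: two separate batches, as two function invocations might deliver them.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "telemetry.csv")
    append_batch_to_csv(['{"deviceId": "sim-1", "temperature": 21.5}'], out_path)
    append_batch_to_csv(['{"deviceId": "sim-2", "humidity": 40, "firmware": "1.0"}'], out_path)
    with open(out_path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
print(rows)
```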

Answer


U-SQL has three official built-in extractors (Extractors.Csv(), Extractors.Tsv(), and Extractors.Text()) that let you analyze data in CSV, TSV, or text files.

Microsoft has also released additional sample extractors for XML, JSON, and Avro files in its Azure GitHub repository, packaged as the Microsoft.Analytics.Samples.Formats assembly. I have used the JSON extractor in production, and it has been stable and useful.

The JSON extractor treats the entire input file as a single JSON document. If your files contain one JSON document per line, this extractor does not apply directly; the sample repository documents that case separately. The columns you declare are extracted from the document. In this case, those are the _id and Revision properties. If one of these is itself a nested object, you can use the sample JSON UDFs (for example, JsonFunctions.JsonTuple) to process it further.
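The difference between "one JSON document per file" and "one JSON document per line" matters for telemetry exports. The Python sketch below illustrates it outside U-SQL; it is not the extractor's implementation. Parsing line-delimited input as a single document fails, while parsing it line by line works. The helper pick_columns is hypothetical. It keeps only _id and Revision and leaves a nested value as a JSON string for a later step. Treating this as analogous to using the JSON UDFs afterwards is an assumption.

```python
import json


def parse_whole_document(text):
    """Parse the entire input as ONE JSON document (whole-file behavior)."""
    return json.loads(text)


def parse_json_lines(text):
    """Parse each non-empty line as its own JSON document (line-delimited JSON)."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def pick_columns(record, columns):
    """Keep only the requested top-level properties; nested objects/arrays are
    kept as JSON strings so they can be parsed again in a later step."""
    out = {}
    for col in columns:
        value = record.get(col)
        out[col] = json.dumps(value) if isinstance(value, (dict, list)) else value
    return out


# Hypothetical input: one JSON document per line, the second with a nested Revision.
text = '{"_id": "1", "Revision": "a"}\n{"_id": "2", "Revision": {"major": 1, "minor": 2}}\n'

try:
    parse_whole_document(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False  # fails with "Extra data": a second document follows the first

rows = [pick_columns(r, ["_id", "Revision"]) for r in parse_json_lines(text)]
print(whole_file_ok, rows)
```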

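// Both assemblies must already be registered in the Data Lake Analytics catalog
REFERENCE ASSEMBLY [Newtonsoft.Json];
REFERENCE ASSEMBLY [Microsoft.Analytics.Samples.Formats];

// Declare the columns to extract; each name is matched to a JSON property
@myRecords =
    EXTRACT
        _id string,
        Revision string
    FROM @"sampledata/json/{*}.json"
    USING new Microsoft.Analytics.Samples.Formats.Json.JsonExtractor();

// Write the extracted rows out as CSV with a header row
// (the output path is hypothetical; adjust it to your store)
OUTPUT @myRecords
TO @"output/records.csv"
USING Outputters.Csv(outputHeader: true);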
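For testing the same extract-then-output idea locally, before running it in Data Lake Analytics, here is a rough Python analogue. It is a sketch, not what U-SQL does internally. The function name extract_fileset_to_csv and the sample files are hypothetical. A glob pattern stands in for the {*} file set, each file is read as one JSON document, and only the requested properties are written to a single CSV file with a header. A missing property becomes an empty cell.

```python
import csv
import glob
import json
import os
import tempfile


def extract_fileset_to_csv(pattern, columns, out_path):
    """Read every file matching `pattern` as one JSON document and write the
    requested top-level properties to a single CSV file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(columns)
        for path in sorted(glob.glob(pattern)):  # sorted for a deterministic row order
            with open(path, encoding="utf-8") as f:
                doc = json.load(f)  # the whole file is one JSON document
            writer.writerow([doc.get(c, "") for c in columns])  # missing property -> empty cell


# Usage with two hypothetical input files in a temporary sampledata/json folder.
with tempfile.TemporaryDirectory() as root:
    src = os.path.join(root, "sampledata", "json")
    os.makedirs(src)
    for name, doc in [("a", {"_id": "1", "Revision": "r1", "other": 0}), ("b", {"_id": "2"})]:
        with open(os.path.join(src, f"{name}.json"), "w", encoding="utf-8") as f:
            json.dump(doc, f)
    out_csv = os.path.join(root, "records.csv")
    extract_fileset_to_csv(os.path.join(src, "*.json"), ["_id", "Revision"], out_csv)
    with open(out_csv, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
print(rows)
```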