
I want to call a REST API and save the results as a CSV or JSON file in Azure Data Lake Gen2. Based on what I have read, Azure Functions is the way to go.

The web service returns data in the following format:

"ID","ProductName","Company"
"1","Apples","Alfreds futterkiste"
"2","Oranges","Alfreds futterkiste"
"3","Bananas","Alfreds futterkiste"
"4","Salad","Alfreds futterkiste"
 ...next rows

I have written a console app in C# which at the moment outputs the data to the console. The web service uses pagination and returns 1000 rows per request (determined by the &number parameter, with a maximum of 1000). After the first request I can use the &next parameter to fetch the next 1000 rows based on ID. For instance, the URL

http://testWebservice123.com/Example.csv?auth=abc&number=1000&next=1000

will get me rows with IDs 1001 to 2000. (In reality the API call and the pagination are a bit more complex, so I cannot use, for instance, Azure Data Factory v2 to load the data into Azure Data Lake; this is why I think I need Azure Functions, unless I have overlooked another service? So the following is just a demo to learn how to write to Azure Data Lake.)

I have the following C#:

using System;
using System.IO;
using System.Net;

static void Main(string[] args)
{
    string startUrl = "http://testWebservice123.com/Example.csv?auth=abc&number=1000";
    string url = "";
    string deltaRequestParameter = "";
    string lastLine;
    int numberOfLines = 0;

    do
    {
        url = startUrl + deltaRequestParameter;

        using (WebClient myWebClient = new WebClient())
        using (Stream myStream = myWebClient.OpenRead(url))
        using (StreamReader sr = new StreamReader(myStream))
        {
            numberOfLines = 0;
            while (!sr.EndOfStream)
            {
                var row = sr.ReadLine();
                var values = row.Split(',');

                // do whatever with the rows for now - i.e. write to console
                Console.WriteLine(values[0] + " " + values[1]);

                lastLine = values[0].Replace("\"", ""); // get the ID of the last line read
                numberOfLines++;
                deltaRequestParameter = "&next=" + lastLine;
            }
        }
    } while (numberOfLines == 1001); // the header is returned each time, so the row count stays 1001 until the last page
}

I want to write the data to a CSV file in the data lake in the most efficient way. How would I rewrite the above code to run in an Azure Function and save the data as a CSV file in Azure Data Lake Gen2?

1 Answer


Here are the steps you need to follow to achieve the result:

1) Create an Azure Function; the trigger can be an HttpTrigger, a TimerTrigger, or whatever fits your need (see the skeleton sketch after this list).

2) I am assuming you already have the code to call the API in a loop until it gives you the desired result.

3) Once you have the data in memory, you have to write the following code to write it to Azure Data Lake.
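
As a rough sketch of step 1, a timer-triggered function could host the paging loop and the data lake write. The function name and the CRON schedule below are placeholders, and the body only marks where the pieces from the rest of this answer would go:

using System;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class ExportToDataLake
{
    // Hypothetical timer-triggered function; the schedule here means "every hour".
    [FunctionName("ExportToDataLake")]
    public static void Run([TimerTrigger("0 0 * * * *")] TimerInfo timer, ILogger log)
    {
        log.LogInformation($"Export started at {DateTime.UtcNow}");

        // 1) Call the paged REST API (the loop from the question) and collect the rows.
        // 2) Create the ADLS client (see GetCreds_SPI_SecretKey below) and write the CSV.
    }
}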

Prerequisites for accessing ADLS from your C# code:

1) Register an app in Azure AD

2) Grant the app permissions in the Data Lake Store

Below is the code for creating the ADLS client.

// Required namespaces (from the Microsoft.Azure.DataLake.Store and
// Microsoft.Rest.ClientRuntime.Azure.Authentication NuGet packages):
using System;
using System.Threading;
using Microsoft.Azure.DataLake.Store;
using Microsoft.Rest;
using Microsoft.Rest.Azure.Authentication;

// ADLS connection
var adlCreds = GetCreds_SPI_SecretKey(tenantId, ADL_TOKEN_AUDIENCE, serviceAppIDADLS, servicePrincipalSecretADLS);
var adlsClient = AdlsClient.CreateClient(adlsName, adlCreds);

// Builds service principal credentials from the registered app's client ID and secret.
private static ServiceClientCredentials GetCreds_SPI_SecretKey(string tenant, Uri tokenAudience, string clientId, string secretKey)
{
    SynchronizationContext.SetSynchronizationContext(new SynchronizationContext());
    var serviceSettings = ActiveDirectoryServiceSettings.Azure;
    serviceSettings.TokenAudience = tokenAudience;
    var creds = ApplicationTokenProvider.LoginSilentAsync(tenant, clientId, secretKey, serviceSettings).GetAwaiter().GetResult();
    return creds;
}
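
The variables used above (tenantId, serviceAppIDADLS, servicePrincipalSecretADLS, adlsName and the token audience) are not defined in the snippet. As a sketch, with made-up setting names, they could be read from the function app's application settings like this:

using System;
using System.Configuration;

// Hypothetical setting names - use whatever you configure for your function app.
static readonly string tenantId = ConfigurationManager.AppSettings.Get("TenantId");
static readonly string serviceAppIDADLS = ConfigurationManager.AppSettings.Get("ServiceAppId");
static readonly string servicePrincipalSecretADLS = ConfigurationManager.AppSettings.Get("ServicePrincipalSecret");
static readonly string adlsName = ConfigurationManager.AppSettings.Get("AdlsAccountName");

// Token audience for Azure Data Lake.
static readonly Uri ADL_TOKEN_AUDIENCE = new Uri("https://datalake.azure.net/");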

Finally, write the implementation to save the file in Azure Data Lake:

const string delim = ",";
static string adlsInputPath = ConfigurationManager.AppSettings.Get("AdlsInputPath");

// Note: Helper.GetHeader, DataLakeFileHandler and DataLakeUpdateHandler are custom
// helpers/extension methods from my own project, not part of the ADLS SDK.
public static void ProcessUserProfile(this SampleProfile socialProfile, AdlsClient adlsClient, string fileNameExtension = "")
{
    using (MemoryStream memStreamProfile = new MemoryStream())
    {
        using (TextWriter textWriter = new StreamWriter(memStreamProfile))
        {
            string header = Helper.GetHeader(delim, Entities.FBEnitities.Profile);
            string fileName = adlsInputPath + fileNameExtension + "/profile.csv";
            adlsClient.DataLakeFileHandler(textWriter, header, fileName);

            string profile = socialProfile.UserID
                             + delim + socialProfile.Profile.First_Name
                             + delim + socialProfile.Profile.Last_Name
                             + delim + socialProfile.Profile.Name
                             + delim + socialProfile.Profile.Age_Range_Min
                             + delim + socialProfile.Profile.Age_Range_Max
                             + delim + socialProfile.Profile.Birthday;

            textWriter.WriteLine(profile);
            textWriter.Flush();
            memStreamProfile.Flush();
            adlsClient.DataLakeUpdateHandler(fileName, memStreamProfile);
        }
    }
}
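
If you do not want to build helpers like the ones above, here is a more minimal sketch, assuming the AdlsClient created earlier and an example file path, that writes the question's CSV header and rows directly through the client's file stream API:

using System.Collections.Generic;
using System.IO;
using Microsoft.Azure.DataLake.Store;

// Writes the header plus the collected rows to a single CSV file in the lake.
// "/demo/products.csv" is just an example path.
static void WriteCsv(AdlsClient adlsClient, IEnumerable<string> rows)
{
    using (var adlsStream = adlsClient.CreateFile("/demo/products.csv", IfExists.Overwrite))
    using (var writer = new StreamWriter(adlsStream))
    {
        writer.WriteLine("\"ID\",\"ProductName\",\"Company\"");
        foreach (var row in rows)
        {
            writer.WriteLine(row);
        }
    }
}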

Hope it helps.