2
votes

I created a Pipeline in Azure Data Factory V2. It will copy the data from Rest API and save this data in the form of JSON file in Azure Data Lake. Then I transform that JSON file using U-SQL and Copy that data into another folder in .csv format. My Pipeline. See the following Image of Pipeline.

enter image description here

The Rest API Return Data into JSON Format lie.

{
    "data": [
        {
            "id": "100024937598765",
            "name": "Sebastian Martinelli",
            "email": "sebastian.martinelli@abc.com",
            "administrator": false
        },
        {
            "id": "100024909012916",
            "name": "Diego Juarez",
            "email": "diego.juarez@abc.com",
            "administrator": false
        },
        {
            "id": "100025002270557",
            "name": "Jose Lopez",
            "email": "jose.lopez@abc.com",
            "administrator": false
        },
        {
            "id": "100024553664067",
            "name": "Valentin Montemarani",
            "email": "valentin.montemarani@abc.com",
            "administrator": false
        }
    ],
    "paging": {
        "cursors": {
            "before": "QVFIUmU1QnBOYThYTnJiQlNqVzItMFdoTVprSHh3cWZA4LXF2ZAE5nSjIxVWZAOWUc1ZAjdLZAjN2em1SazRYVno4TGE4aFBaOFdMaS1NMDdkeEduVkRsOTVhN3Jn",
            "after": "QVFIUjhWdm5EOTk3amJaWHVYR3p1OEZAZAQ0ZAoeTR5TDBwblE0Mmx3dC1zRXhPM2VLZAWdqR0RWQndUVnhpTGc3RkIzVkNIY21EcXFTQU93NHVxRFcxVW12dTNB"
        },
        "next": "https://graph.facebook.com/v2.12/1528385107457405/members?access_token=%2Cemail&limit=25&after=QVFIUjhWdm5EOTk3amJaWHVYR3p1OEZAZAQ0ZAoeTR5TDBwblE0Mmx3dC1zRXhPM2VLZAWdqR0RWQndUVnhpTGc3RkIzVkNIY21EcXFTQU93NHVxRFcxVW12dTNB"
    }
}

This API is not returning full data it returns data using next link by hitting that next link we can fetch next set of records. How can I fetch this type of data using Iteration in Data Factory V2? I tried other activities like For each, Until and If but unable to repeat that copy activity to fetch complete data.

The Base URL is like

https://graph.facebook.com/Community/groups?fields=privacy,name,purpose&limit=5&access_token=value

Now the next URL will be the same and have 1 more parameter after in it like

https://graph.facebook.com/v2.12/1528385107457405/groups?access_token=value&pretty=1&fields=privacy,name,purpose&limit=5&after=QVF

Is there any way to do this?

1
Hover over the next link. Does it look like a URL that you could reproduce? Usually a next URL is just has some parameters that tell it what page to return next.Nick.McDermaid
Yes it is a URL and will return next set of records. But how can we iterate this copy activity till there is next link available in response?Waqas Idrees
Yes I know the URL returns next set of records. But what does it look like? Does it have parameters that indicate the page etc.? You might be able to use the For Each activity, though I've never used it before: docs.microsoft.com/en-us/azure/data-factory/… Generally anything non trivial in data factory has to be compiled into a DLL and called. Please post a sample of the next URL (and the URL on the first page) Normally these are just URL's with parameters. If that's the case you have some optionsNick.McDermaid
You say you are using a REST API. An API normally has documentation on how to use it. If it doesn't, it's not a very good APINick.McDermaid
Since you don't know how many iterations this will take, then you'd need to use Until. You need to extract data out of JSON which you just returned, and I don't see any information on how to do that - I don't expect that it's possible. If this needs to be done in the cloud then I would approach this by writing some powershell to do the basics (call the API, write the JSON to a data lake, inspect the JSON to get the next URI, call the next URI, repeat) and run that in Azure Automation. Then you could call an ADF activity from there using a trigger or just do it all in PowershellNick.McDermaid

1 Answers

3
votes
  1. Copy activity has a built-in support for Pagination. In a case of @Waqas Idrees the pagination rule should be set as:

    • Name: AbsoluteUrl
    • Value: $.data.paging.next

    Such setting can be found on Source tab in a Copy activity:

enter image description here

  1. There is no need to use U-SQL to transform hierarchical to a tabular format. Copy activity Mapping can do it on a fly

Therefore answer to the question of OP:

How can I fetch this type of data using Iteration in Data Factory V2?

No need to use Iterations. Consider to use a Copy activity functionality.

P.S. Because exploration of such topic took quite considerable amount of my time and I was not able to find a similar guidance during my research, I've created a blog post about it and described the process step-by-step: Azure Data Factory and REST APIs - Mapping and Pagination

P.P.S. Because of pace of development of ADF, I accept that in 2018 April built in pagination could not be part of the product yet. In this case it explains why the question was not answered that time. However, I hope that my findings will help someone who faced similar challenge these days.