We aim to collect data from the Azure Management APIs. These APIs provide information on the resources we have running in Azure, the consumed budget, etc. (example). Following our design choices, we prefer to use Azure Data Factory exclusively to make the HTTP requests and store the data in our data lakes. This is fairly straightforward using the REST linked service. However, we struggle to set up the OAuth2 authentication flow correctly with this method.

  1. Our first idea was to store the token and the refresh token in Azure Key Vault. A series of HTTP requests within the pipeline would then test whether the token is still valid and otherwise use the refresh token to get a new one. The downside of this approach is that the token stored in Azure Key Vault is never updated when it needs to be, and the pipeline logic becomes more complex.

  2. Alternatively, we tried to set up the authorization through a combination of a registered app and a service principal in our Azure AD account. The REST linked service within Data Factory can be created with a service principal, which would then handle most of the scope and consent details. The service principal is also accompanied by an Azure app, which would hold the token etc. Unfortunately, we have been unable to make this setup work.

Questions we have:

  • Can we actually use a service principal / app to store our OAuth2 tokens? If so, will these be automatically refreshed within our app?

  • How do we assign the correct privileges / authorizations to our app so that it can use this (external) API?

  • Is the additional logic with HTTP calls within the Azure Data Factory pipeline needed to update the tokens, or can these apps / service principals handle this?

Thank you for your time and help!

View of the REST linked service within Azure Data Factory

View of a registered app within Azure

1 Answer

Storing the tokens in Key Vault is not a good idea, because they expire.

In your case, there are two options:

  1. Use a service principal to authenticate

  2. Use a managed identity to authenticate (best practice)

Steps to use service principal to auth:

1. Register an application with Azure AD and create a service principal.

2. Get the values for signing in and create a new application secret.

3. To call the Azure REST API (e.g. the Resources - List operation you mentioned), your service principal needs an RBAC role on your subscription.

Navigate to the Azure portal -> Subscriptions -> your subscription -> Access control (IAM) and assign your service principal a role such as Contributor or Owner on the subscription.

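4. In the linked service, choose the AAD Service Principal authentication type and fill in the values from step 2 (tenant, service principal ID, and service principal key). Set the AAD resource to https://management.azure.com.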

Don't forget to replace {subscriptionId} in the Base URL with your own subscription ID.

https://management.azure.com/subscriptions/{subscriptionId}/resources?api-version=2020-06-01

5. Test the linked service with a copy activity; it works fine.
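
For reference, the configuration from step 4 corresponds roughly to the linked service definition below, written as a Python dict so it can be dumped to JSON. This is only a sketch: the property names (`RestService`, `AadServicePrincipal`, `servicePrincipalId`, `servicePrincipalKey`, `tenant`, `aadResourceId`) follow the REST connector's JSON format as I understand it, so check them against the current connector documentation. The function name, the linked service name, and all argument values are placeholders I made up.

```python
import json


def service_principal_linked_service(subscription_id, tenant_id, client_id, client_secret):
    """Build a REST linked service definition that authenticates with a service principal."""
    # Base URL from the answer, with {subscriptionId} filled in.
    base_url = (
        "https://management.azure.com/subscriptions/"
        f"{subscription_id}/resources?api-version=2020-06-01"
    )
    return {
        "name": "AzureManagementRest",  # hypothetical linked service name
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": base_url,
                "authenticationType": "AadServicePrincipal",
                "servicePrincipalId": client_id,  # application (client) ID from step 2
                "servicePrincipalKey": {"type": "SecureString", "value": client_secret},
                "tenant": tenant_id,
                # The AAD resource the token is requested for.
                "aadResourceId": "https://management.azure.com",
            },
        },
    }


if __name__ == "__main__":
    definition = service_principal_linked_service(
        "<subscriptionId>", "<tenantId>", "<applicationId>", "<applicationSecret>"
    )
    print(json.dumps(definition, indent=2))
```

The secret is inlined here only to keep the sketch short; in a real factory you would normally reference it from Key Vault instead.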

Steps to use managed identity to auth:

1. Make sure your data factory has its managed identity (MSI) enabled. If you created the factory in the portal or with PowerShell, the MSI is enabled automatically.

2. Navigate to the subscription in the portal and assign the role to the MSI, as in step 3 of the service principal steps. Just search for your ADF name in the search bar: the MSI is essentially a service principal with the same name as your ADF, managed by Azure.

3. Then, in the linked service, switch the authentication type to Managed Identity and keep the AAD resource set to https://management.azure.com.
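
With a managed identity, the linked service definition no longer carries any credentials. A rough sketch, again as a Python dict: the `ManagedServiceIdentity` and `aadResourceId` property names are how I recall the REST connector's JSON format and should be verified against the connector documentation, while the function name and linked service name are hypothetical.

```python
import json


def managed_identity_linked_service(subscription_id):
    """Build a REST linked service definition that authenticates with the factory's MSI."""
    base_url = (
        "https://management.azure.com/subscriptions/"
        f"{subscription_id}/resources?api-version=2020-06-01"
    )
    return {
        "name": "AzureManagementRestMsi",  # hypothetical linked service name
        "properties": {
            "type": "RestService",
            "typeProperties": {
                "url": base_url,
                # No client ID or secret: the data factory's own identity is used.
                "authenticationType": "ManagedServiceIdentity",
                "aadResourceId": "https://management.azure.com",
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(managed_identity_linked_service("<subscriptionId>"), indent=2))
```

Because there is no secret to rotate or store, this variant is the simpler one to operate once the role assignment from step 2 is in place.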


Finally, to answer your questions:

Can we actually use a service principal / app to store our OAuth2 tokens? If so, will these be automatically refreshed within our app?

As I mentioned, storing the tokens is not a good idea. Just use the service principal or MSI to authenticate, following the steps above.

How do we assign the correct privileges / authorizations to our app that it can use this (external) API?

To use the Azure REST API, just assign the RBAC roles as described above and specify the correct AAD resource in the linked service, https://management.azure.com in this case.

Is the additional logic with HTTP calls within Azure Data Factory pipeline needed to update the tokens or can these apps / service principals handle this?

No other steps are needed. With the configuration above, Data Factory essentially uses the client credentials flow to get the token for you automatically in the background, and then uses that token to call the API.
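
To make the background step concrete, here is a minimal sketch of what a client credentials token request looks like when done by hand. It illustrates the flow itself, not Data Factory's actual internals, which are not visible to us. It assumes the Azure AD v1 token endpoint (`/oauth2/token` with a `resource` parameter); the function names are my own, and nothing is sent over the network until `fetch_access_token` is called.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_token_request(tenant_id, client_id, client_secret,
                        resource="https://management.azure.com/"):
    """Prepare (but do not send) a client credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": resource,  # the "AAD resource" from the linked service
    }).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(url, data=body, headers=headers, method="POST")


def fetch_access_token(request):
    """Send the prepared request and return the access token from the JSON response."""
    with urlopen(request) as response:
        return json.load(response)["access_token"]


def bearer_header(access_token):
    """Header to attach to each call to the management API."""
    return {"Authorization": f"Bearer {access_token}"}
```

No refresh token is involved in this flow: when a token expires, the client simply requests a new one with the same credentials, which is why there is nothing to persist in Key Vault.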