I'm trying to use WebHDFS with Azure Data Lake. According to Microsoft's documentation, the steps one should follow are:

  • Create a new application in Azure AD with a key and delegated permissions to Azure Management Services
  • Using the client_id, tenant_id, and secret key, make a request to the OAUTH2 endpoint

    curl -X POST https://login.microsoftonline.com/<TENANT-ID>/oauth2/token \
      -F grant_type=client_credentials \
      -F resource=https://management.core.windows.net/ \
      -F client_id=<CLIENT-ID> \
      -F client_secret=<AUTH-KEY>

On success, the response is a JSON document containing an "access_token" field. Its value should be sent with every subsequent WebHDFS request in the header

    Authorization: Bearer <content of "access_token">

where <content of "access_token"> is the long string stored in the "access_token" field.
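
As a rough sketch of that step in shell, the snippet below pulls the "access_token" value out of the token response and builds the header from it. The function names `extract_access_token` and `auth_header` are made up for illustration. The `sed` pattern assumes the key and its value sit on one line and that the token is a plain quoted string; a real JSON parser would be more robust.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of any Azure tooling.

# Print the value of "access_token" from a JSON response read on stdin.
# Assumes the value is a simple quoted string without escaped quotes.
extract_access_token() {
  sed -n 's/.*"access_token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Print the Authorization header line for the token given as $1.
auth_header() {
  printf 'Authorization: Bearer %s\n' "$1"
}

# Possible usage (placeholders as in the commands of this post):
#   token=$(curl -s -X POST ... | extract_access_token)
#   curl -i -H "$(auth_header "$token")" ...
```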

With that header in place, you should be able to make WebHDFS calls. For example, to list a directory:

    curl -i -X GET -H "Authorization: Bearer <REDACTED>" "https://<yourstorename>.azuredatalakestore.net/webhdfs/v1/?op=LISTSTATUS"

Having followed all those steps, I am getting an HTTP 401 error when running the above curl command to list directories:

WWW-Authenticate: Bearer authorization_uri="https://login.windows.net/<REDACTED>/", error="invalid_token", error_description="The access token is invalid."

with the body

{"error":{"code":"AuthenticationFailed","message":"Failed to validate the access token in the 'Authorization' header."}}

Does anyone know what might be the problem?

I pasted the token into jwt.io and it decodes as a valid JWT (I didn't check the signature). The content is something like this:

    {
      typ: "JWT",
      alg: "RS256",
      x5t: "MnC_VZcATfM5pOYiJHMba9goEKY",
      kid: "MnC_VZcATfM5pOYiJHMba9goEKY"
    }.
    {
      aud: "https://management.core.windows.net",
      iss: "https://sts.windows.net/<TENANT-ID>/",
      iat: 1460908119,
      nbf: 1460908119,
      exp: 1460912019,
      appid: "<APP-ID>",
      appidacr: "1",
      idp: "https://sts.windows.net/<TENANT-ID>/",
      oid: "34xxxxxx-xxxx-xxxx-xxxx-5460xxxxxxd7",
      sub: "34xxxxxx-xxxx-xxxx-xxxx-5460xxxxxxd7",
      tid: "<TENANT-ID>",
      ver: "1.0"
    }.
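
The same inspection can be done locally without pasting the token into a website. The sketch below decodes the payload segment of a JWT in shell. `jwt_payload` and `jwt_claim` are hypothetical helper names, `base64 -d` is assumed to be available (GNU coreutils or similar), and, like the jwt.io check above, it does not verify the signature. `jwt_claim` only handles string-valued claims such as `aud` or `iss`, not numeric ones such as `exp`.

```shell
#!/bin/sh
# Hypothetical helper: print the decoded payload (claims) of the JWT in $1.
# No signature verification is performed.
jwt_payload() {
  # The payload is the second dot-separated segment, base64url-encoded.
  # Map the base64url alphabet ('-' and '_') back to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so base64 accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Hypothetical helper: print the string value of claim $2 from the JWT in $1.
jwt_claim() {
  jwt_payload "$1" |
    sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Possible usage:
#   jwt_claim "$token" aud
```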
Can you provide a screenshot of where you have granted permissions to your AAD application to access Azure Data Lake Store? - GregGalloway
Sure, here. - MikeBrno
I have the same issue, weird... - Andrei Varanovich
Do you have authority to delegate permissions in Azure Active Directory? Even though I was admin for some reason I did not have this authority and hence my application could not connect via WebHDFS. Authentication/Authorization in Azure is always needlessly painful, filled with unhelpful error messages and weird undocumented behavior. - MikeBrno

1 Answer

Click the Data Explorer button, highlight the root folder, and click Access. Then grant your AAD app permissions for WebHDFS there. I believe what you have done so far only grants the AAD app permission to manage your Azure Data Lake Store through the portal or Azure PowerShell; it does not yet grant WebHDFS permissions. Further reading on security is here.
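
After granting access in Data Explorer, one way to confirm the change is to repeat the LISTSTATUS call and look only at the HTTP status code. The sketch below assumes you already have a token in hand. `webhdfs_url` and `liststatus_code` are hypothetical helper names, and the URL layout is copied from the command in the question.

```shell
#!/bin/sh
# Hypothetical helper: build the WebHDFS URL for a store name ($1),
# an absolute path ($2), and an operation ($3).
webhdfs_url() {
  printf 'https://%s.azuredatalakestore.net/webhdfs/v1%s?op=%s\n' "$1" "$2" "$3"
}

# Hypothetical helper: print only the HTTP status code of a LISTSTATUS
# request on the root folder. Arguments: store name ($1), access token ($2).
liststatus_code() {
  curl -s -o /dev/null -w '%{http_code}' \
    -H "Authorization: Bearer $2" \
    "$(webhdfs_url "$1" / LISTSTATUS)"
}

# Possible usage:
#   liststatus_code <yourstorename> "$token"
```

A 401 would match the failure described in the question, while a 2xx code would indicate the request was accepted.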