
I am trying to connect to Azure Data Lake Storage Gen2 with a token via the Hadoop client in a console and receive this error:

ls: AADToken: HTTP connection failed for getting token from AzureAD. Http response: 400 Bad Request

If I wrap the token in double quotes, the error is different:

id_token:<TOKEN VALUE>: Unknown command
Usage: hadoop fs [generic options]

What was done:

  1. Created Storage account using https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-quickstart-create-account

  2. Created Azure AD application using https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal

  3. Granted admin consent to the application from step 2 - https://i.imgur.com/myMtkeu.png

  4. Also granted admin consent to the enterprise application with the same name as the app from step 2 - https://i.imgur.com/BPX48NE.png

Steps 3 and 4 were done as described here - https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/configure-user-consent#grant-admin-consent-when-registering-an-app-in-the-azure-portal
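
For reference, steps 1 and 2 can also be scripted with the Azure CLI instead of the portal. This is a minimal sketch, assuming a recent az version, that you are already logged in with az login, and hypothetical names (my-rg, mystorageacct, my-adls-app):

# Create a Gen2-capable storage account (hierarchical namespace enabled)
az storage account create \
 --name mystorageacct \
 --resource-group my-rg \
 --kind StorageV2 \
 --enable-hierarchical-namespace true

# Register an AD application with a service principal; prints appId, password, tenant
az ad sp create-for-rbac --name my-adls-app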

  5. Then I generated an authorization code:
https://login.microsoftonline.com/<TENANT ID>/oauth2/authorize?client_id=<CLIENT ID>&response_type=code&redirect_uri=https%3A%2F%2Flocalhost%2Fmyapp%2F&response_mode=query&resource=https://datalake.azure.net/&state=12345
  6. and got the token:
curl -X POST https://login.microsoftonline.com/<TENANT ID>/oauth2/token \
 -F redirect_uri=https://localhost/myapp/ \
 -F grant_type=authorization_code \
 -F resource=https://datalake.azure.net \
 -F client_id=<CLIENT ID> \
 -F client_secret=<CLIENT SECRET> \
 -F code=OAQABAAIAAAAP0wLlqdLVToOpA4kwzSnxLhHJrARX8557... (Authorization code)
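
The token endpoint responds with a JSON body whose fields include access_token and refresh_token (and, depending on scopes, id_token). The RefreshTokenBasedTokenProvider used below expects the refresh_token value, not the id_token. A minimal extraction sketch, assuming jq is installed:

# -s silences curl's progress output; jq pulls just the refresh_token field out of the JSON response
curl -s -X POST https://login.microsoftonline.com/<TENANT ID>/oauth2/token \
 -F redirect_uri=https://localhost/myapp/ \
 -F grant_type=authorization_code \
 -F resource=https://datalake.azure.net \
 -F client_id=<CLIENT ID> \
 -F client_secret=<CLIENT SECRET> \
 -F code=<AUTHORIZATION CODE> \
 | jq -r '.refresh_token'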

Following the Apache Hadoop documentation, I built this console command:

hadoop fs -Dfs.azure.ssl.channel.mode=Default_JSSE \
 -Dfs.azure.account.auth.type=OAuth \
 -Dfs.azure.account.oauth.provider.type=org.apache.hadoop.fs.azurebfs.oauth2.RefreshTokenBasedTokenProvider \
 -Dfs.azure.account.oauth2.client.id=<CLIENT ID> \
 -Dfs.azure.account.oauth2.refresh.token=<TOKEN> \
 -ls abfss://<CONTAINER NAME>@<STORAGE ACCOUNT>.dfs.core.windows.net/
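
For reference, hadoop-azure also accepts account-qualified property names, which avoids ambiguity when more than one storage account is configured. A sketch of the same command with qualified keys, using the same placeholders as above:

# Each key is suffixed with the account's dfs endpoint so it applies only to that account
hadoop fs -Dfs.azure.account.auth.type.<STORAGE ACCOUNT>.dfs.core.windows.net=OAuth \
 -Dfs.azure.account.oauth.provider.type.<STORAGE ACCOUNT>.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.RefreshTokenBasedTokenProvider \
 -Dfs.azure.account.oauth2.client.id.<STORAGE ACCOUNT>.dfs.core.windows.net=<CLIENT ID> \
 -Dfs.azure.account.oauth2.refresh.token.<STORAGE ACCOUNT>.dfs.core.windows.net=<TOKEN> \
 -ls abfss://<CONTAINER NAME>@<STORAGE ACCOUNT>.dfs.core.windows.net/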

The command above should display a list of the folders and files in the container. Is something wrong with the command, or with the container configuration in Azure? Please advise.


1 Answer


Are you using the Community Edition of Databricks? I encountered the same issue because I ran out of space or exceeded the number of files on DBFS, so I figured I'd just mount my own storage account on Azure, and I hit the same error you did.

I spun up my own Databricks instance and was able to mount my own storage successfully, but running the same code on the Community Edition resulted in an error.
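
If you want to check whether DBFS space or file counts are the culprit, one way is to inspect DBFS from outside the notebook. A minimal sketch, assuming the Databricks CLI is installed and configured against your workspace:

# List what is currently stored on DBFS
databricks fs ls dbfs:/
databricks fs ls dbfs:/FileStore/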