1
votes

I need to add a reference to an Azure Data Lake to an existing cluster programmatically. I know that we can do this with blob storage via a script action, but I've found no documented way to do this with ADL.

I've looked in detail at the script that is used to add blob storage (https://hdiconfigactions.blob.core.windows.net/linuxaddstorageaccountv01/add-storage-account-v01.sh), and I understand the changes it makes to core-site.xml. But I can't figure out how to do something similar with ADL. In particular, looking at core-site.xml, I see that fs.azure.datalake.token.provider.script refers to the same decrypt script as the blob storage token provider. However, I don't see an encrypted value for the data lake token anywhere.
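
One way to check which Data Lake related settings a core-site.xml actually contains is to parse it and filter by property name. The following is a minimal sketch, assuming the usual Hadoop configuration layout (a configuration element holding property entries with name and value children). The sample XML, its values, and the helper name find_properties are placeholders for illustration, not taken from a real cluster:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the values are placeholders, not real cluster settings.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>fs.azure.datalake.token.provider.script</name>
    <value>PLACEHOLDER_DECRYPT_SCRIPT_PATH</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>PLACEHOLDER_DEFAULT_FS</value>
  </property>
</configuration>"""


def find_properties(core_site_xml: str, needle: str) -> dict:
    """Return {name: value} for every property whose name contains `needle`."""
    root = ET.fromstring(core_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="")
        if needle in name:
            found[name] = prop.findtext("value", default="")
    return found


if __name__ == "__main__":
    # List every property with "datalake" in its name.
    for name, value in find_properties(SAMPLE_CORE_SITE, "datalake").items():
        print(f"{name} = {value}")
```

On a real cluster the XML would be read from the deployed core-site.xml rather than from an inline string.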

4 Answers

0
votes

I am not sure whether this is directly supported, but here are a few articles you can reference. If there is no documentation for it and the only way to do it is through strange workarounds, I would wait until it is released as a full feature, if ever. I'm sure this feature request has been proposed multiple times!

https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-hdinsight-hadoop-use-powershell

That article mentions:

If you are going to use Data Lake Storage Gen1 as additional storage for HDInsight cluster, we strongly recommend that you do this while you create the cluster as described in this article. Adding Data Lake Storage Gen1 as additional storage to an existing HDInsight cluster is a complicated process and prone to errors.

I'm not sure whether this cmdlet can be used to add it to an existing cluster:

https://docs.microsoft.com/en-us/powershell/module/azurerm.hdinsight/Add-AzureRmHDInsightClusterIdentity?view=azurermps-6.13.0&viewFallbackFrom=azurermps-3.8.0

0
votes

You can use this guide: Add additional storage accounts to HDInsight. It worked well for me (I followed the instructions in PowerShell).

Be warned though that the newly added storage accounts will never appear in the cluster's Storage Accounts blade in Azure.

0
votes

There is no guide for ADLv1/v2 comparable to the one posted above for adding a storage account to HDInsight.

The good news is that the bash script provided there can be reused. The script performs a number of steps, and you only need the last two.

Add the missing custom properties for core-site.xml under updateAmbariConfigs.

Here are the custom properties needed to use ADLv2 storage as additional storage for your cluster:

fs.azure.account.auth.type.#yourADLv2storagename#.dfs.core.windows.net=OAuth
fs.azure.account.oauth.provider.type.#yourADLv2storagename#.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
fs.azure.account.oauth2.client.endpoint.#yourADLv2storagename#.dfs.core.windows.net=https://login.microsoftonline.com/#yourtenantID#/oauth2/token
fs.azure.account.oauth2.client.id.#yourADLv2storagename#.dfs.core.windows.net=#yourApplicationRegistrationIDUsedForADLaccess#
fs.azure.account.oauth2.client.secret.#yourADLv2storagename#.dfs.core.windows.net=#clientSecretForAppRegistrationAbove#
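
To avoid typos when substituting the placeholders, the five properties listed above can be generated from the four inputs. This is only a sketch: the function name adls_gen2_oauth_properties and the example argument values are made up for illustration, and the output still has to be placed into the script's updateAmbariConfigs step by hand.

```python
def adls_gen2_oauth_properties(account: str, tenant_id: str,
                               client_id: str, client_secret: str) -> dict:
    """Build the core-site.xml properties listed above for one ADLv2 account."""
    host = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
    }


if __name__ == "__main__":
    # Placeholder values; replace with your own account, tenant and app registration.
    props = adls_gen2_oauth_properties(
        account="yourADLv2storagename",
        tenant_id="yourtenantID",
        client_id="yourApplicationRegistrationID",
        client_secret="yourClientSecret",
    )
    for key, value in props.items():
        print(f"{key}={value}")
```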

You can then store the script in a publicly accessible container in a storage account and submit it as a script action via the GUI or any other way you prefer.

0
votes

You can only access additional ADLS Gen2 storage if the primary storage account is ADLS Gen2. Typically you set up your primary storage using a User Managed Identity with the Storage Blob Data Owner role. If you then go to another HNS-enabled (hierarchical namespace) storage account and assign the "Storage Blob Data Contributor" role to the same User Managed Identity, the cluster will be able to access that storage:

hdfs dfs -ls abfs://<container>@<storageaccount>.dfs.core.windows.net
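
If you need to run this check against several containers or accounts, the URI and command can be assembled programmatically. A small sketch, where abfs_uri and ls_command are hypothetical helper names and the container and account values are placeholders; it only builds the command and does not execute it:

```python
def abfs_uri(container: str, account: str, path: str = "") -> str:
    """Build an abfs:// URI in the same form as the command above."""
    base = f"abfs://{container}@{account}.dfs.core.windows.net"
    if not path:
        return base
    return base + "/" + path.lstrip("/")


def ls_command(container: str, account: str, path: str = "") -> list:
    """Return the argument list for `hdfs dfs -ls` on the given location."""
    return ["hdfs", "dfs", "-ls", abfs_uri(container, account, path)]


if __name__ == "__main__":
    # Placeholder names; substitute your own container and storage account.
    print(" ".join(ls_command("mycontainer", "mystorageaccount")))
```

On a cluster head node, the returned list could be passed to subprocess.run to execute the listing.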