
I have an Azure Data Lake Store (ADLS) account containing ~100k files that I need to access from an HDInsight cluster for analysis. When I provision the cluster via the Azure Portal, I use this ADLS account as the cluster's storage and grant rwx privileges on all of its files using a service principal and the "Data Lake Store Access" feature. This feature appears to grant access one file at a time, at a rate of about 2k files per minute, so it takes over an hour just to assign the permissions!

Is there a faster way to grant a new cluster rwx privileges on its associated ADLS?


Yes, there is a better way to set this up. As a one-time step, grant permissions on all your files and folders to an Azure Active Directory group. Once that is done, whenever you create a new HDInsight cluster, you only need to add its service principal as a member of the group; no file permissions have to change.
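To see why this removes the per-cluster cost, here is a minimal sketch of the idea. It is a hypothetical, heavily simplified model of POSIX-style ACLs (it ignores owner, owning group, mask and "other" entries), and names such as `FileAcl`, `Directory`, `grant_group_once` and `add_member` are illustrative only, not part of any Azure SDK.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical, simplified model of POSIX-style ACLs on ADLS files.
# Owner, owning group, mask and "other" entries are ignored; the model only
# shows how a named-group entry lets any member of that group in.

@dataclass
class FileAcl:
    named_users: Dict[str, str] = field(default_factory=dict)   # principal id -> e.g. "rwx"
    named_groups: Dict[str, str] = field(default_factory=dict)  # group id -> e.g. "rwx"

@dataclass
class Directory:
    """Stand-in for Azure AD group membership: group id -> member ids."""
    members: Dict[str, Set[str]] = field(default_factory=dict)

def grant_group_once(files: List[FileAcl], group_id: str, perms: str = "rwx") -> int:
    """One-time step: write a named-group entry on every file. Returns the number of ACL writes."""
    for acl in files:
        acl.named_groups[group_id] = perms
    return len(files)

def add_member(directory: Directory, group_id: str, principal: str) -> None:
    """Per-cluster step: a single membership change; no file ACLs are touched."""
    directory.members.setdefault(group_id, set()).add(principal)

def can_access(principal: str, acl: FileAcl, directory: Directory, perm: str) -> bool:
    """True if principal holds perm ('r', 'w' or 'x') via a named-user or named-group entry."""
    if perm in acl.named_users.get(principal, ""):
        return True
    return any(
        perm in perms and principal in directory.members.get(group, set())
        for group, perms in acl.named_groups.items()
    )
```

In this model the expensive loop over every file runs once, when `grant_group_once` is called; each later cluster costs a single `add_member` call, regardless of how many files the store holds.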

So to summarize:

  1. Create a new Azure Active Directory group.
  2. In your ADLS account, grant this group permissions on the appropriate files and folders, propagating them down the folder tree.
  3. Create your HDInsight cluster, choosing the right service principal when creating it.
  4. Add the service principal to the group created in step 1.
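Step 2 is the only step that touches every file. As a rough sketch, assuming ADLS's POSIX-style ACL spec strings (`group:<objectId>:rwx`, with a `default:` prefix for entries that new children of a directory inherit), the one-time plan could be built like this. The helper names and the group object ID are hypothetical placeholders, and the sketch does not show how the entries are actually applied (portal, CLI or SDK).

```python
from typing import Iterable, List, Tuple

def acl_spec_for(group_object_id: str, is_dir: bool, perms: str = "rwx") -> str:
    """Build a POSIX-style ACL spec granting a named group access.

    Directories also get a 'default:' entry so that files and folders
    created under them later inherit the same group entry.
    """
    entries = [f"group:{group_object_id}:{perms}"]
    if is_dir:
        entries.append(f"default:group:{group_object_id}:{perms}")
    return ",".join(entries)

def plan_acl_updates(
    paths: Iterable[Tuple[str, bool]], group_object_id: str
) -> List[Tuple[str, str]]:
    """Return (path, acl_spec) pairs: one ACL update per existing path, done once."""
    return [(path, acl_spec_for(group_object_id, is_dir)) for path, is_dir in paths]
```

For example, `plan_acl_updates([("/", True), ("/data/a.csv", False)], gid)` yields one entry per path; steps 3 and 4 then never need to revisit this list for new clusters.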

Hope this helps, and do let me know if you have any questions.