1 vote

I am currently using Python to read a file from Azure Blob Storage and store it in a dataframe. To authenticate to blob storage, I extract the storage account key from Azure Key Vault using service principal credentials.

My code is as follows:

from azure.keyvault import KeyVaultClient, KeyVaultAuthentication
from azure.common.credentials import ServicePrincipalCredentials

def auth_callback(server, resource, scope):
    # The service principal credentials are hard-coded here.
    credentials = ServicePrincipalCredentials(
        client_id='',
        client_secret='',
        tenant='',
        resource='https://samplename.vault.azure.net/'
    )
    token = credentials.token
    return token['token_type'], token['access_token']

client = KeyVaultClient(KeyVaultAuthentication(auth_callback))

# Key name and version are left blank here.
key_bundle = client.get_key('https://samplename.vault.azure.net/', '', '')
json_key = key_bundle.key

However, this means saving the service principal secret inside the code, which I feel is not best practice.

How can I avoid this?

I have also thought of storing the service principal credentials in a separate config file in blob storage and reading it from Python, but that still ultimately means storing the credentials of the service principal in a text file.

I am running Python from Azure Batch.


3 Answers

1 vote

I'm also trying to run Python in the Batch service, and would like to access Key Vault without any credentials being visible in the portal or in the Python script, so I need to use a certificate rather than a key. This has to be a pretty common use case for the Batch service.

I can access Key Vault using the certificate thumbprint in a PowerShell script, but the only way I could find to access it from Python was to set the AZURE_CLIENT_CERTIFICATE_PATH environment variable to point to a non-password-protected PEM certificate and use the DefaultAzureCredential class. The Batch service requires password-protected .cer or .pfx certificates, so any solution is going to be fiddly: the only thing I can think of is to convert the PFX to a PEM and then set the environment variable, with the PFX password being visible in the code or in the portal.
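
For reference, here is a minimal sketch of that workaround, assuming the azure-identity and azure-keyvault-secrets packages; the tenant/client IDs, file path, and secret name are placeholders:

import os
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Convert the Batch-installed PFX to an unprotected PEM first, e.g.:
#   openssl pkcs12 -in cert.pfx -out cert.pem -nodes
# DefaultAzureCredential reads these standard environment variables:
os.environ['AZURE_TENANT_ID'] = '<tenant-id>'
os.environ['AZURE_CLIENT_ID'] = '<client-id>'
os.environ['AZURE_CLIENT_CERTIFICATE_PATH'] = '/path/to/cert.pem'

credential = DefaultAzureCredential()
client = SecretClient(vault_url='https://samplename.vault.azure.net/', credential=credential)
secret = client.get_secret('<secret-name>')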

If there's a simpler way, please let me know. Otherwise I think this is a significant gap in an otherwise very useful service.

0 votes

The best answer will depend where you execute your code.

If you execute it on an Azure VM, in an Azure Container, or on anything else inside Azure, then your scenario is exactly why MSI (Managed Service Identity) exists :). I strongly suggest you take a look at the MSI documentation: https://docs.microsoft.com/azure/active-directory/managed-identities-azure-resources/overview

This is fully supported in the Python SDK.
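
For example, with the same track-1 SDK as in the question, a minimal MSI sketch might look like this (MSIAuthentication comes from msrestazure, a dependency of these packages; the vault URL is the one from the question):

from msrestazure.azure_active_directory import MSIAuthentication
from azure.keyvault import KeyVaultClient

# The token is fetched from the local managed-identity endpoint,
# so no secret ever appears in the code.
credentials = MSIAuthentication(resource='https://vault.azure.net')
client = KeyVaultClient(credentials)
key_bundle = client.get_key('https://samplename.vault.azure.net/', '', '')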

If you execute it in your own environment outside Azure, then the question is not really Azure-specific, and you could use a library like "keyring" to take care of storing this kind of secret:

https://pypi.org/project/keyring/#what-is-python-keyring-lib
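
A minimal sketch, where the service name 'my-app' and the account name are purely illustrative:

import keyring

# Store the secret once (e.g. from an interactive session); it lives in
# the OS credential store, not in your code or a config file.
keyring.set_password('my-app', 'sp-client-secret', '<service-principal-secret>')

# Retrieve it at runtime.
client_secret = keyring.get_password('my-app', 'sp-client-secret')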

(disclosure: I work at MS in the Python SDK team)

0 votes

The current best practice for Azure Batch is to use certificate-based authentication for your service principal. To do this, add your certificate to Batch using the Certificates API (https://docs.microsoft.com/en-us/python/api/azure-batch/azure.batch.operations.certificate_operations.certificateoperations?view=azure-python#add-certificate--certificate-add-options-none--custom-headers-none--raw-false----operation-config-). Then, when you create your pool, you can specify certificate_references to have these certificates installed on each node in the pool (https://docs.microsoft.com/en-us/python/api/azure-batch/azure.batch.models.pooladdparameter?view=azure-python).
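
A sketch of both steps with the azure-batch package, assuming an already-constructed batch_client; the thumbprint, PFX data, pool id, and VM size are placeholders:

import azure.batch.models as batchmodels

# 1. Upload the certificate to the Batch account (Certificates API).
batch_client.certificate.add(
    batchmodels.CertificateAddParameter(
        thumbprint='<sha1-thumbprint>',
        thumbprint_algorithm='sha1',
        data='<base64-encoded-pfx>',
        certificate_format=batchmodels.CertificateFormat.pfx,
        password='<pfx-password>'))

# 2. Reference it at pool creation so it is installed on every node.
pool = batchmodels.PoolAddParameter(
    id='<pool-id>',
    vm_size='<vm-size>',
    # plus cloud_service_configuration or virtual_machine_configuration
    certificate_references=[
        batchmodels.CertificateReference(
            thumbprint='<sha1-thumbprint>',
            thumbprint_algorithm='sha1',
            store_location=batchmodels.CertificateStoreLocation.current_user,
            visibility=[batchmodels.CertificateVisibility.task])])
batch_client.pool.add(pool)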

If you prefer key-based authentication, you can alternatively specify the keys as environment variables on the pool; these are encrypted at rest.
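
One place to set such variables in the Python SDK is on a task (or via a job's common_environment_settings); a sketch with placeholder names:

task = batchmodels.TaskAddParameter(
    id='<task-id>',
    command_line='python read_blob.py',
    environment_settings=[
        batchmodels.EnvironmentSetting(
            name='STORAGE_ACCOUNT_KEY',
            value='<storage-account-key>')])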

We receive quite a few requests to add MSI support, but I currently do not know the timeline for when it might be added.

(disclosure: I work at MS in the Azure Batch team)