
I am struggling to get my Azure Batch nodes to start in a pool that is configured to use a virtual network. The virtual network has been configured with a service endpoint policy that has a "Microsoft.Storage" policy definition pointing at a single storage account. Without the service endpoints defined on the virtual network, the Azure Batch pool works as expected, but with them the nodes never start and fail with the error shown below.

I have tried creating the Batch account in both pool allocation modes. This did not seem to make a difference: the pool resizes successfully, and then the nodes are stuck in the "Starting" state. In "User Subscription" mode I was able to find the start-up error, because the VM instance is visible in my own subscription:

VM has reported a failure when processing extension 'batchNodeExtension'. Error message: "Enable failed: processing file downloads failed: failed to download file[0]: failed to download file: unexpected status code: actual=403 expected=200" More information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot

From what I can determine, this is an Azure VM extension that runs to configure the VM for Azure Batch. My base image is Canonical, ubuntuserver, 18.04-lts (batch.node.ubuntu 18.04). I can see that the extension is attempting to download from:

https://a52a7f3c745c443e8c2cac69.blob.core.windows.net/nodeagentpackage-version9-22-0-2/Ubuntu-18.04/batch_init-ubuntu-18.04-1.8.7.tar.gz (note I removed the SAS token from this URL for posting here)

Eight further files are downloaded after this one, and it looks like they configure the Batch agent on the node.
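As an aside, the blob URL above identifies only the storage account name (the first DNS label of the `*.blob.core.windows.net` host), not the subscription that owns the account, which is why it cannot be turned into a subscription-scoped allowlist entry. A minimal Python sketch, assuming the standard public-cloud blob endpoint suffix, that drops the SAS query string before sharing a URL and extracts the account name; the helper names `redact_sas` and `storage_account_name` are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: public Azure cloud blob endpoint; sovereign clouds use other suffixes.
BLOB_SUFFIX = ".blob.core.windows.net"


def redact_sas(url: str) -> str:
    """Return the URL without its query string (where a SAS token is carried)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def storage_account_name(url: str) -> str:
    """Return the storage account name, i.e. the host label before BLOB_SUFFIX."""
    host = urlsplit(url).hostname or ""
    if not host.endswith(BLOB_SUFFIX):
        raise ValueError(f"not an Azure Blob endpoint: {host!r}")
    return host[: -len(BLOB_SUFFIX)]
```

Applied to the URL above, `storage_account_name` returns `a52a7f3c745c443e8c2cac69`; nothing in the URL reveals the owning subscription.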

The 403 error indicates that the node cannot access this storage account, which makes sense given the service endpoint policy: the policy does not include this storage account, and the account is external to my Azure subscription. I thought I might be able to add it to the service endpoint policy, but I have no way of determining which Azure subscription it belongs to. If I knew this, I thought I could add it as described here:

Endpoint policy allows you to add specific Azure Storage accounts to allow list, using the resourceID format. You can restrict access to all storage accounts in a subscription E.g. /subscriptions/subscriptionId (from https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoint-policies-overview)
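To make the allowlist scoping concrete, here is a hypothetical Python model of how a resource-ID allowlist behaves: an entry can be a whole subscription, a resource group, or a single storage account, and a target account is allowed when its resource ID falls under one of those scopes. This is an illustration of the semantics quoted above under that assumption, not Azure's actual implementation; `is_allowed` and all IDs are made up.

```python
def _segments(resource_id: str) -> list[str]:
    # ARM resource IDs are compared case-insensitively; split into path segments.
    return [seg.lower() for seg in resource_id.split("/") if seg]


def is_allowed(account_id: str, allowed_scopes: list[str]) -> bool:
    """Hypothetical check: is account_id inside any allowlisted scope?

    Scopes may be '/subscriptions/<id>', '/subscriptions/<id>/resourceGroups/<rg>',
    or a full storage account resource ID. Matching is done on whole path
    segments, so a subscription ID that merely shares a prefix does not match.
    """
    target = _segments(account_id)
    for scope in allowed_scopes:
        prefix = _segments(scope)
        if prefix and target[: len(prefix)] == prefix:
            return True
    return False
```

Under this model, listing only your own subscription admits your data account but rejects the Batch agent's package account, whose subscription you cannot name, which matches the 403 described above.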

I tried adding network security group rules using the service tag for Azure Storage, but this did not help. The node still cannot connect, which makes sense given the description of service endpoint policies.

The reason for my interest in this is the following issue: https://github.com/Azure/Batch/issues/66

I am trying to minimise the bandwidth charges from my storage account by using service endpoints.

I have also tried to create my own VM, but I am not sure whether the "batchNodeExtension" script is run automatically for VMs that you're using with Batch.

I would really appreciate any pointers because I am running out of ideas to try!


2 Answers


Batch requires a general rule that allows outbound access to all of Storage (the regional variant of the service tag is acceptable), as specified at https://docs.microsoft.com/en-us/azure/batch/batch-virtual-network#network-security-groups-specifying-subnet-level-rules. Currently this access is used mainly to download our node agent and to maintain state and retrieve the information needed to run tasks.
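For reference, a minimal Python sketch that builds such an outbound rule in the ARM `securityRules` JSON shape, using the `Storage` service tag (or a regional variant such as `Storage.WestEurope`) on TCP port 443. The rule name, priority, and `*` source are assumptions chosen for illustration, and `batch_storage_outbound_rule` is a hypothetical helper. Note that, as the question reports, an NSG rule like this does not by itself get traffic past a service endpoint policy that excludes the agent's storage account.

```python
import json
from typing import Optional


def batch_storage_outbound_rule(region: Optional[str] = None, priority: int = 200) -> dict:
    """Build an NSG security rule allowing outbound HTTPS to the Storage service tag.

    region: optional region suffix for the regional service tag (e.g. "WestEurope").
    priority: assumed value; must not clash with your other rules.
    """
    tag = f"Storage.{region}" if region else "Storage"
    return {
        "name": "AllowBatchStorageOutbound",  # hypothetical rule name
        "properties": {
            "priority": priority,
            "direction": "Outbound",
            "access": "Allow",
            "protocol": "Tcp",
            "sourceAddressPrefix": "*",  # assumption: any source within the subnet
            "sourcePortRange": "*",
            "destinationAddressPrefix": tag,
            "destinationPortRange": "443",
        },
    }


if __name__ == "__main__":
    print(json.dumps(batch_storage_outbound_rule("WestEurope"), indent=2))
```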


I am facing the same problem with Azure Machine Learning. We are trying to prevent data exfiltration by using service endpoint policies (SP policies) to block sending data to any storage account outside our subscription.

Since Azure ML compute depends on the Batch service, we were unable to run any ML compute while the service endpoint policy was associated with the compute subnet.

Microsoft stated the following:

Filtering traffic on Azure services deployed into Virtual Networks: At this time, Azure Service Endpoint Policies are not supported for any managed Azure services that are deployed into your virtual network.

https://docs.microsoft.com/en-us/azure/virtual-network/virtual-network-service-endpoint-policies-overview#scenarios

My understanding of this restriction is that any service that uses Azure Batch (which seems to be almost every managed service in Azure?) cannot use service endpoint policies, which makes the feature close to useless...

In the end, we removed service endpoint policies from our network architecture completely, and we now consider them only for scenarios where you want to restrict customers to accessing specific storage accounts.