
I have a bunch of Azure Batch Tasks (Windows) which are dependent on each other so that they are executed one after another. There is only one Job. Each Task requires all of the files generated in the previous Task (plus whatever files the previous Task got from its previous Task).

How should I transfer these files between the Tasks? My current solution is to run a move command at the start of each Task's commandLine, which moves all files from the previous Task's folder to the current Task's folder. This kind of works, but it doesn't feel right, and I have no backup of the intermediate results.
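For reference, the move-command approach described above can be sketched as a small helper that builds each Task's commandLine. `%AZ_BATCH_TASK_WORKING_DIR%` is a real environment variable that Batch sets on the node; the previous-task relative path, the task names, and `myapp.exe` are made-up placeholders for illustration:

```python
def chained_command(prev_task_dir: str, real_work: str) -> str:
    """Build a Windows commandLine that first moves all files from the
    previous Task's directory into this Task's working directory
    (%AZ_BATCH_TASK_WORKING_DIR%, set by Batch) and then runs the real work.
    prev_task_dir is a hypothetical relative path to the previous Task's
    working directory on the same node."""
    return (f'cmd /c "move /Y {prev_task_dir}\\* '
            f'%AZ_BATCH_TASK_WORKING_DIR% && {real_work}"')

# Hypothetical usage for the second Task in the chain:
cmd = chained_command(r"..\..\task1\wd", "myapp.exe --step 2")
print(cmd)
```

Note that this only works because dependent Tasks in the same Job can land on the same node's file system; it is exactly the fragility (no persistence off the node) that the question is trying to get away from.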

I was thinking of setting all of the files as output for each Task (to Blob Storage) and setting them as input for the next Task, but this doesn't work because I would have to know all the files in advance to generate ResourceFile references for them, and I don't know in advance which files will get generated. So my next best idea is to generate a ResourceFile reference for a single ZIP file which contains whatever the previous Task zipped into it. The ZIP file's contents would change all the time, but I can still add it as input because I can create a ResourceFile reference to it (even if the file's contents change). But this seems rather cumbersome.
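The ZIP round-trip idea can be sketched as two add-task request bodies in the shapes the Batch REST API uses (`outputFiles`, `resourceFiles`, `dependsOn`). The storage account, container name, SAS placeholders, and `myapp.exe` are assumptions, and the sketch assumes a `tar` executable is available on the node (recent Windows images ship bsdtar) for zipping/unzipping:

```python
# Producer: run step 1, archive everything in the working directory, and let
# Batch upload the archive to Blob Storage on success.
producer = {
    "id": "task1",
    "commandLine": r'cmd /c "myapp.exe --step 1 && tar -a -c -f results.zip *"',
    "outputFiles": [{
        "filePattern": "results.zip",
        "destination": {"container": {
            # Assumed container URL; needs a write-capable SAS.
            "containerUrl": "https://myacct.blob.core.windows.net/intermediate?<sas>",
        }},
        "uploadOptions": {"uploadCondition": "taskSuccess"},
    }],
}

# Consumer: wait for task1, download the (fixed-name, changing-content) ZIP
# as a resource file, unpack it, and run step 2.
consumer = {
    "id": "task2",
    "dependsOn": {"taskIds": ["task1"]},
    "resourceFiles": [{
        # Assumed blob URL; needs a read-capable SAS.
        "httpUrl": "https://myacct.blob.core.windows.net/intermediate/results.zip?<sas>",
        "filePath": "results.zip",
    }],
    "commandLine": r'cmd /c "tar -x -f results.zip && myapp.exe --step 2"',
}
```

The fixed ZIP name is what makes the ResourceFile reference stable even though the contents differ on every run, at the cost of the archive/extract steps in each commandLine.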

Other ideas?

P.S. This is related to my earlier question at Azure batch task dependencies: copy files from previous which mentions this same problem but asks a different question.


2 Answers

2 votes

@lauri, I think there are a few things you can do, considering you are already exploring the output file approach.

Two more ideas below:

Idea 1:

Use an Azure file share mounted on the Windows VMs.

Note: since your VMs are Windows, that rules out using the blobfuse driver to mount Blob Storage. However, if an Azure file share works for you as a mount point, you can use the Batch feature for mounting a virtual file system, with Azure Files in particular supported on Windows VMs.

Here as well, you need to make sure that task dependencies are in place so that, for example, task1 finishes first and its output can then be accessed by task2 from the mounted drive.
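A minimal sketch of this idea, using the shapes the Batch REST API expects for a pool's `mountConfiguration` and a dependent task. The account name, share name, key placeholder, drive letter, and `myapp.exe` path are assumptions; on Windows nodes the `relativeMountPath` value is used as the drive letter:

```python
# Pool definition fragment: mount an Azure file share on every node.
pool_fragment = {
    "mountConfiguration": [{
        "azureFileShareConfiguration": {
            "accountName": "myacct",                                   # assumed
            "azureFileUrl": "https://myacct.file.core.windows.net/taskshare",
            "accountKey": "<storage-account-key>",                     # placeholder
            "relativeMountPath": "S",  # mounted as drive S: on Windows nodes
        }
    }]
}

# Dependent task: only runs after task1, so whatever task1 wrote to the
# share is already there when task2 reads it from the mounted drive.
task2 = {
    "id": "task2",
    "dependsOn": {"taskIds": ["task1"]},
    "commandLine": r'cmd /c "myapp.exe --step 2 --input S:\task1-output"',
}
```

Because the share is mounted pool-wide, this also works when the dependent tasks land on different nodes, which the move-command approach cannot do.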

Idea 2:

As you mention, use the persisted output file concept, combined with making a task dependent on the task that generates the output files; once those files are persisted, the dependent task can trigger the download as resource files.
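One way around not knowing the file names in advance is a wildcard `filePattern` on the producer plus a `storageContainerUrl` resource file on the consumer, which downloads every blob in the container rather than a named file. A sketch in the REST API request shapes, where the storage account, container, SAS placeholders, and `myapp.exe` are assumptions:

```python
# Producer: upload everything under the task's working directory on success,
# whatever it turns out to be.
task1 = {
    "id": "task1",
    "commandLine": r'cmd /c "myapp.exe --step 1"',
    "outputFiles": [{
        "filePattern": r"**\*",  # recursive wildcard; no file names needed
        "destination": {"container": {
            "containerUrl": "https://myacct.blob.core.windows.net/task1-out?<sas>",
        }},
        "uploadOptions": {"uploadCondition": "taskSuccess"},
    }],
}

# Consumer: pull down the whole container instead of named ResourceFiles,
# so the set of files can differ on every run.
task2 = {
    "id": "task2",
    "dependsOn": {"taskIds": ["task1"]},
    "resourceFiles": [{
        "storageContainerUrl": "https://myacct.blob.core.windows.net/task1-out?<sas>",
    }],
    "commandLine": r'cmd /c "myapp.exe --step 2"',
}
```

This also gives the asker the backup they wanted: every intermediate result survives in Blob Storage even after the pool is gone.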

Pros for idea 1:

Once mounted, the drive is available on all Windows nodes as a drive, so download and upload time is curtailed. Only the drive's sync latency remains, which should be a few milliseconds (likely unnoticeable).

Hope this helps. Thanks, :)

0 votes

What you likely want is to mount a virtual file system on the pool, allowing shared access. See https://docs.microsoft.com/en-us/azure/batch/virtual-file-mount