Scenario: I have multiple tasks running DL models on the same dataset. Downloading the same dataset in each task is wasteful, so I am looking for a way to persist the downloaded data across different task runs that need the same dataset.
I explored ResourceFiles and ApplicationPackages, but as I understand them they do not suit my requirement, for the following reasons:
- ResourceFiles are downloaded for every task run and are not persisted.
- ApplicationPackages have a quota limit (20 by default), and they cannot be created from within the Docker container.
With Docker's volume capabilities, I could run my tasks against the same volume and the downloaded data would persist on the VM. Since Azure Batch does not directly expose the "docker run" command for running the container, is there another way to specify volumes for Batch tasks using the Python SDK?
Can I use the "container_run_options" field of TaskContainerSettings to pass Docker volumes?
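For what it's worth, here is a minimal sketch of how I imagine the run options string would be built. It assumes (unverified) that `container_run_options` accepts arbitrary extra `docker run` flags such as `-v`, and that the Batch node's shared directory (`$AZ_BATCH_NODE_SHARED_DIR`, typically `/mnt/batch/tasks/shared` on Linux) is a host path that survives across tasks; the helper name and paths are my own, not from the SDK:

```python
def volume_run_options(host_dir, container_dir, read_only=False):
    """Build a docker-style bind-mount flag for use as
    TaskContainerSettings.container_run_options (assumption:
    the field is passed through to `docker run` as extra options)."""
    opt = f"-v {host_dir}:{container_dir}"
    if read_only:
        opt += ":ro"
    return opt

# Hypothetical host path under the node's shared directory, which
# persists across tasks running on the same compute node.
opts = volume_run_options("/mnt/batch/tasks/shared/data", "/data")
print(opts)  # -> -v /mnt/batch/tasks/shared/data:/data
```

The resulting string would then go into `azure.batch.models.TaskContainerSettings(image_name=..., container_run_options=opts)` when adding the task.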
Edit
I tried specifying a volume in TaskContainerSettings, but when writing to the mounted path I get a permission denied error:
PermissionError: [Errno 13] Permission denied: '/opt/docker/Gy9EKVB728YcVZgn7e2AVuuQ/00000001.jpg'