Job deletion and recreation in Azure Batch raises BatchErrorException

Question

I'm writing a task manager for Azure Batch in Python. When the manager adds a job to the specified Azure Batch account, it does the following:

1. Check if the specified job id already exists.
2. If yes, delete the job.
3. Create the job.

Unfortunately, it fails between steps 2 and 3. Even after I issue the delete command for the job and verify that no job with the same id remains in the Azure Batch account, I get a BatchErrorException like the following when I try to create the job again:

Exception encountered:
The specified job has been marked for deletion and is being garbage collected.

The code I use to delete the job is the following:

import datetime
import time

# Assumed import: the snippet refers to the SDK models module as batchmodels
import azure.batch.models as batchmodels


def deleteJob(self, jobId):

    print("Delete job [{}]".format(jobId))

    self.__batchClient.job.delete(jobId)

    # Wait until the job is deleted
    # 10 minutes timeout for the operation to succeed
    timeout = datetime.timedelta(minutes=10)
    timeout_expiration = datetime.datetime.now() + timeout
    while True:

        try:
            # As long as we can retrieve data related to the job, it means it is still deleting
            self.__batchClient.job.get(jobId)
        except batchmodels.BatchErrorException:
            # Any Batch error from the lookup is treated as "job no longer exists"
            print("Job {jobId} deleted correctly.".format(jobId=jobId))
            break

        time.sleep(2)

        if datetime.datetime.now() > timeout_expiration:
            raise RuntimeError(
                "ERROR: couldn't delete job [{jobId}] within timeout period of {timeout}.".format(
                    jobId=jobId, timeout=timeout))

I checked the Azure SDK, but couldn't find a method that tells me exactly when a job has been completely deleted.

fpark · Accepted Answer · 2017-08-24T14:38:49

Querying for existence of the job is the only way to determine if a job has been deleted from the system.
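As a rough illustration of that polling approach, here is a minimal sketch of a generic wait loop. The helper name `wait_until_gone` and its parameters are hypothetical and not part of the Azure SDK. The `exists` callable is supplied by the caller; in the question's code it could wrap `self.__batchClient.job.get(jobId)` and return False when that call raises `BatchErrorException`.

```python
import time


def wait_until_gone(exists, timeout_s=600.0, interval_s=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll exists() until it returns False.

    Returns True if the resource disappeared before the timeout,
    False if it was still present when the timeout expired.
    """
    deadline = clock() + timeout_s
    while True:
        if not exists():
            return True
        if clock() >= deadline:
            return False
        # Pause between queries so the service is not hammered.
        sleep(interval_s)
```

Using a monotonic clock rather than wall-clock time keeps the timeout unaffected by system clock adjustments. Note that this only reports when the lookup stops finding the job; it does not by itself guarantee that creating a job with the same id will succeed immediately afterwards.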

Alternatively, if you do not strictly need to reuse the same job id, you can issue the job deletion and then create a job with a different id. This lets the old job be deleted asynchronously, off your critical path.
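One possible way to get a different id on each run is to append a short random suffix to a base name. The sketch below is only an illustration: `make_unique_job_id` is a hypothetical helper, and it assumes the Batch rules that a job id may contain alphanumeric characters, hyphens, and underscores, and may be at most 64 characters long.

```python
import re
import uuid

# Assumed Azure Batch limit on the length of a job id
MAX_JOB_ID_LENGTH = 64


def make_unique_job_id(base):
    """Return base plus a random 8-character hex suffix, e.g. 'myjob-3f9a1c2e'."""
    suffix = uuid.uuid4().hex[:8]
    # Replace characters outside the assumed allowed set with hyphens.
    safe = re.sub(r"[^A-Za-z0-9_-]", "-", base)
    # Truncate the base so that base + '-' + suffix fits within the limit.
    head = safe[:MAX_JOB_ID_LENGTH - len(suffix) - 1]
    return "{}-{}".format(head, suffix)
```

The old job can then be deleted with `job.delete` as in the question, without waiting for the deletion to finish before the new job is created.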
