We issue a web request to our service from a job release task because Azure Batch (as far as I am aware) does not provide hooks that notify our service when a task completes. We do not want to poll for task completion.

We found that when our pool autoscales down, the job release task sometimes does not seem to execute. This makes our callback unreliable.

The documentation states "When a job has completed, a job release task runs on each node in the pool that executed at least one task"

What is a reliable way to emit a callback when a task finishes? Is the job release task expected not to run if the pool autoscales? That would make it virtually useless whenever crash dumps or logs need to be uploaded in a release task.

1 Answer

A job's lifetime is not inherently bound to the lifetime of a compute node (or of a Batch pool), and a job's tasks can be spread across many compute nodes. As the documentation you quoted states, a job release task, if specified, is tied to job completion (via auto-complete, termination, or deletion of the job).

Therefore, scale-in of a Batch pool has no bearing on whether a job release task runs.

There are a few ways to approach this problem:

  1. Add custom code (perhaps a wrapper) to the task itself that issues your callback after the main program completes.
  2. Create a dependent task that executes your callback only after the task it depends on completes.
  3. Disable autoscale and scale out your pool manually by querying the number of tasks in your job, then terminate the job once all the tasks have been assigned so that the job release tasks run. After the job transitions to the completed state, scale in your pool manually.
  4. Use your operating system's hooks (such as systemd) to run commands on shutdown.
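
As an illustration of option 1, here is a minimal sketch of such a wrapper in Python. It assumes the compute node has Python 3 available and that your service accepts a JSON POST; the callback URL, the payload fields, and the names `post_json` and `run_and_notify` are hypothetical. `AZ_BATCH_JOB_ID` and `AZ_BATCH_TASK_ID` are environment variables that Batch sets for tasks.

```python
import json
import os
import subprocess
import sys
import urllib.request


def post_json(url, payload, timeout=10):
    """POST payload as JSON and return the HTTP status code."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def run_and_notify(cmd, callback_url, post=post_json):
    """Run cmd, report its exit code to callback_url, return the exit code."""
    exit_code = subprocess.run(cmd).returncode
    payload = {
        # Set by Batch on the node; None when run outside Batch.
        "job_id": os.environ.get("AZ_BATCH_JOB_ID"),
        "task_id": os.environ.get("AZ_BATCH_TASK_ID"),
        "exit_code": exit_code,
    }
    try:
        post(callback_url, payload)
    except Exception as exc:
        # A failed callback should not mask the real task result.
        print(f"callback failed: {exc}", file=sys.stderr)
    return exit_code


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python wrapper.py <callback-url> <program> [args...]
    sys.exit(run_and_notify(sys.argv[2:], sys.argv[1]))
```

The task command line would then invoke the wrapper instead of the program directly. Note that a wrapper like this only fires when the program actually exits; if the node is removed while the task is still running, no callback is sent, so the service may still want a timeout on its side.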