After combining ideas from multiple sources, we came up with this solution:
In the desired stage of your Azure Release Pipeline, add an Azure CLI task. This task accepts either an inline PowerShell script or a path to a PowerShell script, so choose your own adventure. We chose to create a CheckWebJobStatus.ps1 file containing the script below and exposed it as an artifact available to our Azure Release Pipeline.
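What this PowerShell script does, in short:
It checks the target WebJob's status up to 10 times (configurable via $totalRuns), waiting 5 seconds between checks, and exits successfully as soon as it sees 3 consecutive Running status reports. If that never happens within the allowed number of checks, it exits with an error.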
param(
    [string]$resourceGroup,
    [string]$appServiceName,
    [string]$jobName,
    [int]$totalRuns = 10
)

# Success needs 3 consecutive Running reports, so fewer than 3 checks can never pass.
if ($totalRuns -lt 3) {
    Write-Error "totalRuns must be 3 or greater"
    exit 1
}

Write-Host "Checking status of $jobName in $resourceGroup/$appServiceName"
$consecutiveRunningStatuses = 0

for ($i = 0; $i -lt $totalRuns; $i++) {
    $attempt = $i + 1

    # List the app's continuous WebJobs and parse the JSON output.
    $jobs = (az webapp webjob continuous list --name $appServiceName --resource-group $resourceGroup | ConvertFrom-Json)

    foreach ($job in $jobs) {
        # This comparison expects names in the form "<app service name>/<job name>".
        if ($job.name -eq "$appServiceName/$jobName") {
            if ($job.status -eq "Running") {
                Write-Host "$jobName is running! Attempt $attempt"
                $consecutiveRunningStatuses++
                if ($consecutiveRunningStatuses -eq 3) {
                    Write-Host "$jobName is running $consecutiveRunningStatuses times in a row! We assume that means it is stable."
                    exit 0
                }
            }
            else {
                # Any non-Running status breaks the streak.
                Write-Host "$jobName status is $($job.status). Attempt $attempt"
                $consecutiveRunningStatuses = 0
            }
        }
    }

    # Wait between checks, but not after the final one.
    if ($i -ne ($totalRuns - 1)) {
        Start-Sleep 5
    }
}

Write-Host "$jobName failed to start after $totalRuns checks"
exit 1
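As a rough usage sketch, the script could be invoked like this, either from a local PowerShell session or as the inline script of the Azure CLI task. The resource group, app service, and job names and the relative script path are placeholders, not values from our setup, and the sketch assumes the Azure CLI is already signed in (the Azure CLI task takes care of that inside a pipeline).

```powershell
# Placeholder values (hypothetical); substitute your own resource names and script path.
./CheckWebJobStatus.ps1 `
    -resourceGroup 'my-resource-group' `
    -appServiceName 'my-app-service' `
    -jobName 'my-webjob' `
    -totalRuns 10

# The script exits with 0 when the WebJob looks stable and 1 otherwise.
if ($LASTEXITCODE -ne 0) {
    Write-Host "WebJob check failed with exit code $LASTEXITCODE"
}
```

Because the script signals failure through its exit code, a non-zero result should be enough for the pipeline task to mark the stage as failed.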
Why 3 consecutive Running status reports?
Because Azure WebJobs status reporting is not reliable. When a WebJob first deploys, it enters the Starting status and then the Running status. So far, so good. However, if there is a fatal error on startup, such as a missing dependency, the job then enters the Pending Restart status. In our observation, either Azure automatically tries to start the WebJob again or the status is erroneously reported as Running. The WebJob then re-enters the Pending Restart status and remains there until the next explicit attempt to deploy or start it. In our observations, a failing WebJob never remained in the Running status for more than 2 consecutive reports taken 5 seconds apart or, in other words, within any 15-second window. The script therefore assumes, for now, that 3 consecutive Running status reports within 15 seconds mean the WebJob really is running.
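To make the reasoning concrete, here is a minimal sketch of the same streak-counting rule, separated from the Azure calls so it can be tried against a made-up sequence of status reports. The function name Test-StableRunning and its parameters are hypothetical and are not part of the script above.

```powershell
# Hypothetical helper: returns $true once $Required consecutive
# "Running" reports appear in the given sequence of statuses.
function Test-StableRunning {
    param(
        [string[]]$Statuses,
        [int]$Required = 3
    )

    $streak = 0
    foreach ($status in $Statuses) {
        if ($status -eq 'Running') {
            $streak++
            if ($streak -ge $Required) { return $true }
        }
        else {
            # Any other status (for example "Pending Restart") resets the streak.
            $streak = 0
        }
    }
    return $false
}

# A failing job that briefly reports Running twice is not treated as stable.
Test-StableRunning -Statuses 'Starting', 'Running', 'Running', 'Pending Restart'   # False

# A job that stays Running for three reports in a row is treated as stable.
Test-StableRunning -Statuses 'Starting', 'Running', 'Running', 'Running'           # True
```

Resetting the counter on any non-Running report is what keeps a job that flips between Running and Pending Restart from ever passing the check.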
Aside - How we did it:
We created a dedicated DeployTools repo with its own azure-pipelines.yaml build configuration, which publishes only the folder containing that PowerShell file. Then, in our desired Azure Release Pipeline, we attached the artifacts from the DeployTools build.