
I have an Azure website where users can upload a lot of XML files. These files need to be processed and their contents written to the database.

For this processing I use a continuous WebJob.

For reasons not relevant here, all uploaded files must be processed per user. So I have a table with all the files and their user IDs, and a table with the running jobs. I have multiple WebJobs doing the same processing. Each WebJob checks the files table for files that need processing. Before starting on a user's files, it checks the running-jobs table to make sure another job isn't already processing that user's files (a sketch of such a check follows below).
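The per-user claim itself isn't shown in the post. A minimal sketch of what it could look like, assuming a hypothetical RunningJobs table with UserId and JobId columns (none of these names are from the original question):

    // Hypothetical sketch of the per-user claim check described above.
    // The RunningJobs table and its UserId/JobId columns are assumptions.
    using System;
    using System.Data.SqlClient;

    public static class JobClaims
    {
        // Returns true if this job instance acquired the user's files,
        // false if another job already owns them. Check and insert happen
        // in one statement to narrow the race window between instances;
        // a UNIQUE constraint on UserId would make it fully race-proof.
        public static bool TryClaimUser(SqlConnection conn, Guid userId, string jobId)
        {
            const string sql = @"
                INSERT INTO RunningJobs (UserId, JobId)
                SELECT @UserId, @JobId
                WHERE NOT EXISTS (SELECT 1 FROM RunningJobs WHERE UserId = @UserId);";

            using (var cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.AddWithValue("@UserId", userId);
                cmd.Parameters.AddWithValue("@JobId", jobId);
                return cmd.ExecuteNonQuery() == 1; // one row inserted = claim acquired
            }
        }
    }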

This works fine and can run for months without any problem. But sometimes the continuous WebJobs restart, mostly at night (my time), making me lose valuable processing time. I'm the only one accessing Azure, and I had not deployed anything new prior to the restarts. The job is usually mid-processing when it restarts, so memory could be an issue, but I'm running an S3 instance and CPU and memory never exceed 40%. The logging isn't very helpful either:

[01/25/2018 05:03:20 > 5657e1: INFO] Starting job: 28158.
[01/25/2018 09:49:24 > 5657e1: SYS INFO] WebJob is still running
[01/25/2018 20:23:06 > 5657e1: SYS INFO] Status changed to Starting
[01/25/2018 20:23:06 > 5657e1: SYS INFO] WebJob singleton setting is False

Because the WebJob does not finish cleanly, the running-jobs table isn't updated. After the restart, each job still thinks another WebJob is processing that user's files, so all the jobs wait on each other and nothing happens.

How can I see why the job is restarting? Once I know the reason I might be able to fix it. Any help is much appreciated.

Update: I changed my entry point and added the following lines at the top of my Main method:

    // Get the shutdown file path from the environment
    _shutdownFile = Environment.GetEnvironmentVariable("WEBJOBS_SHUTDOWN_FILE");
    _log.Info("Watching " + _shutdownFile);
    // Setup a file system watcher on that file's directory to know when the file is created:
    var filename = Path.GetFileName(_shutdownFile);
    if (filename != null)
    {
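        // Note: this passes a bare file name, not a directory path (the bug diagnosed below).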
        var fileSystemWatcher = new FileSystemWatcher(filename);
        fileSystemWatcher.Created += OnAzureRestart;
        fileSystemWatcher.Changed += OnAzureRestart;
        fileSystemWatcher.NotifyFilter = NotifyFilters.CreationTime | NotifyFilters.FileName | NotifyFilters.LastWrite;
        fileSystemWatcher.IncludeSubdirectories = false;
        fileSystemWatcher.EnableRaisingEvents = true;
        _log.Info("FileSystemWatcher is set-up");
    }

But after publishing it to Azure, the WebJob won't start and throws an error:

[02/08/2018 15:23:56 > a93630: ERR ] Unhandled Exception: System.ArgumentException: The directory name gugfn3vx.0gk is invalid.
[02/08/2018 15:23:56 > a93630: ERR ]    at System.IO.FileSystemWatcher..ctor(String path, String filter)
[02/08/2018 15:23:56 > a93630: ERR ]    at System.IO.FileSystemWatcher..ctor(String path)
[02/08/2018 15:23:56 > a93630: ERR ]    at TaskRunner.Program.Main(String[] args)

I think the problem is the line Path.GetFileName(_shutdownFile): it returns only the bare file name, which is not a valid directory for the FileSystemWatcher constructor, and the file itself doesn't even exist while the WebJob is still running. Any more advice?
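For illustration, the difference between the two calls (the full path below is made up; only its shape matters):

    using System.IO;

    // Hypothetical shutdown-file path, for illustration only:
    var shutdownFile = @"D:\local\Temp\JobsShutdown\gugfn3vx.0gk";

    Path.GetFileName(shutdownFile);      // "gugfn3vx.0gk" (not a directory, so
                                         // new FileSystemWatcher(...) throws ArgumentException)
    Path.GetDirectoryName(shutdownFile); // "D:\local\Temp\JobsShutdown" (what the watcher needs)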

Update 2: Somehow I had made a wrong code change. This is the working code:

    // Get the shutdown file path from the environment
    _shutdownFile = Environment.GetEnvironmentVariable("WEBJOBS_SHUTDOWN_FILE");
    _log.Info("Watching " + _shutdownFile);
    // Setup a file system watcher on that file's directory to know when the file is created:
    var folder = Path.GetDirectoryName(_shutdownFile);
    if (folder != null)
    {
        var fileSystemWatcher = new FileSystemWatcher(folder);
        fileSystemWatcher.Created += OnAzureRestart;
        fileSystemWatcher.Changed += OnAzureRestart;
        fileSystemWatcher.NotifyFilter = NotifyFilters.CreationTime | NotifyFilters.FileName | NotifyFilters.LastWrite;
        fileSystemWatcher.IncludeSubdirectories = false;
        fileSystemWatcher.EnableRaisingEvents = true;
        _log.Info("FileSystemWatcher is set-up");
    }

The change is in the line var folder = Path.GetDirectoryName(_shutdownFile); it gives the FileSystemWatcher the directory containing the shutdown file instead of its bare file name.
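The OnAzureRestart handler itself isn't shown above. A minimal sketch of what it could do, where ReleaseRunningJobEntry is a placeholder (not from the original post) for removing this job's row from the running-jobs table so the other instances aren't blocked after a restart:

    // Hypothetical sketch; the original post does not show this handler.
    private static void OnAzureRestart(object sender, FileSystemEventArgs e)
    {
        _log.Info("Shutdown file created; Azure is about to stop this WebJob.");
        // Placeholder: release this job's row in the running-jobs table so
        // the remaining instances aren't deadlocked after the restart.
        ReleaseRunningJobEntry();
        // Azure kills the process shortly after creating the shutdown file,
        // so the grace period is short: finish up quickly.
    }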

How often is this happening? Note that some restarts are expected in a PaaS environment as the platform gets upgraded. Also, you first say that you have one WebJob, and later say you have multiple. Can you clarify? – David Ebbo

I have two instances of the same job running, making it possible to process the files of two users at once. We've been running the system for several months now, almost a year, and I've seen it happen 6 times. I don't always notice when a restart has happened; I only notice it when files don't get processed. Then I look in the log and see it has restarted again. – Paul Meems

Are you using the WebJobs SDK, or just implementing your WebJob with your own logic? – David Ebbo

I'm not using the WebJobs SDK. – Paul Meems

Are you using the graceful shutdown pattern described here? Note that you should see at least one restart per month, and probably more due to platform upgrades. So 6 times doesn't sound right. Though it's possible that what you see here is not a clean restart but some kind of crash that takes you down without warning. – David Ebbo

1 Answer


A couple of key findings were outlined as we investigated in the comments:

  • For clean shutdown behavior, your WebJob needs to implement the graceful shutdown pattern, which basically consists of watching for the creation of a file whose path is given by the %WEBJOBS_SHUTDOWN_FILE% environment variable. (This is not needed when using the WebJobs SDK, which handles it automatically; see the sketch below.)
  • Some restarts are expected in a PaaS environment as the platform gets upgraded. The goal is not to prevent them but to handle them without disruption.
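For contrast, with the WebJobs SDK the graceful shutdown signal is surfaced as a CancellationToken parameter on the function, so no file watcher is needed. A minimal sketch (the queue name is illustrative, not from the question):

    // Minimal sketch of SDK-style graceful shutdown (the question does not use the SDK).
    using System.IO;
    using System.Threading;
    using Microsoft.Azure.WebJobs;

    public class Functions
    {
        // The host cancels this token when the WebJob is shutting down.
        public static void ProcessQueueMessage(
            [QueueTrigger("xml-files")] string message, // queue name is illustrative
            TextWriter log,
            CancellationToken cancellationToken)
        {
            // Check the token between units of work so shutdown can interrupt cleanly.
            if (cancellationToken.IsCancellationRequested)
            {
                log.WriteLine("Shutting down; the message will be retried later.");
                return;
            }
            // ... process the message ...
        }
    }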