3
votes

** Problem Background **

The Azure WebJobs SDK has no way to define a retention policy for logs. As a result, the execution and dashboard blob storage can grow without bound and cause problems, including slowing down or crashing the Kudu dashboard, which could compromise the stability of the other apps in the App Service plan.

The problem is described in these issues:

https://github.com/Azure/azure-webjobs-sdk/issues/560

https://github.com/Azure/azure-webjobs-sdk/issues/1050

https://github.com/Azure/azure-webjobs-sdk/issues/107

My WebJob functions log extensively and run more than 100,000 times a day. That means a huge number of log files has piled up in my storage.

** The Workaround approach that I am planning: **

I am planning to add a timer-triggered function to my WebJob code that purges log entries older than 30 days.

We have the following blob containers created or used by the WebJobs SDK:

1. Storage Connection: AzureWebJobsDashboard

 1.1. azure-webjobs-dashboard
 1.2. azure-jobs-host-archive
 1.3. Duplicates with AzureWebJobsStorage
      1.3.1 azure-jobs-host-output
      1.3.2 azure-webjobs-host

2. Storage Connection: AzureWebJobsStorage

 2.1. azure-jobs-host-output
 2.2. azure-webjobs-host
    2.2.1 Heartbeats 
    2.2.2 Ids
    2.2.3 Output-logs

I am thinking of creating a process that deletes every blob older than 30 days from the above containers, but I am concerned that some of the blobs might be required by the running WebJobs.

** Question **

Which of the above blob containers do I need to purge to prevent blobs from piling up, without interfering with running WebJobs?

2 Answers

2
votes

As far as I know, the AzureWebJobsDashboard connection string points to the storage account used to store logs for the WebJobs Dashboard. This connection string is optional.

It will generate two containers: 'azure-webjobs-dashboard' and 'azure-jobs-host-archive'.

azure-webjobs-dashboard: Used by the WebJobs dashboard to store host and execution endpoint (function) details.

azure-jobs-host-archive: Used as an archive for execution logs.

So both of these containers could be deleted without interfering with running WebJobs.

azure-jobs-host-output is the key container for troubleshooting WebJobs. It hosts logs created by the WebJob runtime during initialization and termination of every execution. If you don't need these logs, you could delete them.

The azure-webjobs-host container in turn hosts three directories:

Heartbeats – Contains 0-byte blobs for every heartbeat check performed on the service. If you don't need them, you could delete the old blobs.

Ids – Contains a directory with a single blob holding a unique identifier for this service. I don't suggest deleting the blobs in this directory.

Output-logs – Hosts the output of the explicit logs for each run. Explicit logs are the logs that WebJob developers write from within the execution code. You could delete the old logs.
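
The guidance above for azure-webjobs-host can be sketched as a selection rule: purge blobs older than the cutoff, but never touch anything under Ids. This is only a rough Python sketch, not SDK code. The function name `blobs_to_purge`, the lowercase `ids/` prefix, and the sample blob names are my own assumptions for illustration, and listing or deleting the blobs through a storage client is deliberately left out.

```python
from datetime import datetime, timedelta, timezone

# Assumption: anything under "ids/" in azure-webjobs-host must be kept.
PROTECTED_PREFIXES = ("ids/",)


def blobs_to_purge(blobs, now, max_age_days=30, protected=PROTECTED_PREFIXES):
    """Return the names of blobs that are old enough to delete.

    `blobs` is an iterable of (name, last_modified) pairs, where
    last_modified is a timezone-aware datetime. Blobs whose name starts
    with a protected prefix (compared case-insensitively) are skipped.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [
        name
        for name, last_modified in blobs
        if last_modified < cutoff and not name.lower().startswith(protected)
    ]


# Hypothetical blob names, only to show the behaviour.
now = datetime(2024, 1, 31, tzinfo=timezone.utc)
sample = [
    ("output-logs/old.txt", now - timedelta(days=45)),
    ("output-logs/new.txt", now - timedelta(days=2)),
    ("ids/host-id", now - timedelta(days=400)),
    ("heartbeats/old", now - timedelta(days=31)),
]
print(blobs_to_purge(sample, now))
# prints ['output-logs/old.txt', 'heartbeats/old']
```

Keeping the selection separate from the actual delete calls makes it easy to dry-run the rule against a listing before anything is removed.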

1
votes

We've just implemented Storage Lifecycle Management and are testing this policy:

{
    "version": "0.5",
    "rules": [
        {
            "name": "DeleteOldLogs",
            "type": "Lifecycle",
            "definition": {
                "actions": {
                    "baseBlob": {
                        "delete": {
                            "daysAfterModificationGreaterThan": 30
                        }
                    }
                },
                "filters": {
                    "blobTypes": [
                        "blockBlob"
                    ],
                    "prefixMatch": [
                        "azure-webjobs-host/output-logs",
                        "azure-webjobs-dashboard/functions/recent",
                        "azure-webjobs-dashboard/functions/instances",
                        "azure-jobs-host-output",
                        "azure-jobs-host-archive"
                    ]
                }
            }
        }
    ]
}
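
To reason about which blobs a rule like this would touch, here is a small local approximation in Python. It is a sketch under assumptions, not how the service evaluates the policy: it assumes each prefix is matched case-sensitively against the `container/blob-name` path, it ignores the `blobTypes` filter, and the function name `rule_would_delete` and the blob names in the usage lines are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Prefixes copied from the policy above.
PREFIXES = [
    "azure-webjobs-host/output-logs",
    "azure-webjobs-dashboard/functions/recent",
    "azure-webjobs-dashboard/functions/instances",
    "azure-jobs-host-output",
    "azure-jobs-host-archive",
]


def rule_would_delete(container, blob_name, last_modified, now,
                      prefixes=PREFIXES, days=30):
    """Approximate the rule: prefix match plus 'modified more than N days ago'."""
    path = f"{container}/{blob_name}"
    old_enough = (now - last_modified) > timedelta(days=days)
    return old_enough and any(path.startswith(p) for p in prefixes)


now = datetime(2024, 1, 31, tzinfo=timezone.utc)
old = now - timedelta(days=45)

# Old output logs match; ids and heartbeats are not listed, so they are left alone.
print(rule_would_delete("azure-webjobs-host", "output-logs/run1.txt", old, now))  # prints True
print(rule_would_delete("azure-webjobs-host", "ids/host-id", old, now))           # prints False
print(rule_would_delete("azure-webjobs-host", "heartbeats/hb1", old, now))        # prints False
```

Note that under these prefixes the heartbeats directory is never cleaned up, so it would need its own prefix entry if those blobs should expire as well.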