8
votes

I have several Windows VMs running on Azure that are configured to collect performance counters and event logs.

All of this is configured in the "Diagnostic settings..." on the VM resource inside Azure Portal. There's a Windows Azure Diagnostics agent that collects this data on the VM and stores it into a storage account (inside Table Storage).

All of this collected data (performance counters, metrics, logs, etc.) doesn't have any retention policy and there doesn't seem to be any way of setting it up. So it just accumulates in the storage account's table storage forever.

This is where my problem is -- there's now too much data in these tables (several terabytes in my case) and it's costing a lot of money just to keep it. And it's only going to keep increasing over time.

The relevant storage account tables are tables like:

  • WADMetrics* (Windows Azure Diagnostics Metrics Table)
  • WADPerformanceCountersTable (Windows Azure Diagnostics Performance Counters Table)
  • WASWindowsEventLogsTable (Windows Azure Diagnostics Windows Event Logs Table)

Is there some way how to delete old data in these tables so it wouldn't break anything? Or even better, is there some way to configure retention policy or set it up so that it doesn't keep accumulating forever?

3

3 Answers

3
votes

Is there some way how to delete old data in these tables so it wouldn't break anything?

You would need to do this manually. The way this would work is that you will first query the data that needs to be deleted and then once you get the data you will delete it. PartitionKey attribute of the entities stored in these tables actually represents a date/time value (in ticks prepended with zeroes to make it an equal length string) so you would need to take the from and to date/time values, convert them to ticks, make it a 19 character long string (by prepending appropriate number of zeroes) and query the data. Once you get the data on the client side, you will send delete request back to table storage.

To speed up the whole process, there are a few things you could do:

  • When you query the data, use query projection to return only PartitionKey and RowKey attributes as only these two attributes are needed for deletion.
  • For deletion, you could use entity batch transaction. This could speed up the deletion operation considerably.
  • For faster deletes, you can spin up a VM in the same region as that of your storage account. That way you are not paying for data egress charges.

I wrote a blog post some time ago that you may find helpful: https://gauravmantri.com/2012/02/17/effective-way-of-fetching-diagnostics-data-from-windows-azure-diagnostics-table-hint-use-partitionkey/.

Or even better, is there some way to configure retention policy or set it up so that it doesn't keep accumulating forever?

Unfortunately there isn't at least as of today. There's a retention setting but that's only for blobs.

1
votes

Just came across this issue as I was tracking down what costs the most in the subscription.

One useful tool is the Azure Storage Explorer. You can browse to a table, inspect its contents, use the Table Statistics button to count table rows, multi-select and delete rows.

For one small VM that's been running since 2016, I found that the WADMetrics tables seem to roll every 10 days, but the others do not. A sample WADMetrics table contained 5724 entries. The WASWindowsEventLogsTable contained 10,022 entries. I cancelled the WADPerformanceCountersTable count when it reached 5 million entries. It costs more to store the statistics than the VM's VHD.

This article summarizes useful information about PowerShell commands for manipulating tables. Unfortunately, the Azure Cloud Shell doesn't yet support commands for working inside a table e.g. Get-AzTableRow (see this report). I assume that would work if you set up the latest Az PowerShell commands locally. Then you could select with a filter and use Remove-AzTableRow to delete some of the rows. In my case, the machine has been decommissioned so I just needed a way to delete lots of tables without having to click on each one in the dashboard. Here are some sample commands to start from:

$location = "uswest"
$resourceGroup = "myRG"
$storageAccountName = "myData"
$storageAccount = get-AzStorageAccount -ResourceGroupName $resourceGroup -Name $storageAccountName
$ctx = $storageAccount.Context
# List all tables in storage account
Get-AzStorageTable -Context $ctx
# Count the WADMetrics tables
(Get-AzStorageTable -Context $ctx -Name "WADMetrics*").count
# Count the WADMetrics tables with "2018" in their name
(Get-AzStorageTable -Context $ctx -Name "WADMetrics*2018*").count
# Remove all WADMetrics tables with "2018" in their name without confirmation, then re-count 
# Only Get- supports wilcards, so pipe to Remove-AzStorageTable command
Get-AzStorageTable -Context $ctx -Name "WADMetrics*2018*" | Remove-AzStorageTable -Force
(Get-AzStorageTable -Context $ctx -Name "WADMetrics*2018*").count
# Remove the big tables. Confirmation takes a long time, so suppress it.
Remove-AzStorageTable -Context $ctx -Name "WADWindowsEventLogsTable" -Force
Remove-AzStorageTable -Context $ctx -Name "WADPerformanceCountersTable" -Force

# The following do NOT work in Azure Cloud Shell as of 07/16/2019.  See 
# https://github.com/MicrosoftDocs/azure-docs/issues/28608

# Count the rows in WADWindowsEventLogsTable
$tableName = "WADWindowsEventLogsTable"
$cloudTable = (Get-AzStorageTable -Context $ctx -Name $tableName).CloudTable
$cloudTableResults = Get-AzTableRow -table $cloudTable -columnName "RowKey"
$cloudTableResults.count
1
votes

Another solution is to write a small C# program to erase the Windows Azure Diagnostic (WAD) data.

The following article gives you a more or less out-the-box solution for methods that can erase both the WADMetrics* tables and rows contained in WADDiagnosticInfrastructureLogsTable, WADPerformanceCountersTable and WADWindowsEventLogsTable

Using the two methods DeleteOldTables() and DeleteOldData() it is easy to write a small program that can be executed monthly to cleanup the WAD data. Note that the code uses the WindowsAzure.Storage NuGet package, so that would need to installed in your project.

https://mysharepointlearnings.wordpress.com/2019/08/20/managing-azure-vm-diagnostics-data-in-table-storage/