4
votes

Enabling VM diagnostics in Azure is such a pain. I've gotten it working using ARM templates, the Azure PowerShell SDK, and the Azure CLI. But I've been trying for days now to enable VM diagnostics for both Windows and Linux VMs using Terraform and the azurerm_virtual_machine_extension resource. Still not working, ugh!

Here's what I have so far (I've tweaked this a bit to simplify it for this post, so hope I didn't break anything with my manual edits):

resource "azurerm_virtual_machine_extension" "vm-linux" {
  count                      = "${local.is_windows_vm == "false" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "LinuxDiagnostic"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "LinuxDiagnostic"
  type_handler_version       = "3.0"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
    {
      "ladCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/linux_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
      "storageAccount": "${module.vm_storage_account.name}"
    }
SETTINGS

  # SAS token below: Do not include the leading question mark, as per https://docs.microsoft.com/en-us/azure/virtual-machines/extensions/diagnostics-linux.
  protected_settings = <<SETTINGS
    {
      "storageAccountName": "${module.vm_storage_account.name}",
      "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.current.sas, "/^\\?/", "")}",
      "storageAccountEndPoint": "https://core.windows.net/"
    }
SETTINGS
}

resource "azurerm_virtual_machine_extension" "vm-win" {
  count                      = "${local.is_windows_vm == "true" ? 1 : 0}"
  depends_on                 = ["azurerm_virtual_machine_data_disk_attachment.vm"]
  name                       = "Microsoft.Insights.VMDiagnosticsSettings"
  location                   = "${var.location}"
  resource_group_name        = "${var.resource_group_name}"
  virtual_machine_name       = "${local.vm_name}"
  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "IaaSDiagnostics"
  type_handler_version       = "1.9"
  auto_upgrade_minor_version = "true"

  # The JSON file referenced below was created by running "az vm diagnostics get-default-config --is-windows-os", and adding/verifying the "__DIAGNOSTIC_STORAGE_ACCOUNT__" and "__VM_RESOURCE_ID__" placeholders.
  settings = <<SETTINGS
    {
      "wadCfg": "${base64encode(replace(replace(file("${path.module}/.diag-settings/windows_diag_config.json"), "__DIAGNOSTIC_STORAGE_ACCOUNT__", "${module.vm_storage_account.name}"), "__VM_RESOURCE_ID__", "${local.metricsresourceid}"))}",
      "storageAccount": "${module.vm_storage_account.name}"
    }
SETTINGS

  protected_settings = <<SETTINGS
    {
      "storageAccountName": "${module.vm_storage_account.name}",
      "storageAccountSasToken": "${data.azurerm_storage_account_sas.current.sas}",
      "storageAccountEndPoint": "https://core.windows.net/"
    }
SETTINGS
}

Notice that for both Linux and Windows I'm loading the diagnostics details from a JSON file within the code base, as per the comments. These are the default configs provided by Azure, so they should be valid.

When I deploy these, the Linux VM extension deploys successfully, but in the Azure portal the extension says "Problems detected in generated mdsd configuration". And if I look at the VM's "Diagnostic settings" it says "Error encountered: TypeError: Object doesn't support property or method 'diagnosticMonitorConfiguration'". The Windows VM extension fails to deploy altogether, saying that it "Failed to read configuration". If I view the extension in the portal it displays the following error:

"code": "ComponentStatus//failed/-3",
"level": "Error",
"displayStatus": "Provisioning failed",
"message": "Error starting the diagnostics extension"

And if I look at the "Diagnostics settings" pane it just hangs with a never-ending ". . ." animation.

However, if I look at the "terraform apply" output for both VM extensions, the decoded settings look exactly as intended, matching the config files with the placeholders correctly replaced.

Any suggestions on how to get this working?

Thanks in advance!

3
Well, why do you think the error text is not correct? Why are you sure your diagnostic configs are fine? Also, this looks wrong: "https://core.windows.net/" should be "https://blob.core.windows.net/". - 4c74356b41
@4c74356b41, I can post the diagnostics configs if you'd like, but as per the comment in my code above, I got the configs using the Azure CLI's az vm diagnostics get-default-config. I'm open to suggestions for a better way to get a working config, but I'm inclined to consider that a very reliable source. - Vince
@4c74356b41, regarding the https://core.windows.net/, that comes straight from multiple Microsoft docs online, such as this, and this, and this, and this, etc., etc. - Vince
I am still attempting to do this in 2019. Did anyone come up with a start-to-finish deployment script? I've seen a few out there, but nothing complete. - Eric Longstreet

3 Answers

2
votes

I've gotten the Windows diagnostics to work 100% so far in our environment. It seems the AzureRM API is very picky about the config being sent. We had been using PowerShell to enable it, and the same xmlCfg used in PowerShell DID NOT WORK with Terraform. So far this has worked for us. (The settings/protected_settings key names are case-sensitive: xmlCfg works, while xmlcfg does not.)

main.tf

#########################################################
#  VM Extensions - Windows In-Guest Monitoring/Diagnostics
#########################################################
resource "azurerm_virtual_machine_extension" "InGuestDiagnostics" {
  name                       = var.compute["InGuestDiagnostics"]["name"]
  location                   = azurerm_resource_group.VMResourceGroup.location
  resource_group_name        = azurerm_resource_group.VMResourceGroup.name
  virtual_machine_name       = azurerm_virtual_machine.Compute.name
  publisher                  = var.compute["InGuestDiagnostics"]["publisher"]
  type                       = var.compute["InGuestDiagnostics"]["type"]
  type_handler_version       = var.compute["InGuestDiagnostics"]["type_handler_version"]
  auto_upgrade_minor_version = var.compute["InGuestDiagnostics"]["auto_upgrade_minor_version"]

  settings           = <<SETTINGS
    {
      "xmlCfg": "${base64encode(templatefile("${path.module}/templates/wadcfgxml.tmpl", { vmid = azurerm_virtual_machine.Compute.id }))}",
      "storageAccount": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}"
    }
SETTINGS
  protected_settings = <<PROTECTEDSETTINGS
    {
      "storageAccountName": "${data.azurerm_storage_account.InGuestDiagStorageAccount.name}",
      "storageAccountKey": "${data.azurerm_storage_account.InGuestDiagStorageAccount.primary_access_key}",
      "storageAccountEndPoint": "https://core.windows.net"
    }
PROTECTEDSETTINGS
}

tfvars

  InGuestDiagnostics = {
    name                       = "WindowsDiagnostics"
    publisher                  = "Microsoft.Azure.Diagnostics"
    type                       = "IaaSDiagnostics"
    type_handler_version       = "1.16"
    auto_upgrade_minor_version = "true"
  }

wadcfgxml.tmpl (I cut out some of the Perf counters for brevity)

<WadCfg>
    <DiagnosticMonitorConfiguration overallQuotaInMB="5120">
        <DiagnosticInfrastructureLogs scheduledTransferLogLevelFilter="Error"/>
        <Metrics resourceId="${vmid}">
            <MetricAggregation scheduledTransferPeriod="PT1H"/>
            <MetricAggregation scheduledTransferPeriod="PT1M"/>
        </Metrics>
        <PerformanceCounters scheduledTransferPeriod="PT1M">
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Processor Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% Privileged Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\% User Time" sampleRate="PT60S" unit="Percent" />
            <PerformanceCounterConfiguration counterSpecifier="\Processor Information(_Total)\Processor Frequency" sampleRate="PT60S" unit="Count" />
            <PerformanceCounterConfiguration counterSpecifier="\System\Processes" sampleRate="PT60S" unit="Count" />
            <PerformanceCounterConfiguration counterSpecifier="\SQLServer:SQL Statistics\SQL Re-Compilations/sec" sampleRate="PT60S" unit="Count" />
        </PerformanceCounters>

        <WindowsEventLog scheduledTransferPeriod="PT1M">
            <DataSource name="Application!*[System[(Level = 1 or Level = 2)]]"/>
            <DataSource name="Security!*[System[(Level = 1 or Level = 2)]]"/>
            <DataSource name="System!*[System[(Level = 1 or Level = 2)]]"/>
        </WindowsEventLog>
    </DiagnosticMonitorConfiguration>
</WadCfg>

I finally got the Linux in-guest diagnostics (LAD) to work. A few notable facts: unlike the Windows diagnostics, the settings need to be sent as plain JSON, with no base64 encoding. Additionally, LAD seems to require a SAS token for the storage account. The usual caveats still apply: the AzureRM API is picky about the config, and the setting names are case-sensitive. Here is what is working for me so far:

# Locals
locals {
  env                  = var.workspace[terraform.workspace]
  # Use a set/static time to avoid TF from recreating the SAS token every apply, which would then cause it to
  # modify/recreate anything that uses it. Not ideal, but the token is for a VERY long time, so it will do for now
  sas_begintime = "2019-11-22T00:00:00Z"
  sas_endtime = timeadd(local.sas_begintime, "873600h")
}

#########################################################
#  VM Extensions - In-Guest Diagnostics
#########################################################
# We need a SAS token for the In-Guest Metrics
data "azurerm_storage_account_sas" "inguestdiagnostics" {
  count             = (contains(keys(local.env), "InGuestDiagnostics") ? 1 : 0)
  connection_string = data.azurerm_storage_account.BootDiagStorageAccount.primary_connection_string
  https_only        = true

  resource_types {
    service   = true
    container = true
    object    = true
  }

  services {
    blob  = true
    queue = true
    table = true
    file  = true
  }

  start  = local.sas_begintime
  expiry = local.sas_endtime

  permissions {
    read    = true
    write   = true
    delete  = true
    list    = true
    add     = true
    create  = true
    update  = true
    process = true
  }
}

resource "azurerm_virtual_machine_extension" "inguestdiagnostics" {
  for_each = contains(keys(local.env), "InGuestDiagnostics") ? local.env["InGuestDiagnostics"] : {}
  depends_on = [azurerm_virtual_machine_extension.dependencyagent]

  name                       = each.value["name"]
  location                   = azurerm_resource_group.resourcegroup.location
  resource_group_name        = azurerm_resource_group.resourcegroup.name
  virtual_machine_name       = azurerm_virtual_machine.compute["${each.key}"].name
  publisher                  = each.value["publisher"]
  type                       = each.value["type"]
  type_handler_version       = each.value["type_handler_version"]
  auto_upgrade_minor_version = each.value["auto_upgrade_minor_version"]

  settings           = templatefile("${path.module}/templates/ladcfg2json.tmpl", { vmid = azurerm_virtual_machine.compute["${each.key}"].id, storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name })
  protected_settings = <<PROTECTEDSETTINGS
     {
       "storageAccountName": "${data.azurerm_storage_account.BootDiagStorageAccount.name}",
       "storageAccountSasToken": "${replace(data.azurerm_storage_account_sas.inguestdiagnostics.0.sas, "/^\\?/", "")}"
     }
 PROTECTEDSETTINGS
}
# These variations didn't work for me:
# "ladCfg": "${templatefile("${path.module}/templates/ladcfgjson.tmpl", { vmid = azurerm_virtual_machine.compute["${each.key}"].id, storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name })}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character '\n' in string literal or Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair

# "ladCfg": "${replace(data.local_file.ladcfgjson["${each.key}"].content, "/\\n/", "")}",
# - This one gets you Error: "settings" contains an invalid JSON: invalid character 'S' after object key:value pair
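
Both failing variations above paste rendered JSON text inside a quoted JSON string, so raw newlines and inner quotes end up breaking the outer document. One possible way around that, sketched here on the assumption of Terraform 0.12 or later, is to decode the rendered template into a value and let jsonencode build the whole settings document. The template name ladcfgjson.tmpl is taken from the commented-out attempt and is assumed to render a bare JSON object for ladCfg; the local name lad_settings is made up for illustration.

```hcl
# Sketch only: assumes ladcfgjson.tmpl renders a valid JSON object for ladCfg.
# "lad_settings" is a hypothetical local name, not part of the original config.
locals {
  lad_settings = {
    for name, vm in azurerm_virtual_machine.compute : name => jsonencode({
      StorageAccount = data.azurerm_storage_account.BootDiagStorageAccount.name

      # jsondecode turns the rendered text into a value, so jsonencode can
      # re-serialize it with correct quoting and no stray newlines.
      ladCfg = jsondecode(templatefile("${path.module}/templates/ladcfgjson.tmpl", {
        vmid               = vm.id
        storageAccountName = data.azurerm_storage_account.BootDiagStorageAccount.name
      }))
    })
  }
}
```

With that local in place, the extension resource could set settings = local.lad_settings[each.key] instead of calling templatefile directly.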

tfvars

workspace = {
  TerraformWorkSpaceName = {
    compute = {
      # Add additional key/objects for additional Compute
      computer01 = {
        name       = "computer01"
      }
    }
    InGuestDiagnostics = {
      # Add additional key/objects for each Compute you want to install the InGuestDiagnostics on
      computer01 = {
        name                       = "LinuxDiagnostic"
        publisher                  = "Microsoft.Azure.Diagnostics"
        type                       = "LinuxDiagnostic"
        type_handler_version       = "3.0"
        auto_upgrade_minor_version = "true"
      }
    }
  }
}

I couldn't get a template file to work without wrapping the WHOLE thing in jsonencode.

ladcfg2json.tmpl

${jsonencode({
  "StorageAccount": "${storageAccountName}",
  "ladCfg": {
    "sampleRateInSeconds": 15,
    "diagnosticMonitorConfiguration": {
        "metrics": {
            "metricAggregation": [
                {
                    "scheduledTransferPeriod": "PT1M"
                },
                {
                    "scheduledTransferPeriod": "PT1H"
                }
            ],
            "resourceId": "${vmid}"
        },
        "eventVolume": "Medium",
        "performanceCounters": {
            "sinks": "",
            "performanceCounterConfiguration": [
                {
                    "counterSpecifier": "/builtin/processor/percentiowaittime",
                    "condition": "IsAggregate=TRUE",
                    "sampleRate": "PT15S",
                    "annotation": [
                        {
                            "locale": "en-us",
                            "displayName": "CPU IO wait time"
                        }
                    ],
                    "unit": "Percent",
                    "class": "processor",
                    "counter": "percentiowaittime",
                    "type": "builtin"
                }
            ]
        },
        "syslogEvents": {
            "syslogEventConfiguration": {
                "LOG_LOCAL0": "LOG_DEBUG"
            }
        }
    }
  }
})}
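
Since the API is picky about the JSON it receives, it may help to check the rendered template before it reaches the extension. The following is a minimal sketch, assuming Terraform 0.12 or later; the input values "example-vm-id" and "examplestorage" and the names lad_rendered and lad_settings_check are placeholders invented for the check.

```hcl
# Sketch only: the two input values below are placeholders, not real resources.
locals {
  lad_rendered = templatefile("${path.module}/templates/ladcfg2json.tmpl", {
    vmid               = "example-vm-id"
    storageAccountName = "examplestorage"
  })
}

output "lad_settings_check" {
  # jsondecode raises an error during plan if the rendered text is not valid
  # JSON; the attribute path confirms the vmid landed where LAD expects it.
  value = jsondecode(local.lad_rendered).ladCfg.diagnosticMonitorConfiguration.metrics.resourceId
}
```

If the template renders correctly, terraform plan should succeed and the output should echo the vmid that was passed in.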

I hope this helps.

1
votes

As the question was asked more than a year ago, this is more for people like me who are trying this for the first time. We only use Linux VMs, so this advice applies to those:

  1. protected_settings should use PROTECTED_SETTINGS, not SETTINGS, as its heredoc marker (as you can see in @rv23's answer above).
  2. According to the documentation I am following, https://docs.microsoft.com/en-gb/azure/virtual-machines/extensions/diagnostics-linux#protected-settings, you need to specify storageAccountSasToken, not storageAccountKey.

Here is my redacted version of the config (replace everything in all caps with your own settings):

resource "azurerm_virtual_machine_extension" "vm_linux_diagnostics" {
  count = "1"

  name = "NAME"

  resource_group_name = "YOUR RESOURCE GROUP NAME"
  location            = "YOUR LOCATION"

  virtual_machine_name = "TARGET MACHINE NAME"

  publisher                  = "Microsoft.Azure.Diagnostics"
  type                       = "LinuxDiagnostic"
  type_handler_version       = "3.0"
  auto_upgrade_minor_version = "true"

  settings = <<SETTINGS
  {
    "StorageAccount": "tfnpfsnhsuk",
    "ladCfg": {
      "sampleRateInSeconds": 15,
      "diagnosticMonitorConfiguration": {
        "metrics": {
          "metricAggregation": [
            {
              "scheduledTransferPeriod": "PT1M"
            },
            {
              "scheduledTransferPeriod": "PT1H"
            }
          ],
          "resourceId": "VM ID"
        },
        "eventVolume": "Medium",
        "performanceCounters": {
          "sinks": "",
          .... MORE METRICS - THAT YOU REQUIRE
        }
      }
    }
  }
SETTINGS

  protected_settings = <<PROTECTED_SETTINGS
  {
    "storageAccountName": "YOUR_ACCOUNT_NAME",
    "storageAccountSasToken": "YOUR SAS TOKEN"
  }
PROTECTED_SETTINGS

  tags = "YOUR TAG"
}
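
The YOUR_ACCOUNT_NAME and YOUR SAS TOKEN placeholders can also be filled from other Terraform objects instead of being pasted in. This is only a sketch, assuming Terraform 0.12 or later: azurerm_storage_account.diag and data.azurerm_storage_account_sas.diag are hypothetical names for a storage account and SAS data source defined elsewhere, and the local names are invented too.

```hcl
# Sketch only: azurerm_storage_account.diag and
# data.azurerm_storage_account_sas.diag are hypothetical names.
locals {
  # LAD expects the SAS token without its leading question mark.
  diag_sas_token = trimprefix(data.azurerm_storage_account_sas.diag.sas, "?")

  # jsonencode handles the quoting, so no hand-written JSON is needed here.
  lad_protected_settings = jsonencode({
    storageAccountName     = azurerm_storage_account.diag.name
    storageAccountSasToken = local.diag_sas_token
  })
}
```

The resource would then use protected_settings = local.lad_protected_settings in place of the heredoc.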
0
votes

Just got this working on a similar question:

Trying to add LinuxDiagnostic Azure VM Extension through terraform and getting errors

This includes getting the SAS token and reading from json files.