
I configured an AKS cluster to use a system-assigned managed identity to access other Azure resources:

resource "azurerm_subnet" "aks" {
  name = var.aks_subnet_name
  resource_group_name = azurerm_resource_group.main.name
  virtual_network_name = module.network.vnet_name
  address_prefix = var.aks_subnet
  service_endpoints = ["Microsoft.KeyVault"]
}

resource "azurerm_kubernetes_cluster" "aks_main" {
  name = module.aks_name.result
  depends_on = [azurerm_subnet.aks]
  location = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix = "aks-${local.name}"
  kubernetes_version = var.k8s_version
  addon_profile {
    oms_agent {
      # For monitoring containers
      enabled  = var.addons.oms_agent
      log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
    }
    kube_dashboard {
      enabled = true
    }
    azure_policy {
      # If we want to enforce policy definitions in the future
      # Check requirements https://docs.microsoft.com/en-ie/azure/governance/policy/concepts/policy-for-kubernetes
      enabled = var.addons.azure_policy
    }
  }
  default_node_pool {
    name = "default"
    orchestrator_version  = var.k8s_version
    node_count            = var.default_node_pool.node_count
    vm_size               = var.default_node_pool.vm_size
    type                  = "VirtualMachineScaleSets"
    availability_zones    = var.default_node_pool.zones
    # availability_zones  = ["1", "2", "3"]
    max_pods              = 250
    os_disk_size_gb       = 128
    vnet_subnet_id        = azurerm_subnet.aks.id
    node_labels           = var.default_node_pool.labels
    enable_auto_scaling   = var.default_node_pool.cluster_auto_scaling
    min_count             = var.default_node_pool.cluster_auto_scaling_min_count
    max_count             = var.default_node_pool.cluster_auto_scaling_max_count
    enable_node_public_ip = false
  }

  # Configuring AKS to use a system-assigned managed identity to access other Azure resources
  identity {
    type = "SystemAssigned"
  }

  network_profile {
    load_balancer_sku  = "standard"
    outbound_type      = "loadBalancer"
    network_plugin     = "azure"
    # if non-azure network policies
    # https://azure.microsoft.com/nl-nl/blog/integrating-azure-cni-and-calico-a-technical-deep-dive/
    network_policy     = "calico"
    dns_service_ip     = "10.0.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = "10.0.0.0/16"
  }
  lifecycle {
    ignore_changes = [
      default_node_pool,
      windows_profile,
    ]
  }
}

I want to take that managed identity (the service principal created by the identity block of the AKS cluster above) and assign it roles, for example Network Contributor over a subnet:

resource "azurerm_role_assignment" "aks_subnet" {
  # Giving access to AKS SP identity created to akssubnet by assigning it
  # a Network Contributor role
  scope                = azurerm_subnet.aks.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id
  # principal_id = azurerm_kubernetes_cluster.aks_main.kubelet_identity[0].object_id
  # principal_id = data.azurerm_user_assigned_identity.test.principal_id
  # skip_service_principal_aad_check = true
}

But the output I got after terraform apply is:

Error: authorization.RoleAssignmentsClient#Create: Failure responding 
to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. 
Status=403 Code="AuthorizationFailed" 
Message="The client 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' with object id 
'afd5bd09-c294-4597-9c90-e1ee293e5f3a' does not have authorization 
to perform action 'Microsoft.Authorization/roleAssignments/write' 
over scope '/subscriptions/77dfff95-fbd3-4a15-b97a-b7182939e61a/resourceGroups/rhd-spec-prod-main-6loe4lpkr0hd8/providers/Microsoft.Network/virtualNetworks/rhd-spec-prod-main-wdaht6cn7s3s8/subnets/aks-subnet/providers/Microsoft.Authorization/roleAssignments/8733864c-a5f7-a6a9-a61d-6393989f0ad1' 
or the scope is invalid. If access was recently granted, please refresh your credentials."

  on aks.tf line 23, in resource "azurerm_role_assignment" "aks_subnet":
  23: resource "azurerm_role_assignment" "aks_subnet" {

It seems the service principal being created does not have enough privileges to perform a role assignment over the subnet, or maybe my scope attribute is wrong. I am passing the AKS subnet id there.

What am I doing wrong?

UPDATE

Looking at how roles are assigned to managed identities, it looks like we can only assign them roles scoped to Subscriptions, Resource Groups, Storage services, SQL services, and Key Vault.


Reading here:

Before you can use the managed identity, it has to be configured. There are two steps:

Assign a role for the identity, associating it with the subscription that will be used to run Terraform. This step gives the identity permission to access Azure Resource Manager (ARM) resources.

Configure access control for one or more Azure resources. For example, if you use a key vault and a storage account, you will need to configure the vault and container separately.

Before you can create a resource with a managed identity and then assign an RBAC role, your account needs sufficient permissions. You need to be a member of the account Owner role, or have Contributor plus User Access Administrator roles.

Trying to proceed accordingly, I defined this code:

resource "null_resource" "wait_for_resource_to_be_ready" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks_main
  ]
}

data "azurerm_subscription" "current" {}

# FETCHING THE IDENTITY CREATED ON AKS CLUSTER
data "azurerm_user_assigned_identity" "test" {
  name                = "${azurerm_kubernetes_cluster.aks_main.name}-agentpool"
  resource_group_name = azurerm_kubernetes_cluster.aks_main.node_resource_group
}


data "azurerm_role_definition" "contributor" {
  name = "Network Contributor"
}

resource "azurerm_role_assignment" "aks_subnet" {

  # Giving access to AKS SP identity created to akssubnet by assigning it
  # a Network Contributor role
  # name                 = azurerm_kubernetes_cluster.aks_main.name
  # scope                =  var.aks_subnet_name # azurerm_subnet.aks.id  var.aks_subnet
  scope = data.azurerm_subscription.current.id
  #role_definition_name = "Network Contributor"
  role_definition_id = "${data.azurerm_subscription.current.id}${data.azurerm_role_definition.contributor.id}"
  # principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id
  # principal_id = azurerm_kubernetes_cluster.aks_main.kubelet_identity[0].object_id
  principal_id = data.azurerm_user_assigned_identity.test.principal_id
  skip_service_principal_aad_check = true
  depends_on = [
    null_resource.wait_for_resource_to_be_ready
  ]
}

The Terraform workflow tries to create the role assignment ...

> terraform_0.12.29 apply "prod_Infrastructure.plan"
null_resource.wait_for_resource_to_be_ready: Creating...
null_resource.wait_for_resource_to_be_ready: Provisioning with 'local-exec'...
null_resource.wait_for_resource_to_be_ready (local-exec): Executing: ["/bin/sh" "-c" "sleep 60"]
null_resource.wait_for_resource_to_be_ready: Still creating... [10s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [20s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [30s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [40s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [50s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [1m0s elapsed]
null_resource.wait_for_resource_to_be_ready: Creation complete after 1m0s [id=8505830187297683728]
azurerm_role_assignment.aks_subnet: Creating... 

but I finally got the same AuthorizationFailed error, this time over the subscription scope I passed:

Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' with object id 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/77dfff95-fbd3-4a15-b97a-b7182939e61a' or the scope is invalid. If access was recently granted, please refresh your credentials."

  on aks.tf line 145, in resource "azurerm_role_assignment" "aks_subnet":
 145: resource "azurerm_role_assignment" "aks_subnet" {

I am not sure at all how to verify this statement:

Before you can create a resource with a managed identity and then assign an RBAC role, your account needs sufficient permissions. You need to be a member of the account Owner role, or have Contributor plus User Access Administrator roles.

By the way, I have the Owner role in the subscription I am working with.

UPDATE 2

The object id referenced in both error messages above belongs to a service principal within my tenant:

az ad sp show --id afd5bd09-c294-4597-9c90-e1ee293e5f3a
{
  "accountEnabled": "True",
  "addIns": [],
  "alternativeNames": [],
  "appDisplayName": "Product-xxxx-ServicePrincipal-Production",
  "appId": "ff9c642c-06b9-47e2-9565-e3f6e782e14f",
  "appOwnerTenantId": "xxxxxxxx",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "deletionTimestamp": null,
  "displayName": "Product-xxxx-ServicePrincipal-Production",
  "errorUrl": null,
  "homepage": null,
  "informationalUrls": {
    "marketing": null,
    "privacy": null,
    "support": null,
    "termsOfService": null
  },
  "keyCredentials": [],
  "logoutUrl": null,
  "notificationEmailAddresses": [],
  "oauth2Permissions": [],

  # THIS IS THE OBJECT ID
  "objectId": "afd5bd09-c294-4597-9c90-e1ee293e5f3a",
  "objectType": "ServicePrincipal",
  "odata.metadata": "https://graph.windows.net/15f996bf-aad1-451c-8d17-9b95d025eafc/$metadata#directoryObjects/@Element",
  "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
  "passwordCredentials": [],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyEndDateTime": null,
  "preferredTokenSigningKeyThumbprint": null,
  "publisherName": "xxxxxxx",
  "replyUrls": [],
  "samlMetadataUrl": null,
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": [
    "ff9c642c-06b9-47e2-9565-e3f6e782e14f"
  ],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMyOrg",
  "tags": [
    "WindowsAzureActiveDirectoryIntegratedApp"
  ],
  "tokenEncryptionKeyId": null
}

Regarding permissions, I am not sure whether it has enough. I would say yes, since it is used for multiple things in the subscription.


What about User Consent permissions? I don't have anything there.


On the other hand, why is the process trying to assign the role using this service principal? I mean, managed identities are intended to move away from using service principals, but perhaps the workflow uses this SP just to assign the role to the managed identity, and from then on access is granted through the managed identity (?)

1
A managed identity is a Service Principal at the end of the day. In this case the Service Principal (referred to as a Managed Identity) is managed by Microsoft Azure AD for you. The intent is that Azure manages the secret and the identity for developers so they don't have to worry about tokens, secrets, et al. docs.microsoft.com/en-us/azure/active-directory/…Steven K7FAQ

1 Answer


From docs: https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-rest#add-a-role-assignment

To call this API, you must have access to the Microsoft.Authorization/roleAssignments/write operation. Of the built-in roles, only Owner and User Access Administrator are granted access to this operation.

So your service principal must have the Owner or User Access Administrator role. Otherwise, you have to create a custom role with sufficient permissions.
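
As a rough sketch of what granting that access could look like in Terraform: the snippet below assigns User Access Administrator to the service principal that runs Terraform. It has to be applied by an identity that already holds Owner (for example your own account), not by the service principal itself, since that principal cannot yet write role assignments. The variable names `terraform_sp_object_id` and `assignment_scope_id` are hypothetical and only there for illustration.

```hcl
# Hypothetical variable: object id of the service principal that runs Terraform
# (the id reported as "The client '...'" in the AuthorizationFailed error).
variable "terraform_sp_object_id" {
  type = string
}

# Hypothetical variable: ARM id of the scope where the service principal should
# be allowed to create role assignments, e.g. the resource group that holds the
# AKS subnet. Keeping the scope narrow limits what the principal can delegate.
variable "assignment_scope_id" {
  type = string
}

# Lets the Terraform service principal perform
# Microsoft.Authorization/roleAssignments/write within the given scope.
resource "azurerm_role_assignment" "terraform_sp_user_access_admin" {
  scope                = var.assignment_scope_id
  role_definition_name = "User Access Administrator"
  principal_id         = var.terraform_sp_object_id
}
```

Once this assignment exists (and has propagated), the original azurerm_role_assignment.aks_subnet resource should be able to create the Network Contributor assignment on the subnet.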

Regarding the workflow, I agree. It is quite counterintuitive.
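
The error names a service principal rather than your own account because Terraform creates the role assignment with whatever credentials the azurerm provider is authenticated with, so having Owner on your user account does not help if the pipeline runs as a service principal. A minimal way to confirm which identity Terraform is using, assuming nothing else in your configuration already declares these names (the data source label `current` and the output name are my own choices):

```hcl
# Exposes details about the identity the azurerm provider is authenticated as.
data "azurerm_client_config" "current" {}

# Object id of the principal that Terraform runs as. Compare it with the id in
# the AuthorizationFailed message to confirm which identity needs the
# Owner or User Access Administrator role.
output "terraform_principal_object_id" {
  value = data.azurerm_client_config.current.object_id
}
```

If the output matches the object id in the error, that principal is the one that needs the additional role, not the AKS managed identity.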

old answer

There is this bug (?) where Azure states that the resource has been created but not all services have access to it yet.

You can have it wait for a minute with something like this:

resource "null_resource" "wait_for_resource_to_be_ready" {
  provisioner "local-exec" {
    command = "sleep 60"
  }

  depends_on = [
    azurerm_kubernetes_cluster.aks_main
  ]
}

Add a depends_on statement to your "azurerm_role_assignment" "aks_subnet" resource:

  depends_on = [
    null_resource.wait_for_resource_to_be_ready
  ]

Now your cluster will be created first, then Terraform will wait for 60 seconds. After that, the role assignment will take place and will hopefully be able to grant the role.
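
If you would rather not shell out to sleep (local-exec needs a shell with a sleep command on the machine running Terraform), a possible alternative is the time_sleep resource from the hashicorp/time provider. This is only a sketch, assuming that provider is available to your configuration; the resource name `wait_for_aks_identity` is made up for this example. Note that a delay only helps with propagation timing and does not fix a missing roleAssignments/write permission.

```hcl
# Waits 60 seconds after the AKS cluster (and its managed identity) is created.
# Requires the hashicorp/time provider.
resource "time_sleep" "wait_for_aks_identity" {
  depends_on      = [azurerm_kubernetes_cluster.aks_main]
  create_duration = "60s"
}

# Same role assignment as in the question, now ordered after the delay.
resource "azurerm_role_assignment" "aks_subnet" {
  scope                = azurerm_subnet.aks.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id

  depends_on = [time_sleep.wait_for_aks_identity]
}
```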