2 votes

I have been working on this for nearly 5 straight days now and can't get it to work. According to the AWS documentation I *should* be able to mount an EFS volume to a pod deployed to a Fargate node in Kubernetes (EKS).

I'm doing everything 100% through Terraform. I'm lost at this point and my eyes are practically bleeding from the amount of terrible documentation I have read. Any guidance anyone can give me on getting this to work would be amazing!

Here is what I have done so far:

  1. Set up the EFS CSI driver, storage class, and role bindings (not really sure why I need these role bindings, to be honest)
resource "kubernetes_csi_driver" "efs" {
  metadata {
    name = "efs.csi.aws.com"
  }

  spec {
    attach_required        = false
    volume_lifecycle_modes = [
      "Persistent"
    ]
  }
}

resource "kubernetes_storage_class" "efs" {
  metadata {
    name = "efs-sc"
  }
  storage_provisioner = kubernetes_csi_driver.efs.metadata[0].name
  reclaim_policy      = "Retain"
}

resource "kubernetes_cluster_role_binding" "efs_pre" {
  metadata {
    name = "efs_role_pre"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "pre"
  }
}

resource "kubernetes_cluster_role_binding" "efs_live" {
  metadata {
    name = "efs_role_live"
  }
  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "cluster-admin"
  }
  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "live"
  }
}
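
The later steps read two outputs back out of this state via terraform_remote_state (storage_name and csi_name), so this configuration presumably also exports them along these lines (a sketch under that assumption; the output names are inferred from how they are consumed further down):

# Assumed outputs, consumed later via data.terraform_remote_state.csi
output "csi_name" {
  value = kubernetes_csi_driver.efs.metadata[0].name
}

output "storage_name" {
  value = kubernetes_storage_class.efs.metadata[0].name
}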
  2. Set up the EFS volume with its policy and security groups
module "vpc" {
  source    = "../../read_only_data/vpc"
  stackname = var.vpc_stackname
}
resource "aws_efs_file_system" "efs_data" {
    creation_token = "xva-${var.environment}-pv-efsdata-${var.side}"

    # encrypted   = true
    # kms_key_id  = ""

    performance_mode = "generalPurpose" #maxIO
    throughput_mode  = "bursting"
    
    lifecycle_policy {
        transition_to_ia = "AFTER_30_DAYS"
    }
}

data "aws_efs_file_system" "efs_data" {
  file_system_id = aws_efs_file_system.efs_data.id
}

resource "aws_efs_access_point" "efs_data" {
  file_system_id = aws_efs_file_system.efs_data.id
}

/* Policy that does the following:
- Prevent root access by default
- Enforce read-only access by default
- Enforce in-transit encryption for all clients
*/
resource "aws_efs_file_system_policy" "efs_data" {
  file_system_id = aws_efs_file_system.efs_data.id

  policy = jsonencode({
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "*"
            },
            "Action": "elasticfilesystem:ClientMount",
            "Resource": aws_efs_file_system.efs_data.arn
        },
        {
            "Effect": "Deny",
            "Principal": {
                "AWS": "*"
            },
            "Action": "*",
            "Resource": aws_efs_file_system.efs_data.arn,
            "Condition": {
                "Bool": {
                    "aws:SecureTransport": "false"
                }
            }
        }
    ]
  })
}

# Security Groups for this volume
resource "aws_security_group" "allow_eks_cluster" {
  name        = "xva-${var.environment}-efsdata-${var.side}"
  description = "This will allow the cluster ${data.terraform_remote_state.cluster.outputs.eks_cluster_name} to access this volume and use it."
  vpc_id      = module.vpc.vpc_id

  ingress {
    description = "NFS For EKS Cluster ${data.terraform_remote_state.cluster.outputs.eks_cluster_name}"
    from_port   = 2049
    to_port     = 2049
    protocol    = "tcp"
    security_groups = [
      data.terraform_remote_state.cluster.outputs.eks_cluster_sg_id
    ]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "allow_tls"
  }
}

# Mount to the subnets that will be using this efs volume
# Also attach sg's to restrict access to this volume
resource "aws_efs_mount_target" "efs_data-app01" {
  file_system_id = aws_efs_file_system.efs_data.id
  subnet_id      = module.vpc.private_app_subnet_01

  security_groups = [
    aws_security_group.allow_eks_cluster.id
  ]
}

resource "aws_efs_mount_target" "efs_data-app02" {
  file_system_id = aws_efs_file_system.efs_data.id
  subnet_id      = module.vpc.private_app_subnet_02

  security_groups = [
    aws_security_group.allow_eks_cluster.id
  ]
}
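
For completeness, the data.terraform_remote_state.cluster source referenced above isn't included in the snippets; it presumably follows the same pattern as the other remote-state lookups, roughly like this (a sketch; the state key is a guess, and only the two outputs used above, eks_cluster_name and eks_cluster_sg_id, matter):

# Assumed lookup of the EKS cluster stack's remote state (the key is a guess)
data "terraform_remote_state" "cluster" {
  backend = "s3"
  config = {
    bucket  = "xva-${var.account_type}-terraform-${var.region_code}"
    key     = "${var.environment}/cluster/terraform.tfstate"
    region  = var.region
    profile = var.profile
  }
}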
  3. Create a persistent volume in Kubernetes referencing the EFS volume
data "terraform_remote_state" "csi" {
  backend = "s3"
  config = {
    bucket  = "xva-${var.account_type}-terraform-${var.region_code}"
    key     = "${var.environment}/efs/driver/terraform.tfstate"
    region  = var.region
    profile = var.profile
  }
}
resource "kubernetes_persistent_volume" "efs_data" {
  metadata {
    name = "pv-efsdata"

    labels = {
        app = "example"
    }
  }

  spec {
    access_modes = ["ReadOnlyMany"]

    capacity = {
      storage = "25Gi"
    }

    volume_mode                      = "Filesystem"
    persistent_volume_reclaim_policy = "Retain"
    storage_class_name               = data.terraform_remote_state.csi.outputs.storage_name

    persistent_volume_source {
      csi {
        driver        = data.terraform_remote_state.csi.outputs.csi_name
        volume_handle = aws_efs_file_system.efs_data.id
        read_only    = true
      }
    }
  }
}
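
The volume_name that the claim in the next step pulls out of remote state presumably comes from an output defined alongside this persistent volume, something like this (a sketch under that assumption):

# Assumed output, consumed in the next step via
# data.terraform_remote_state.efs_data_volume.outputs.volume_name
output "volume_name" {
  value = kubernetes_persistent_volume.efs_data.metadata[0].name
}

Side note: the aws_efs_access_point created in step 2 is never referenced here; with the EFS CSI driver an access point is normally wired in through the volume handle in the form <file-system-id>::<access-point-id>.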
  4. Then create a deployment to Fargate with the pod mounting the EFS volume
data "terraform_remote_state" "efs_data_volume" {
  backend = "s3"
  config = {
    bucket  = "xva-${var.account_type}-terraform-${var.region_code}"
    key     = "${var.environment}/efs/volume/terraform.tfstate"
    region  = var.region
    profile = var.profile
  }
}
resource "kubernetes_persistent_volume_claim" "efs_data" {
  metadata {
    name      = "pv-efsdata-claim-${var.side}"
    namespace = var.side
  }

  spec {
    access_modes       = ["ReadOnlyMany"]
    storage_class_name =  data.terraform_remote_state.csi.outputs.storage_name
    resources {
      requests = {
        storage = "25Gi"
      }
    }
    volume_name = data.terraform_remote_state.efs_data_volume.outputs.volume_name
  }
}

resource "kubernetes_deployment" "example" {
  timeouts {
    create = "3m"
    update = "4m"
    delete = "2m"
  }

  metadata {
    name      = "deployment-example"
    namespace = var.side

    labels = {
      app      = "example"
      platform = "fargate"
      subnet   = "app"
    }
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "example"
      }
    }

    template {
      metadata {
        labels = {
          app      = "example"
          platform = "fargate"
          subnet   = "app"
        }
      }

      spec {
        volume {
          name = "efs-data-volume"
          
          persistent_volume_claim {
            claim_name = kubernetes_persistent_volume_claim.efs_data.metadata[0].name
            read_only  = true
          }
        }

        container {
          image = "${var.nexus_docker_endpoint}/example:${var.docker_tag}"
          name  = "example"

          env {
            name  = "environment"
            value = var.environment
          }
          env {
            name = "dockertag"
            value = var.docker_tag
          }

          volume_mount {
            name = "efs-data-volume"
            read_only = true
            mount_path = "/appconf/"
          }

          # liveness_probe {
          #   http_get {
          #     path = "/health"
          #     port = 443
          #   }

          #   initial_delay_seconds = 3
          #   period_seconds        = 3
          # }

          port {
            container_port = 443
          }
        }
      }
    }
  }
}

I can see the persistent volume in Kubernetes, I can see that it is claimed, heck, I can even see that it attempts to mount the volume in the pod logs. However, I inevitably always see the following error when describing the pod:

Volumes:
  efs-data-volume:
    Type:        PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:   pv-efsdata-claim-pre
    ReadOnly:    true
...
...
Events:
  Type     Reason       Age                  From                                                       Message
  ----     ------       ----                 ----                                                       -------
  Warning  FailedMount  11m (x629 over 23h)  kubelet, <redacted-fargate-endpoint>  Unable to attach or mount volumes: unmounted volumes=[efs-data-volume], unattached volumes=[efs-data-volume]: timed out waiting for the condition
  Warning  FailedMount  47s (x714 over 23h)  kubelet, <redacted-fargate-endpoint>  MountVolume.SetUp failed for volume "pv-efsdata" : kubernetes.io/csi: mounter.SetupAt failed: rpc error: code = InvalidArgument desc = Volume capability not supported
First correction is to change the ReadOnlyMany to something else. Apparently read-only isn't supported with Fargate and EFS, now that's just silly. It still has issues mounting the volume though (using the above config). The most interesting warning is this: ```Output: Could not start amazon-efs-mount-watchdog, unrecognized init system "supervisord"``` – wesleywh

1 Answer

0 votes

I have finally done it. I have successfully mounted an EFS volume to a Fargate pod (nearly 6 days later)! I was able to get the direction I needed from this closed GitHub issue: https://github.com/aws/containers-roadmap/issues/826

It ended up coming down to the fact that I am using this module to build my EKS cluster: https://registry.terraform.io/modules/cloudposse/eks-cluster/aws/0.29.0?tab=outputs

If you use the module's "security_group_id" output, it gives you the "Additional security group", which in my experience is good for absolutely nothing in AWS. Not sure why it even exists when you can't do anything with it. The security group I needed was the "Cluster security group". So I added the cluster security group's ID to the port 2049 ingress rule on the security group attached to the EFS mount targets and BAM! The EFS volume mounted into the deployed pod successfully.
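
Roughly, that change boils down to something like this (a sketch, not the exact code; the cluster security group is looked up here with the aws_eks_cluster data source rather than a module output, since the cloudposse output names may vary):

# Look up the EKS-managed "Cluster security group" instead of relying on the
# module's "Additional security group" output.
data "aws_eks_cluster" "this" {
  name = data.terraform_remote_state.cluster.outputs.eks_cluster_name
}

# Extra NFS ingress rule on the security group attached to the EFS mount
# targets, allowing traffic from the cluster security group (the one the
# Fargate pods actually use).
resource "aws_security_group_rule" "nfs_from_cluster_sg" {
  type                     = "ingress"
  description              = "NFS from the EKS cluster security group"
  from_port                = 2049
  to_port                  = 2049
  protocol                 = "tcp"
  security_group_id        = aws_security_group.allow_eks_cluster.id
  source_security_group_id = data.aws_eks_cluster.this.vpc_config[0].cluster_security_group_id
}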

The other important change was switching the persistent volume's access mode to ReadWriteMany, since Fargate apparently doesn't support ReadOnlyMany.
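
For reference, the claim ends up looking like this (identical to the one in the question apart from the access mode; the persistent volume's access_modes gets the same switch, and read_only stays set on the pod's volume_mount):

resource "kubernetes_persistent_volume_claim" "efs_data" {
  metadata {
    name      = "pv-efsdata-claim-${var.side}"
    namespace = var.side
  }

  spec {
    # ReadOnlyMany fails on Fargate with "Volume capability not supported",
    # so the claim (and the PV) use ReadWriteMany instead.
    access_modes       = ["ReadWriteMany"]
    storage_class_name = data.terraform_remote_state.csi.outputs.storage_name
    resources {
      requests = {
        storage = "25Gi"
      }
    }
    volume_name = data.terraform_remote_state.efs_data_volume.outputs.volume_name
  }
}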