
I'm using Terraform v0.14.2, and I'm trying to create an EKS cluster, but the nodes fail to join it. The node group status stays stuck in "Creating" until it fails with this error:

Error: error waiting for EKS Node Group (EKS_SmartSteps:EKS_SmartSteps-worker-node-uk) creation: NodeCreationFailure: Instances failed to join the kubernetes cluster. Resource IDs: [i-00c4bac08b3c42225]

My code to deploy is:

resource "aws_eks_node_group" "managed_workers" {
  for_each        = local.ob

  cluster_name    = aws_eks_cluster.cluster.name
  node_group_name = "${var.cluster_name}-worker-node-${each.value}"
  node_role_arn   = aws_iam_role.managed_workers.arn
  subnet_ids      = aws_subnet.private.*.id
  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }
  launch_template {
    id      = aws_launch_template.worker-node[each.value].id
    version = aws_launch_template.worker-node[each.value].latest_version
  }

  depends_on = [
    kubernetes_config_map.aws_auth_configmap,
    aws_iam_role_policy_attachment.eks-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.eks-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.eks-AmazonEC2ContainerRegistryReadOnly,
  ]
  lifecycle {
    create_before_destroy = true
    ignore_changes = [scaling_config[0].desired_size, scaling_config[0].min_size]
  }
}

resource "aws_launch_template" "worker-node" {
  for_each               = local.ob

  image_id               = data.aws_ssm_parameter.cluster.value
  name                   = "${var.cluster_name}-worker-node-${each.value}"
  instance_type          = "t3.medium"

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_size = 20
      volume_type = "gp2"
    }
  }
  tag_specifications {
    resource_type = "instance"
    tags = {
      "Instance Name" = "${var.cluster_name}-node-${each.value}"
       Name = "${var.cluster_name}-node-${each.value}"
    }
  }
}
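
The data.aws_ssm_parameter.cluster source referenced by image_id above isn't included in the post; per the comments below, it resolves the EKS-optimized AMI ID from the public SSM parameter. A minimal reconstruction (the data source name cluster is taken from the code above):

data "aws_ssm_parameter" "cluster" {
  # Public SSM parameter that always holds the current EKS-optimized
  # Amazon Linux 2 AMI ID for Kubernetes 1.19.
  name = "/aws/service/eks/optimized-ami/1.19/amazon-linux-2/recommended/image_id"
}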

In fact, I can see the instances in EC2 and the nodes attached to the EKS cluster, but the node group shows this status error:

"Instances failed to join the kubernetes cluster"

I can't pinpoint the cause, because the error message doesn't give any more detail.

Any idea?

Thanks!

Your launch template doesn't appear to have a user data script. You have to run the EKS bootstrap script on the node at instance start, typically via user data. – jordanm
Are you using a custom AMI image? – Jonas
@jordanm Is it mandatory? I don't need any bootstrap action; what should I do? – Humberto Lantero
@Jonas I'm using this one, I forgot to attach it to the code: /aws/service/eks/optimized-ami/1.19/amazon-linux-2/recommended/image_id – Humberto Lantero
@HumbertoLantero Yes, it's mandatory that you do something to bootstrap the node. See the docs here: aws.amazon.com/premiumsupport/knowledge-center/… – jordanm

1 Answer


So others can follow: you need to include a user data script for the nodes to join the cluster. Something like:

userdata.tpl

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
set -ex
/etc/eks/bootstrap.sh ${CLUSTER_NAME} --b64-cluster-ca ${B64_CLUSTER_CA} --apiserver-endpoint ${API_SERVER_URL}

--==MYBOUNDARY==--

You render it like so:

locals {
  user_data_values = {
    CLUSTER_NAME = var.cluster_name
    B64_CLUSTER_CA = var.cluster_certificate_authority
    API_SERVER_URL = var.cluster_endpoint
  }
}
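
If the cluster is managed in the same configuration, those values can instead be wired straight from the aws_eks_cluster resource's exported attributes. A sketch, assuming the cluster resource is named cluster as in the question:

locals {
  user_data_values = {
    CLUSTER_NAME   = aws_eks_cluster.cluster.name
    # Base64-encoded cluster CA and API server endpoint, as exported
    # by the aws_eks_cluster resource.
    B64_CLUSTER_CA = aws_eks_cluster.cluster.certificate_authority[0].data
    API_SERVER_URL = aws_eks_cluster.cluster.endpoint
  }
}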

resource "aws_launch_template" "cluster" {
  image_id  = "ami-XXX" # Make sure the AMI is an EKS worker
  user_data = base64encode(templatefile("userdata.tpl", local.user_data_values))
...
}

Aside from that, make sure the node group is part of the worker security group and has the required IAM roles, and you should be fine.
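
For reference, the worker node role referenced in the question's depends_on would look something like this (a sketch: the role name and trust policy are assumptions; the policy ARNs are the standard AWS managed policies):

resource "aws_iam_role" "managed_workers" {
  name = "eks-worker-node-role" # assumed name

  # Let EC2 instances (the worker nodes) assume this role.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "eks-AmazonEKSWorkerNodePolicy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.managed_workers.name
}

resource "aws_iam_role_policy_attachment" "eks-AmazonEKS_CNI_Policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.managed_workers.name
}

resource "aws_iam_role_policy_attachment" "eks-AmazonEC2ContainerRegistryReadOnly" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.managed_workers.name
}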