2 votes

I'm using Terraform 0.14.2 and I'm trying to deploy an EKS cluster with two nodes with this code:

resource "aws_eks_cluster" "cluster" {
  enabled_cluster_log_types = []
  name                      = var.cluster_name
  role_arn                  = aws_iam_role.cluster.arn
  version                   = var.eks_version
  vpc_config {
    subnet_ids              = flatten([ aws_subnet.private.*.id, aws_subnet.public.*.id ])
    security_group_ids      = []
    endpoint_private_access = true
    endpoint_public_access  = true
  }
  tags = var.tags[terraform.workspace]

  depends_on = [
    aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
    aws_iam_role_policy_attachment.cluster_AmazonEKSServicePolicy,
    aws_cloudwatch_log_group.cluster
  ]
}

resource "aws_launch_configuration" "eks-managenodes" {
  for_each                    = local.ob
  
  name_prefix                 = "${var.cluster_name}-launch-${each.value}"
  image_id                    = "ami-038341f2c72928ada"
  instance_type               = "t3.medium"
  user_data = <<-EOF
      #!/bin/bash
      set -o xtrace
      /etc/eks/bootstrap.sh ${var.cluster_name}
      EOF

  root_block_device {
    delete_on_termination = true
    volume_size = 30
    volume_type = "gp2"
  }

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "eks-asg" {
  for_each        = local.ob

  desired_capacity     = 1
  launch_configuration = aws_launch_configuration.eks-managenodes[each.value].id
  max_size             = 1
  min_size             = 1
  name                 = "${var.cluster_name}-node-${each.value}"
  vpc_zone_identifier  = aws_subnet.private.*.id

  tag {
    key                 = "Name"
    value               = "eks-manage-node-${each.value}"
    propagate_at_launch = true
  }

  tag {
    key                 = "kubernetes.io/cluster/${var.cluster_name}"
    value               = "owned"
    propagate_at_launch = true
  }
  depends_on = [
    aws_launch_configuration.eks-managenodes,
    aws_eks_cluster.cluster
  ]
}

The cluster deploys fine, and the ASG and the EC2 instances deploy fine, but the problem is that these instances don't attach to the corresponding cluster and I can't find the cause.

Any ideas? Thanks


2 Answers

2 votes

Nodes can fail to join a cluster for a variety of reasons.

  1. A failure during cloud-init may be preventing them from registering with the cluster control plane.
  2. There may be IAM authentication failures.

Debugging steps (a shell sketch of these checks follows the list):

  1. SSH into a node and check /var/log/cloud-init.log and /var/log/cloud-init-output.log to ensure that cloud-init completed without error.

  2. Verify that the kubelet and aws-node processes are running on the EC2 nodes; both should show up in ps.

  3. Check that /etc/eks/bootstrap.sh exists. Try invoking it as root as /etc/eks/bootstrap.sh --apiserver-endpoint '${endpoint}' --b64-cluster-ca '${cluster_ca_data}' '${cluster_name}', using values taken from the EKS overview page in the AWS console.

  4. Check the aws-auth ConfigMap in kube-system and verify the EC2 node role is mapped like this:

    mapRoles: |
      - rolearn: arn:aws:iam::<account id>:role/<node role>
        username: system:node:{{EC2PrivateDNSName}}
        groups:
          - system:bootstrappers
          - system:nodes

Without this mapping, the node will not be able to authenticate to the cluster.
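A rough sketch of those checks as shell commands; the ec2-user login and log paths assume the Amazon Linux 2 EKS AMI, and the angle-bracket placeholders are values you substitute from your own cluster:

# 1. SSH to a worker node and inspect the cloud-init logs
ssh ec2-user@<node-ip>
sudo less /var/log/cloud-init.log /var/log/cloud-init-output.log

# 2. Confirm kubelet and aws-node are running
ps aux | grep -E 'kubelet|aws-node'
sudo systemctl status kubelet

# 3. Re-run the bootstrap script by hand
sudo /etc/eks/bootstrap.sh --apiserver-endpoint '<endpoint>' --b64-cluster-ca '<cluster_ca_data>' '<cluster_name>'

# 4. From a machine with admin access to the cluster, inspect the aws-auth ConfigMap
kubectl -n kube-system get configmap aws-auth -o yaml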

When in doubt, try the newest version of the EKS AMI for your cluster's Kubernetes version; some AMIs are broken.
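If you'd rather not hard-code an image_id, AWS publishes the recommended EKS-optimized AMI in SSM Parameter Store. A sketch, where the 1.18 version string is only an example and should be your cluster's Kubernetes version:

# Latest EKS-optimized Amazon Linux 2 AMI for a given Kubernetes version
aws ssm get-parameter \
  --name /aws/service/eks/optimized-ami/1.18/amazon-linux-2/recommended/image_id \
  --query 'Parameter.Value' --output text

# Endpoint and cluster CA data used by bootstrap.sh
aws eks describe-cluster --name <cluster_name> \
  --query '{endpoint: cluster.endpoint, ca: cluster.certificateAuthority.data}'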

0 votes

For the EKS cluster to own those nodes, I believe you will have to use an AWS EKS node group rather than an EC2 launch configuration.

In the example from the documentation below, cluster_name references the EKS cluster you created.

resource "aws_eks_node_group" "example" {
  cluster_name    = aws_eks_cluster.example.name
  node_group_name = "example"
  node_role_arn   = aws_iam_role.example.arn
  subnet_ids      = aws_subnet.example[*].id

  scaling_config {
    desired_size = 1
    max_size     = 1
    min_size     = 1
  }
}
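The node_role_arn typically needs the standard worker-node policies attached (AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, AmazonEC2ContainerRegistryReadOnly). Once the node group is active, a quick way to confirm the nodes joined, assuming the AWS CLI and kubectl are installed:

# Point kubectl at the cluster, then list the registered nodes
aws eks update-kubeconfig --name <cluster_name>
kubectl get nodes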