5 votes

I am setting up a Kubernetes deployment using auto-scaling groups and Terraform. The kube master node sits behind an ELB to provide some reliability in case something goes wrong. The ELB has its health check set to TCP 6443, and TCP listeners for 8080, 6443, and 9898 (a rough CLI sketch of the load balancer setup follows the AMI script below). All of the instances and the load balancer belong to a security group that allows all traffic between members of the group, plus public traffic from the NAT Gateway address. I created my AMI using the following script (from the getting started guide)...

# curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
# cat <<EOF > /etc/apt/sources.list.d/kubernetes.list
deb http://apt.kubernetes.io/ kubernetes-xenial main
EOF
# apt-get update
# # Install docker if you don't have it already.
# apt-get install -y docker.io
# apt-get install -y kubelet kubeadm kubectl kubernetes-cni
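
For reference, the load balancer described above is roughly equivalent to the following AWS CLI sketch (the actual setup is in Terraform); the load balancer name, subnet, and security-group IDs are placeholders rather than values from the real config.

# Classic ELB with TCP listeners on 6443, 8080 and 9898; IDs are placeholders
aws elb create-load-balancer \
  --load-balancer-name kmaster-elb \
  --listeners "Protocol=TCP,LoadBalancerPort=6443,InstancePort=6443" \
              "Protocol=TCP,LoadBalancerPort=8080,InstancePort=8080" \
              "Protocol=TCP,LoadBalancerPort=9898,InstancePort=9898" \
  --subnets subnet-xxxxxxxx \
  --security-groups sg-xxxxxxxx

# TCP health check against the API server port
aws elb configure-health-check \
  --load-balancer-name kmaster-elb \
  --health-check Target=TCP:6443,Interval=30,Timeout=5,UnhealthyThreshold=2,HealthyThreshold=2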

I use the following user data scripts...

kube master

#!/bin/bash
# Clear any kubeadm/kubelet state baked into the AMI
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*

# ${etcd_elb}, ${token}, ${k8s_version} and ${master_elb_dns} are Terraform template variables
kubeadm init \
  --external-etcd-endpoints=http://${etcd_elb}:2379 \
  --token=${token} \
  --use-kubernetes-version=${k8s_version} \
  --api-external-dns-names=kmaster.${master_elb_dns} \
  --cloud-provider=aws

# Wait for the API server to respond, then install the Weave CNI add-on
until kubectl cluster-info
do
  sleep 1
done
kubectl apply -f https://git.io/weave-kube

kube node

#!/bin/bash
# Clear any kubeadm/kubelet state baked into the AMI
rm -rf /etc/kubernetes/*
rm -rf /var/lib/kubelet/*

# Retry the join until the master's discovery endpoint is reachable
until kubeadm join --token=${token} kmaster.${master_elb_dns}
do
  sleep 1
done

Everything seems to work properly. The master comes up and responds to kubectl commands, with pods running for discovery, DNS, Weave, the controller manager, the API server, and the scheduler. kubeadm produces the following output on the node...

Running pre-flight checks
<util/tokens> validating provided token
<node/discovery> created cluster info discovery client, requesting info from "http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0"
<node/discovery> failed to request cluster info, will try again: [Get http://kmaster.jenkins.learnvest.net:9898/cluster-info/v1/?token-id=eb31c0: EOF]
<node/discovery> cluster info object received, verifying signature using given token
<node/discovery> cluster info signature and contents are valid, will use API endpoints [https://10.253.129.106:6443]
<node/bootstrap> trying to connect to endpoint https://10.253.129.106:6443
<node/bootstrap> detected server version v1.4.4
<node/bootstrap> successfully established connection with endpoint https://10.253.129.106:6443
<node/csr> created API client to obtain unique certificate for this node, generating keys and certificate signing request
<node/csr> received signed certificate from the API server:
Issuer: CN=kubernetes | Subject: CN=system:node:ip-10-253-130-44 | CA: false
Not before: 2016-10-27 18:46:00 +0000 UTC Not After: 2017-10-27 18:46:00 +0000 UTC
<node/csr> generating kubelet configuration
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"

Node join complete:
* Certificate signing request sent to master and response
  received.
* Kubelet informed of new secure connection details.

Run 'kubectl get nodes' on the master to see this machine join.

Unfortunately, running kubectl get nodes on the master returns only the master itself as a node. The only interesting thing I see in /var/log/syslog is

Oct 27 21:19:28 ip-10-252-39-25 kubelet[19972]: E1027 21:19:28.198736   19972 eviction_manager.go:162] eviction manager: unexpected err: failed GetNode: node 'ip-10-253-130-44' not found
Oct 27 21:19:31 ip-10-252-39-25 kubelet[19972]: E1027 21:19:31.778521   19972 kubelet_node_status.go:301] Error updating node status, will retry: error getting node "ip-10-253-130-44": nodes "ip-10-253-130-44" not found

I am really not sure where to look...

Did you ever figure this out? I have the same issue. – dhempler
I should have come back to this when it happened! My memory is a bit foggy, but if I remember correctly this was caused by the AWS cloud provider setting the node name to the AWS instance ID instead of a resolvable DNS name. – Paul Becotte
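
A rough way to check what the cloud provider will see on an affected node (the metadata paths are the standard EC2 ones; which value ends up as the node name depends on the Kubernetes version):

# Compare the OS hostname with what EC2 instance metadata reports; a mismatch
# here is what leads to the node registering under an unexpected name
hostname
curl -s http://169.254.169.254/latest/meta-data/local-hostname
curl -s http://169.254.169.254/latest/meta-data/instance-id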

2 Answers

6 votes

The hostnames of the two machines (the master and the node) should be different. You can check them by running cat /etc/hostname. If they do happen to be the same, edit that file to make them different and then run sudo reboot to apply the change. Otherwise kubeadm will not be able to differentiate between the two machines, and they will show up as a single node in kubectl get nodes.
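
A minimal sketch of that check and fix; the name kworker-1 is just a placeholder:

# Check the current hostname on each machine; the master and the node must differ
cat /etc/hostname

# If they match, write a unique name into /etc/hostname (placeholder shown)
# and reboot so the kubelet registers under the new name
echo "kworker-1" | sudo tee /etc/hostname
sudo reboot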

0 votes

Yes, I faced the same problem.

I resolved it by:

killing the running kubelet (killall kubelet),

running the kubeadm join command again,

and then starting the kubelet service.
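
A rough sketch of those steps on a node with a systemd-managed kubelet, reusing the same token and master address as the original join:

# Stop the running kubelet, re-run the join, then start the kubelet service again
sudo killall kubelet
sudo kubeadm join --token=${token} kmaster.${master_elb_dns}
sudo systemctl start kubelet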