I am running vanilla EKS Kubernetes at version 1.12.
I've used CNI Genie to allow custom selection of the CNI that pods use at startup, and I've installed the standard Calico CNI setup.
With CNI Genie I configured the default CNI to be the AWS CNI (aws-node), and all pods start up as usual and get assigned an IP from my VPC subnets.
I then selectively use Calico as the CNI for some basic pods I am testing with. I'm using the default Calico 192.168.0.0/16 CIDR range. Everything works great as long as the pods are on the same EKS worker node.
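For reference, this is roughly how I select Calico for a test pod via the CNI Genie pod annotation (the deployment name and image here are placeholders for my actual hello-node test deployments):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-node2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-node2
  template:
    metadata:
      labels:
        app: hello-node2
      annotations:
        cni: "calico"    # CNI Genie reads this annotation and hands the pod to the Calico CNI
    spec:
      containers:
      - name: hello-node2
        image: k8s.gcr.io/echoserver:1.10    # placeholder test image; my container listens on 8080
        ports:
        - containerPort: 8080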
CoreDNS is working great too (as long as I keep the coredns pods running on the AWS CNI).
However, if a pod moves to a different worker node, then networking between them does not work inside the cluster.
I've checked the routing tables that Calico auto-configures on the worker nodes, and they look logical to me.
Here is my wide pod listing across all namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default hello-node1-865588ccd7-64p5x 1/1 Running 0 31m 192.168.106.129 ip-10-0-2-31.eu-west-2.compute.internal <none>
default hello-node2-dc7bbcb74-gqpwq 1/1 Running 0 17m 192.168.25.193 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system aws-node-cm2dp 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system aws-node-vvvww 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-kube-controllers-56bfccb786-fc2j4 1/1 Running 0 30m 10.0.2.41 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-node-flmnl 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-node-hcmqd 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system coredns-6c64c9f456-g2h9k 1/1 Running 0 30m 10.0.2.204 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system coredns-6c64c9f456-g5lhl 1/1 Running 0 30m 10.0.2.200 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system genie-plugin-hspts 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system genie-plugin-vqd2d 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system kube-proxy-jm7f7 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system kube-proxy-nnp76 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
As you can see, the two hello-node pods are using the Calico CNI.
I've exposed the hello-node pods with two services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-node1 ClusterIP 172.20.90.83 <none> 8081/TCP 43m
hello-node2 ClusterIP 172.20.242.22 <none> 8082/TCP 43m
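For reference, these were created with something like the following kubectl commands (the exact flags I ran may have differed slightly), mapping the service ports 8081/8082 to the container port 8080:
kubectl expose deployment hello-node1 --port=8081 --target-port=8080
kubectl expose deployment hello-node2 --port=8082 --target-port=8080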
I've confirmed that if I start the hello-node pods with the AWS CNI instead, I can ping/curl between them across separate hosts using the cluster service names.
Things stop working when I use the Calico CNI as above.
I only have two EKS worker hosts in this test cluster. Here is the routing for each:
K8s Worker 1 routes
[ec2-user@ip-10-0-3-222 ~]$ ip route
default via 10.0.3.1 dev eth0
10.0.3.0/24 dev eth0 proto kernel scope link src 10.0.3.222
169.254.169.254 dev eth0
blackhole 192.168.25.192/26 proto bird
192.168.25.193 dev calia0da7d91dc2 scope link
192.168.106.128/26 via 10.0.2.31 dev tunl0 proto bird onlink
K8s Worker 2 routes
[ec2-user@ip-10-0-2-31 ~]$ ip route
default via 10.0.2.1 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.31
10.0.2.41 dev enif4cf9019f11 scope link
10.0.2.200 dev eni412af1a0e55 scope link
10.0.2.204 dev eni04260ebbbe1 scope link
169.254.169.254 dev eth0
192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink
blackhole 192.168.106.128/26 proto bird
192.168.106.129 dev cali19da7817849 scope link
To me, the route:
192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink
tells me that traffic from this worker (and its containers/pods) destined for the 192.168.25.192/26 block should go out via 10.0.3.222 (the AWS VPC ENI IP of the other EC2 host) over the tunl0 interface.
This route is on the EC2 host 10.0.2.31. In other words, when this host's containers talk to containers in the Calico block 192.168.25.192/26, traffic should be routed to 10.0.3.222 (the ENI IP of my other EKS worker node, where the Calico pods on that block run).
To clarify my testing procedure:
- Exec into the hello-node1 pod and curl http://hello-node2:8082 (or ping the Calico-assigned IP address of the hello-node2 pod).
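Concretely, the test looks roughly like this (pod name and Calico IP taken from the listing above):
kubectl exec -it hello-node1-865588ccd7-64p5x -- curl -v http://hello-node2:8082
kubectl exec -it hello-node1-865588ccd7-64p5x -- ping -c 3 192.168.25.193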
EDIT
To further test this, I've run tcpdump on the host where the hello-node2 pod is running, capturing on port 8080 (the port the container actually listens on).
I do see activity on the destination host where the test container I am curling is running, and nothing in the capture suggests dropped traffic.
[ec2-user@ip-10-0-3-222 ~]$ sudo tcpdump -vv -x -X -i tunl0 'port 8080'
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
14:32:42.859238 IP (tos 0x0, ttl 254, id 63813, offset 0, flags [DF], proto TCP (6), length 60)
10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf932 (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836614698 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c f945 4000 fe06 9ced 0a00 021f E..<.E@.........
0x0010: c0a8 19c1 7208 1f90 bf1b b32e 0000 0000 ....r...........
0x0020: a002 6d60 f932 0000 0204 0578 0402 080a ..m`.2.....x....
0x0030: a913 4e2a 0000 0000 0103 0307 ..N*........
14:32:43.870168 IP (tos 0x0, ttl 254, id 63814, offset 0, flags [DF], proto TCP (6), length 60)
10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf53f (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836615709 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c f946 4000 fe06 9cec 0a00 021f E..<.F@.........
0x0010: c0a8 19c1 7208 1f90 bf1b b32e 0000 0000 ....r...........
0x0020: a002 6d60 f53f 0000 0204 0578 0402 080a ..m`.?.....x....
0x0030: a913 521d 0000 0000 0103 0307 ..R.........
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
Even the calia0da7d91dc2 interface on the host running my target/test pod shows increasing RX packet and byte counts whenever I run the curl from the other pod on the other host, so traffic is definitely reaching the target pod's interface.
[ec2-user@ip-10-0-3-222 ~]$ ifconfig
calia0da7d91dc2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 84 bytes 5088 (4.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
What is preventing the networking from working between hosts here? Am I missing something obvious?
Edit 2 - information for Arjun Pandey (parjun8840)
Here is some more info about my Calico configuration:
- I have disabled source/destination checking on all AWS EC2 worker nodes
- I've followed the latest Calico docs to configure the IP pool for cross-subnet use and NAT for traffic leaving the cluster (roughly what I ran is sketched after this list)
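Roughly the commands I used for those two steps (the instance ID is a placeholder; I ran the EC2 command for each worker node):
# Disable source/destination checking on each EKS worker instance
aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check

# Configure the default IP pool for cross-subnet IPIP with NAT for traffic leaving the cluster
calicoctl apply -f - <<EOF
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: default-ipv4-ippool
spec:
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
  nodeSelector: all()
EOF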
calicoctl configs. Note: it seems that the workloadendpoints are non-existent...
me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool --all-namespaces
NAME AGE
default-ipv4-ippool 1d
me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
  annotations:
    projectcalico.org/metadata: '{"uid":"41bd2c82-d576-11e9-b1ef-121f3d7b4d4e","creationTimestamp":"2019-09-12T15:59:09Z"}'
  creationTimestamp: "2019-09-12T15:59:09Z"
  generation: 1
  name: default-ipv4-ippool
  resourceVersion: "500448"
  selfLink: /apis/crd.projectcalico.org/v1/ippools/default-ipv4-ippool
  uid: 41bd2c82-d576-11e9-b1ef-121f3d7b4d4e
spec:
  blockSize: 26
  cidr: 192.168.0.0/16
  ipipMode: CrossSubnet
  natOutgoing: true
  nodeSelector: all()
  vxlanMode: Never
me@mine ~ aws-vault exec my-vault-entry -- calicoctl get nodes
NAME
ip-10-254-109-184.ec2.internal
ip-10-254-109-237.ec2.internal
ip-10-254-111-147.ec2.internal
me@mine ~ aws-vault exec my-vault-entry -- calicoctl get workloadendpoints
WORKLOAD NODE NETWORKS INTERFACE
me@mine ~
Here is some network info for a sample host in the cluster and for the network namespace of one of the test containers:
host ip a
[ec2-user@ip-10-254-109-184 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 02:1b:79:d1:c5:bc brd ff:ff:ff:ff:ff:ff
inet 10.254.109.184/26 brd 10.254.109.191 scope global dynamic eth0
valid_lft 2881sec preferred_lft 2881sec
inet6 fe80::1b:79ff:fed1:c5bc/64 scope link
valid_lft forever preferred_lft forever
3: eni808caba7453@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether c2:be:80:d4:6a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::c0be:80ff:fed4:6af3/64 scope link
valid_lft forever preferred_lft forever
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.29.128/32 brd 192.168.29.128 scope global tunl0
valid_lft forever preferred_lft forever
6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 02:12:58:bb:c6:1a brd ff:ff:ff:ff:ff:ff
inet 10.254.109.137/26 brd 10.254.109.191 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::12:58ff:febb:c61a/64 scope link
valid_lft forever preferred_lft forever
7: enia6f1918d9e2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 96:f5:36:53:e9:55 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::94f5:36ff:fe53:e955/64 scope link
valid_lft forever preferred_lft forever
8: enia32d23ac2d1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 36:5e:34:a7:82:30 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::345e:34ff:fea7:8230/64 scope link
valid_lft forever preferred_lft forever
9: cali5e7dde1e39e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
[ec2-user@ip-10-254-109-184 ~]$
nsenter on the test container pid to get ip a info:
[ec2-user@ip-10-254-109-184 ~]$ sudo nsenter -t 15715 -n ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether 9a:6d:db:06:74:cb brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.29.129/32 scope global eth0
valid_lft forever preferred_lft forever