In Kubernetes, we have ClusterIP/NodePort/LoadBalancer service types to expose pods.
When multiple endpoints are bound to one service (for example, the pods behind a Deployment), what policy does Kubernetes use to route traffic to one of the endpoints? Does it always apply a load-balancing
policy, or does it pick an endpoint at random?
3 Answers
Kubernetes uses iptables to distribute traffic across a set of pods, as officially explained by kubernetes.io. Basically what happens is that when you create a kind: Service object, K8s creates a virtual ClusterIP and instructs the kube-proxy DaemonSet to update iptables on each node so that requests matching that virtual IP get load balanced across a set of pod IPs. The word "virtual" here means that ClusterIPs, unlike pod IPs, are not real IP addresses allocated to a network interface; they are merely used as a "filter" to match traffic and forward it to the right destination.
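For illustration, a Service along these lines is enough to make kube-proxy start programming those rules on every node (a minimal sketch; the name foo, the label app: foo and the ports are assumptions chosen to match the example below, not something from the question):

apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: ClusterIP        # the default; reachable only inside the cluster
  selector:
    app: foo             # pods carrying this label become the Service endpoints
  ports:
  - name: https
    port: 443            # port exposed on the ClusterIP
    targetPort: 12345    # port the pods actually listen on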
The Kubernetes documentation says the default load-balancing method is round robin, but this is not entirely accurate. If you look at iptables on any of the worker nodes, you can see that for a given service foo
with a ClusterIP of 172.20.86.5 and 3 pods, the [overly simplified] iptables rules look like this:
$ kubectl get service foo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
foo ClusterIP 172.20.86.5 <none> 443:30937/TCP 12m
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-4NIQ26WEGJLLPEYD tcp -- anywhere 172.20.86.5 /* default/foo:https cluster IP */ tcp dpt:https
This KUBE-SERVICES chain rule matches all traffic whose destination is 172.20.86.5 and applies the rules defined in another chain called KUBE-SVC-4NIQ26WEGJLLPEYD:
Chain KUBE-SVC-4NIQ26WEGJLLPEYD (2 references)
target prot opt source destination
KUBE-SEP-4GQBH7D5EV5ANHLR all -- anywhere anywhere /* default/foo:https */ statistic mode random probability 0.33332999982
KUBE-SEP-XMNJYETXA5COSMOZ all -- anywhere anywhere /* default/foo:https */ statistic mode random probability 0.50000000000
KUBE-SEP-YGQ22DTWGVO4D4MM all -- anywhere anywhere /* default/foo:https */
This chain uses statistic mode random probability to send traffic to one of the three chains defined (since I have three pods, I have three chains here, each with a 33.3% chance of being chosen to receive traffic). Each of these chains is the final rule that sends the traffic to the backend pod IP. For example, looking at the first one:
Chain KUBE-SEP-4GQBH7D5EV5ANHLR (1 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere /* default/foo:https */ tcp to:10.100.1.164:12345
The DNAT rule forwards packets to IP address 10.100.1.164 (a real pod IP) and port 12345 (which is what foo listens on). The other two chains (KUBE-SEP-XMNJYETXA5COSMOZ and KUBE-SEP-YGQ22DTWGVO4D4MM) are similar, except each one points to a different pod IP address.
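In case the probabilities above look uneven (0.333, then 0.5, then none on the last rule), note that the rules are evaluated top to bottom, so each later rule only sees the traffic the earlier rules did not claim. Working it out:

pod 1: 0.333
pod 2: (1 - 0.333) × 0.5        ≈ 0.333
pod 3: (1 - 0.333) × (1 - 0.5)  ≈ 0.333

so each of the three pods still ends up with roughly a third of new connections.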
Similarly, if your service type is NodePort, Kubernetes assigns a random port (from the 30000-32767 range by default) on every node. What's interesting here is that there is no process on the worker node actively listening on this port; instead, this is yet another iptables rule that matches traffic and sends it to the right set of pods:
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
KUBE-SVC-4NIQ26WEGJLLPEYD tcp -- anywhere anywhere /* default/foo:https */ tcp dpt:30937
This rule matches inbound traffic going to port 30937 (tcp dpt:30937) and forwards it to chain KUBE-SVC-4NIQ26WEGJLLPEYD. But guess what: KUBE-SVC-4NIQ26WEGJLLPEYD is the exact same chain that ClusterIP 172.20.86.5 matches on and sends traffic to, as shown above.
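If you want the node port to be predictable rather than randomly assigned, you can request it in the Service spec. A rough sketch, reusing the same hypothetical foo service and the port numbers from the example above:

apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: NodePort
  selector:
    app: foo
  ports:
  - name: https
    port: 443
    targetPort: 12345
    nodePort: 30937      # optional; omit it and Kubernetes picks a free port in 30000-32767

Either way, kube-proxy ends up adding the KUBE-NODEPORTS rule shown above on every node.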
This might help: https://kubernetes.io/docs/concepts/services-networking/#proxy-mode-ipvs
In short, if you want to choose between different load-balancing approaches, you have to put kube-proxy into IPVS mode and pick one of the schedulers below (see the configuration sketch after this list):
rr: round-robin
lc: least connection
dh: destination hashing
sh: source hashing
sed: shortest expected delay
nq: never queue
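For example, a kube-proxy configuration along these lines should switch to IPVS with the least-connection scheduler (a minimal sketch; depending on how your cluster was set up, this KubeProxyConfiguration may live in a ConfigMap or be passed to kube-proxy via --config):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"           # default is "iptables"
ipvs:
  scheduler: "lc"      # any of the schedulers listed above; IPVS defaults to "rr"

Note that the nodes also need the IPVS kernel modules (such as ip_vs and ip_vs_rr) available for this mode to work.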