In Kubernetes, we have ClusterIP/NodePort/LoadBalancer service types to expose pods.
When multiple endpoints are bound to one service (for example, the pods behind a Deployment), what policy does Kubernetes use to route traffic to one of the endpoints? Does it always apply a load-balancing
policy, or does it pick an endpoint at random?
3 Answers
Kubernetes uses iptables to distribute traffic across a set of pods, as officially explained by kubernetes.io. Basically what happens is that when you create a kind: Service object, K8s creates a virtual ClusterIP and instructs the kube-proxy DaemonSet to update iptables on each node so that requests matching that virtual IP get load balanced across a set of pod IPs. The word "virtual" here means that ClusterIPs, unlike pod IPs, are not real IP addresses allocated to a network interface; they are merely used as a "filter" to match traffic and forward it to the right destination.
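For illustration, a Service along these lines is enough to make kube-proxy start programming those rules on every node (a minimal sketch; the name foo, the label app: foo and the ports are assumptions chosen to match the example below, not something from the question):

apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: ClusterIP        # the default; reachable only inside the cluster
  selector:
    app: foo             # pods carrying this label become the Service endpoints
  ports:
  - name: https
    port: 443            # port exposed on the ClusterIP
    targetPort: 12345    # port the pods actually listen on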
The Kubernetes documentation says the default load-balancing method is round robin, but this is not entirely accurate. If you look at iptables on any of the worker nodes, you can see that for a given service foo
with a ClusterIP of 172.20.86.5 and 3 pods, the [overly simplified] iptables rules look like this:
$ kubectl get service foo
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
foo ClusterIP 172.20.86.5 <none> 443:30937/TCP 12m
Chain KUBE-SERVICES (2 references)
target prot opt source destination
KUBE-SVC-4NIQ26WEGJLLPEYD tcp -- anywhere 172.20.86.5 /* default/foo:https cluster IP */ tcp dpt:https
This KUBE-SERVICES chain rule matches all traffic whose destination is 172.20.86.5 and applies the rules defined in another chain called KUBE-SVC-4NIQ26WEGJLLPEYD:
Chain KUBE-SVC-4NIQ26WEGJLLPEYD (2 references)
target prot opt source destination
KUBE-SEP-4GQBH7D5EV5ANHLR all -- anywhere anywhere /* default/foo:https */ statistic mode random probability 0.33332999982
KUBE-SEP-XMNJYETXA5COSMOZ all -- anywhere anywhere /* default/foo:https */ statistic mode random probability 0.50000000000
KUBE-SEP-YGQ22DTWGVO4D4MM all -- anywhere anywhere /* default/foo:https */
This chain uses statistic mode random probability to send traffic to one of the three chains defined (since I have three pods, I have three chains here, each with a 33.3% chance of being chosen to receive traffic). Each of these chains is the final rule that sends the traffic to the backend pod IP. For example, looking at the first one:
Chain KUBE-SEP-4GQBH7D5EV5ANHLR (1 references)
target prot opt source destination
DNAT tcp -- anywhere anywhere /* default/foo:https */ tcp to:10.100.1.164:12345
The DNAT rule forwards packets to IP address 10.100.1.164 (a real pod IP) and port 12345 (which is what foo listens on). The other two chains (KUBE-SEP-XMNJYETXA5COSMOZ and KUBE-SEP-YGQ22DTWGVO4D4MM) are similar, except each one points to a different pod IP address.
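In case the probabilities above look uneven (0.333, then 0.5, then none on the last rule), note that the rules are evaluated top to bottom, so each later rule only sees the traffic the earlier rules did not claim. Working it out:

pod 1: 0.333
pod 2: (1 - 0.333) × 0.5        ≈ 0.333
pod 3: (1 - 0.333) × (1 - 0.5)  ≈ 0.333

so each of the three pods still ends up with roughly a third of new connections.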
Similarly, if your service type is NodePort, Kubernetes assigns a random port (from the 30000-32767 range by default) on every node. What's interesting here is that there is no process on the worker node actively listening on this port; instead, this is yet another iptables rule that matches traffic and sends it to the right set of pods:
Chain KUBE-NODEPORTS (1 references)
target prot opt source destination
KUBE-SVC-4NIQ26WEGJLLPEYD tcp -- anywhere anywhere /* default/foo:https */ tcp dpt:30937
This rule matches inbound traffic going to port 30937 (tcp dpt:30937) and forwards it to chain KUBE-SVC-4NIQ26WEGJLLPEYD. But guess what: KUBE-SVC-4NIQ26WEGJLLPEYD is the exact same chain that ClusterIP 172.20.86.5 matches on and sends traffic to, as shown above.
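If you want the node port to be predictable rather than randomly assigned, you can request it in the Service spec. A rough sketch, reusing the same hypothetical foo service and the port numbers from the example above:

apiVersion: v1
kind: Service
metadata:
  name: foo
spec:
  type: NodePort
  selector:
    app: foo
  ports:
  - name: https
    port: 443
    targetPort: 12345
    nodePort: 30937      # optional; omit it and Kubernetes picks a free port in 30000-32767

Either way, kube-proxy ends up adding the KUBE-NODEPORTS rule shown above on every node.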
This might help: https://kubernetes.io/docs/concepts/services-networking/#proxy-mode-ipvs
In short, if you want to choose between different load-balancing approaches, you have to put kube-proxy into IPVS mode and pick one of the schedulers below (see the configuration sketch after this list):
rr: round-robin
lc: least connection
dh: destination hashing
sh: source hashing
sed: shortest expected delay
nq: never queue
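For example, a kube-proxy configuration along these lines should switch to IPVS with the least-connection scheduler (a minimal sketch; depending on how your cluster was set up, this KubeProxyConfiguration may live in a ConfigMap or be passed to kube-proxy via --config):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"           # default is "iptables"
ipvs:
  scheduler: "lc"      # any of the schedulers listed above; IPVS defaults to "rr"

Note that the nodes also need the IPVS kernel modules (such as ip_vs and ip_vs_rr) available for this mode to work.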