I have a deployment with 2 replicas. I would like to specify that pods should be spread across as many distinct nodes/hostnames as possible. So far, I have the following spec:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topspin-apollo-backend-staging-dep
  labels:
    app: topspin-apollo-backend
    env: staging
spec:
  replicas: 2
  selector:
    matchLabels:
      app: topspin-apollo-backend
      env: staging
  template:
    metadata:
      labels:
        app: topspin-apollo-backend
        env: staging
    spec:
      containers:
        - name: topspin-apollo-backend
          image: rwu1997/topspin-apollo-backend:latest
          imagePullPolicy: Always
          envFrom:
            - secretRef:
                name: topspin-apollo-backend-staging-secrets
          ports:
            - containerPort: 8000
      imagePullSecrets:
        - name: regcred
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: topspin-apollo-backend
                    env: staging
                topologyKey: "kubernetes.io/hostname"
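(For completeness: the same soft spreading preference could also be written as a topology spread constraint on the pod template spec. This is only a sketch using the same labels as above, and, like the preferred anti-affinity, it is only honoured at scheduling time, so it does not re-balance pods that are already running.)

      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: topspin-apollo-backend
              env: staging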
If I kubectl apply this deployment from scratch, Kubernetes correctly schedules a pod on each of the 2 nodes in the cluster (A and B). If I kill node B, the corresponding pod is re-scheduled on the last remaining node, A (as expected).
When I then add another node C to the cluster, the two pods remain scheduled on node A. As far as I know, this is expected.
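(For reference, the node placement can be checked with something like:

kubectl get pods -l app=topspin-apollo-backend,env=staging -o wide

which shows the NODE column for each pod.)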
Is there a way to trigger the scheduler to re-balance the 2 pods across nodes A and C?
I've tried kubectl scale --replicas=4, which schedules two additional pods on node C, and then kubectl scale --replicas=2, but that kills off the 2 most recently scheduled pods (instead of taking the pod anti-affinity into account).
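(If the scale-down victims could be influenced, this approach might work. A possible sketch, assuming a cluster where the PodDeletionCost feature is available, i.e. beta and enabled by default since Kubernetes 1.22, with hypothetical pod-name placeholders:

# Mark the two surplus pods (one on node A, one on node C) as cheaper to delete;
# the ReplicaSet controller removes lower-cost pods first when scaling down.
kubectl annotate pod <extra-pod-on-node-A> controller.kubernetes.io/pod-deletion-cost="-1"
kubectl annotate pod <extra-pod-on-node-C> controller.kubernetes.io/pod-deletion-cost="-1"
kubectl scale deployment topspin-apollo-backend-staging-dep --replicas=2

I haven't verified this on my cluster, though.)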
One method that works is to kubectl delete the deployment and then kubectl apply it again, but this introduces downtime.
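(A middle ground might be a rolling restart, which recreates the pods one at a time instead of deleting the whole Deployment; since the anti-affinity is only preferred, though, it still does not guarantee that a replacement pod lands on node C. Sketch, assuming the default RollingUpdate strategy and kubectl 1.15+:

kubectl rollout restart deployment topspin-apollo-backend-staging-dep

)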
Another method is to kubectl scale --replicas=1 and then kubectl scale --replicas=2, but that is less than ideal since only 1 replica exists for a period of time.
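Spelled out against the Deployment above (the second pod should then prefer node C thanks to the anti-affinity):

kubectl scale deployment topspin-apollo-backend-staging-dep --replicas=1
# the surviving pod stays on node A; the new pod should then prefer node C
kubectl scale deployment topspin-apollo-backend-staging-dep --replicas=2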