To demonstrate the kubelet's eviction behaviour, I am trying to deploy a Kubernetes workload that will consume memory to the point that the kubelet evicts all BestEffort Pods due to memory pressure but does not kill my workload (or at least not before the BestEffort Pods).
My best attempt is below. It writes to two tmpfs volumes (since, by default, a tmpfs volume is limited to half of the Node's total memory). The 100 in the script comes from the fact that --eviction-hard=memory.available<100Mi is set on the kubelet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fallocate
  namespace: developer
spec:
  selector:
    matchLabels:
      app: fallocate
  template:
    metadata:
      labels:
        app: fallocate
    spec:
      containers:
      - name: alpine
        image: alpine
        command:
        - /bin/sh
        - -c
        - |
          count=1
          while true
          do
            AVAILABLE_DISK_KB=$(df /cache-1 | grep /cache-1 | awk '{print $4}')
            AVAILABLE_DISK_MB=$(( $AVAILABLE_DISK_KB / 1000 ))
            AVAILABLE_MEMORY_MB=$(free -m | grep Mem | awk '{print $4}')
            MINIMUM=$(( $AVAILABLE_DISK_MB > $AVAILABLE_MEMORY_MB ? $AVAILABLE_MEMORY_MB : $AVAILABLE_DISK_MB ))
            fallocate -l $(( $MINIMUM - 100 ))MB /cache-1/$count
            AVAILABLE_DISK_KB=$(df /cache-2 | grep /cache-2 | awk '{print $4}')
            AVAILABLE_DISK_MB=$(( $AVAILABLE_DISK_KB / 1000 ))
            AVAILABLE_MEMORY_MB=$(free -m | grep Mem | awk '{print $4}')
            MINIMUM=$(( $AVAILABLE_DISK_MB > $AVAILABLE_MEMORY_MB ? $AVAILABLE_MEMORY_MB : $AVAILABLE_DISK_MB ))
            fallocate -l $(( $MINIMUM - 100 ))MB /cache-2/$count
            count=$(( $count+1 ))
            sleep 1
          done
        resources:
          requests:
            memory: 2Gi
            cpu: 100m
          limits:
            cpu: 100m
        volumeMounts:
        - name: cache-1
          mountPath: /cache-1
        - name: cache-2
          mountPath: /cache-2
      volumes:
      - name: cache-1
        emptyDir:
          medium: Memory
      - name: cache-2
        emptyDir:
          medium: Memory
The intention of this script is to use up memory until the Node's available memory drops to the hard eviction threshold, so that the kubelet starts to evict Pods. It evicts some BestEffort Pods, but in most cases the workload is killed before all BestEffort Pods are evicted. Is there a better way of doing this?
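For reference, the Kubernetes documentation describes the kubelet's memory.available signal as node capacity minus the working set, where the working set is cgroup memory usage minus inactive file pages. That is not the same number as the "free" column that `free -m` reports, so the script's estimate may differ from what the kubelet sees. Below is a minimal sketch of that arithmetic; the function names `memory_available_mib` and `headroom_mib` are hypothetical, and the inputs are assumed to be read elsewhere (on a cgroup v1 node, typically MemTotal in /proc/meminfo, memory.usage_in_bytes, and total_inactive_file in memory.stat).

```shell
#!/bin/sh
# Sketch only: approximates the documented memory.available calculation.
# Function names are hypothetical; all byte values are supplied by the caller.

# Args (bytes): node capacity, cgroup memory usage, inactive file pages.
# Prints the approximate memory.available in MiB.
memory_available_mib() {
  capacity=$1
  usage=$2
  inactive_file=$3
  working_set=$(( usage - inactive_file ))
  # The working set is clamped at zero.
  if [ "$working_set" -lt 0 ]; then
    working_set=0
  fi
  echo $(( (capacity - working_set) / 1024 / 1024 ))
}

# Args (MiB): current memory.available, eviction threshold.
# Prints how many MiB can still be allocated before crossing the threshold.
headroom_mib() {
  available=$1
  threshold=$2
  headroom=$(( available - threshold ))
  if [ "$headroom" -lt 0 ]; then
    headroom=0
  fi
  echo "$headroom"
}
```

With made-up numbers (a 4 GiB node, 3 GiB of cgroup usage, 512 MiB of inactive file pages), `memory_available_mib 4294967296 3221225472 536870912` prints 1536, and `headroom_mib 1536 100` prints 1436, which is the amount that could be written before reaching the 100Mi threshold.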
I am running on GKE with cluster version 1.9.3-gke.0.
EDIT:
I also tried using simmemleak:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: simmemleak
  namespace: developer
spec:
  selector:
    matchLabels:
      app: simmemleak
  template:
    metadata:
      labels:
        app: simmemleak
    spec:
      containers:
      - name: simmemleak
        image: saadali/simmemleak
        resources:
          requests:
            memory: 1Gi
            cpu: 1m
          limits:
            cpu: 1m
But this workload keeps dying before any evictions occur. I think the issue is that the kernel kills it before the kubelet has time to react.
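One thing that may bear on this: the Kubernetes documentation states that the kubelet sets oom_score_adj to 1000 for BestEffort containers and to min(max(2, 1000 - (1000 * memoryRequestBytes) / machineMemoryCapacityBytes), 999) for Burstable ones, which is the class both manifests above fall into since they set a memory request without a memory limit. The sketch below just evaluates that formula; the function name `burstable_oom_score_adj` is hypothetical, and the 4 GiB node capacity in the usage note is an assumption, not a value from my cluster.

```shell
#!/bin/sh
# Sketch only: evaluates the documented oom_score_adj formula for a
# Burstable container. The function name is hypothetical.

# Args (bytes): container memory request, node memory capacity.
burstable_oom_score_adj() {
  request_bytes=$1
  capacity_bytes=$2
  adj=$(( 1000 - (1000 * request_bytes) / capacity_bytes ))
  # Clamp to the documented range [2, 999].
  if [ "$adj" -lt 2 ]; then
    adj=2
  fi
  if [ "$adj" -gt 999 ]; then
    adj=999
  fi
  echo "$adj"
}
```

On an assumed 4 GiB node, `burstable_oom_score_adj 1073741824 4294967296` (the 1Gi request) prints 750 and `burstable_oom_score_adj 2147483648 4294967296` (the 2Gi request) prints 500, both lower than the 1000 given to BestEffort containers. The kernel's OOM killer also weighs how much memory each process actually uses, so a lower adjustment alone may not protect a process that holds most of the Node's memory.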