10
votes

I would like the containers in my pod to share a volume for temporary (cached) data. I don't mind if the data is lost when the pod terminates (in fact, I want the data deleted and space reclaimed).

The Kubernetes docs make an emptyDir sound like what I want:

An emptyDir volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node

.. and

By default, emptyDir volumes are stored on whatever medium is backing the node - that might be disk or SSD or network storage, depending on your environment. However, you can set the emptyDir.medium field to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead

That sounds like the default behaviour is to store the volume on disk, unless I explicitly request in-memory.

However, if I create the following pod on my GKE cluster:

apiVersion: v1
kind: Pod
metadata:
  name: alpine
spec:
  containers:
  - name: alpine
    image: alpine:3.7
    command: ["/bin/sh", "-c", "sleep 60m"]
    volumeMounts:
      - name: foo
        mountPath: /foo
  volumes:
  - name: foo
    emptyDir: {}

.. and then open a shell on the pod and write a 2 GB file to the volume:

kubectl exec -it alpine -- /bin/sh
$ cd foo/
$ dd if=/dev/zero of=file.txt count=2048 bs=1048576

Then I can see in the GKE web console that the RAM usage of the container has increased by 2 GB:

memory increase in the alpine container

It looks to me like GKE stores emptyDir volumes in memory by default. The workload I plan to run needs plenty of memory, so I'd like the emptyDir volume to be backed by disk - is that possible? The GKE storage docs don't have much to say on the issue.
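For reference, the docs quoted above imply that in-memory backing must be requested explicitly; the pod spec above omits `medium`, so a tmpfs-backed volume would have to look something like this (a sketch based on the quoted documentation, not on observed GKE behaviour):

```yaml
# Only when medium is set to "Memory" should the emptyDir become a
# tmpfs (RAM-backed) mount. The pod above omits this field, which the
# docs say means disk backing.
volumes:
- name: foo
  emptyDir:
    medium: Memory
```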

An alternative approach might be to use a local SSD for my cached data. However, if I mount it as recommended in the GKE docs, it's shared by all pods running on the same node, and the data isn't cleaned up on pod termination - which doesn't meet my goal of automatically managed resources.
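For completeness, the mount the GKE docs describe is a hostPath volume along these lines (the `/mnt/disks/ssd0` path is the conventional mount point for the first local SSD; verify it on your own nodes):

```yaml
# A hostPath mount shares the local SSD between all pods on the node,
# and nothing cleans the data up when a pod terminates - both of which
# are the drawbacks described above.
volumes:
- name: scratch
  hostPath:
    path: /mnt/disks/ssd0
```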

Mounts

Here's the output of df -h inside the container:

# df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  96.9G     26.2G     70.7G  27% /
overlay                  96.9G     26.2G     70.7G  27% /
tmpfs                     7.3G         0      7.3G   0% /dev
tmpfs                     7.3G         0      7.3G   0% /sys/fs/cgroup
/dev/sda1                96.9G     26.2G     70.7G  27% /foo
/dev/sda1                96.9G     26.2G     70.7G  27% /dev/termination-log
/dev/sda1                96.9G     26.2G     70.7G  27% /etc/resolv.conf
/dev/sda1                96.9G     26.2G     70.7G  27% /etc/hostname
/dev/sda1                96.9G     26.2G     70.7G  27% /etc/hosts
shm                      64.0M         0     64.0M   0% /dev/shm
tmpfs                     7.3G     12.0K      7.3G   0% /run/secrets/kubernetes.io/serviceaccount
tmpfs                     7.3G         0      7.3G   0% /proc/kcore
tmpfs                     7.3G         0      7.3G   0% /proc/timer_list
tmpfs                     7.3G         0      7.3G   0% /proc/sched_debug
tmpfs                     7.3G         0      7.3G   0% /sys/firmware

The View from the Node

I discovered it's possible to ssh into the node instance, and I was able to find the 2 GB file on the node filesystem:

root@gke-cluster-foo-pool-b-22bb9925-xs5p:/# find . -name file.txt
./var/lib/kubelet/pods/79ad1aa4-4441-11e8-af32-42010a980039/volumes/kubernetes.io~empty-dir/foo/file.txt

Now that I can see it is being written to the underlying filesystem, I'm wondering whether the RAM usage I'm seeing in the GKE web UI is the Linux filesystem cache or similar, rather than the file being stored in a RAM disk.
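One way to test that theory from inside the container: watch the `Cached` and `Shmem` fields of /proc/meminfo while writing to the volume. `Cached` counts reclaimable page-cache pages, whereas tmpfs data shows up under `Shmem`. (The mount path and a much smaller write size here are illustrative; adjust `MOUNT` and the `dd` count to match your pod.)

```shell
MOUNT=${MOUNT:-/foo}            # emptyDir mount path from the pod spec above
[ -d "$MOUNT" ] || MOUNT=/tmp   # fall back for machines without that mount

# Snapshot the page cache, write to the volume, and compare. If the
# growth appears under "Cached" rather than "Shmem", the data is in the
# (reclaimable) filesystem cache, not a RAM disk.
grep -E '^(Cached|Shmem):' /proc/meminfo
dd if=/dev/zero of="$MOUNT/file.txt" count=64 bs=1048576 2>/dev/null
grep -E '^(Cached|Shmem):' /proc/meminfo
rm -f "$MOUNT/file.txt"
```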

1
I thought it might be a difference in behaviour between COS and Ubuntu nodes, but I've observed this on both node types. – James Healy
Can you list your mounts in the container (df -h)? What filesystem is shown? – hexacyanide
I've added the output of df -h inside the container. – James Healy
I've appended some extra details to the end of the post; now I'm wondering whether the RAM usage I'm seeing in the GKE web UI is the Linux filesystem cache or similar, rather than the file being stored in a RAM disk. – James Healy

1 Answer

11
votes

From the mount information you've supplied, the emptyDir volume is mounted on a drive partition, so it's working as intended and isn't mounted in memory. The memory usage you see is most likely the filesystem buffer cache: under sufficient memory pressure those pages would be written out and reclaimed, but since you have plenty of free memory, the system saw no need to do so immediately.

If you have more doubts, give sync or echo 3 > /proc/sys/vm/drop_caches a go on the machines to flush filesystem information to disk. You should see a change in memory usage.
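Concretely, something like the following (run as root on the node; `drop_caches` only discards clean pages, which is why `sync` comes first):

```shell
sync                                # flush dirty pages to disk
# Drop clean page cache, dentries and inodes. Writing here needs root,
# so guard it for unprivileged shells.
if [ -w /proc/sys/vm/drop_caches ]; then
  echo 3 > /proc/sys/vm/drop_caches
fi
grep '^Cached:' /proc/meminfo       # the cached figure should have fallen
```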