6 votes

I am trying to create a deployment in GKE that uses multiple replicas. I have some static data which I want to have available in every pod. This data will not be updated; no writes are required.

I decided to use a PV with a corresponding PVC with the ReadOnlyMany access mode. The thing is, I do not know how to actually transfer my data to the volume, since it is read-only. I tried using

gcloud compute scp /local/path instance:/remote/path

but, of course, I got a permission error. I then tried creating a new persistent disk via the console. I attached it to a VM with

gcloud compute instances attach-disk

formatted and mounted the disk, transferred my data, unmounted the disk, and detached it from the VM (roughly the commands shown below). I then created a PVC following the documentation; the only difference was that I changed the access mode to ReadOnlyMany.
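
For reference, this is more or less what I ran (the disk, instance, zone, and device names are just placeholders):

gcloud compute instances attach-disk my-instance --disk my-data-disk --zone europe-west1-b
# on the VM, assuming the new disk shows up as /dev/sdb
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /mnt/disks/data
sudo mount /dev/sdb /mnt/disks/data
sudo cp -r ~/data/* /mnt/disks/data/
sudo umount /mnt/disks/data
gcloud compute instances detach-disk my-instance --disk my-data-disk --zone europe-west1-b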

But still, when I try to scale my deployment to more than one replica, I get an error saying the disk is already attached to another node.

So, how can I create a volume that is to be used in ReadOnlyMany and populate the disk with data? Or is there a better approach since no write is required?

Thanks in advance

3 Answers

3 votes

This worked for me. Have you specified readOnly: true when using the persistent volume claim in the Pod template?

volumes:
- name: my-volume
  persistentVolumeClaim:
    claimName: my-readonly-pvc
    readOnly: true

See the details here: https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
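
If you want the mount to be read-only inside the container as well, you can additionally set readOnly on the corresponding volumeMount. A minimal sketch, assuming a container named my-app and an arbitrary mount path:

containers:
- name: my-app
  image: nginx
  volumeMounts:
  - name: my-volume
    mountPath: /data
    readOnly: true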

0 votes

Hello Nikolaos,

Whether the approach you are following is the right one depends heavily on your use case.

The approach you are following is very common when you are using a distributed file system such as Ceph or GlusterFS, a managed file store such as GCP Cloud Filestore, or a remote file system such as NFS.

When using a distributed or remote file system, the approach is:

1.- Create a PV with the AccessMode set to RWO and with the Reclaim Policy set to RETAIN.

2.- Create the PVC.

3.- Attach the PVC to a POD.

4.- Transfer the data to the volume via the POD.

5.- Delete the pod, the pvc and the pv.

6.- Create a new PV with the AccessMode set to ROX and with the Reclaim Policy set to RETAIN for EACH Deployment or POD to which you want to attach the data. This does not apply to replicas of a single Deployment: if the replica count is greater than 1, all replicas will attach the same PV.

7.- Create a PVC for each PV. The relationship between PV and PVC is 1:1 (see the sketch after this list).

8.- Attach the PVC to each POD or Deployment in which you want to use the data.
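
A minimal sketch of steps 6 and 7, assuming the data already lives on a pre-existing GCE Persistent Disk called my-data-disk (all names here are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: static-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadOnlyMany
  persistentVolumeReclaimPolicy: Retain
  gcePersistentDisk:
    pdName: my-data-disk
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: static-data-pvc
spec:
  accessModes:
  - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ""          # skip dynamic provisioning
  volumeName: static-data-pv    # bind to the PV above (1:1)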

Your issue seems to be that you are trying to attach the same PV to multiple PVCs, and that is not allowed; the relationship between PVC and PV is one-to-one.

Regarding your other question, whether there is a better approach: that depends heavily on your use case. Google Cloud Platform offers a lot of storage options [1]. For example, if your data consists of objects, you can use Google Cloud Storage [2] instead of Persistent Disks.
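
As a rough sketch of that alternative (not part of your current setup): an initContainer can download the objects from a bucket into an emptyDir that the main container then reads. The bucket name, image, and paths below are assumptions, and the Pod needs read access to the bucket (e.g. via the node's service account or Workload Identity):

apiVersion: v1
kind: Pod
metadata:
  name: static-data-from-gcs
spec:
  initContainers:
  - name: fetch-data
    image: google/cloud-sdk:slim   # provides gsutil
    command: ["gsutil", "-m", "cp", "-r", "gs://my-static-data-bucket/*", "/data"]
    volumeMounts:
    - name: data
      mountPath: /data
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
      readOnly: true
  volumes:
  - name: data
    emptyDir: {}

Each replica downloads its own copy into the emptyDir, which avoids the disk-attachment problem entirely for small, static data sets.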

[1] https://cloud.google.com/kubernetes-engine/docs/concepts/storage-overview

[2] https://cloud.google.com/filestore/docs/accessing-fileshares

0 votes

We can simplify the whole process a bit. On GKE you don't actually need to manually create a PV based on a GCE Persistent Disk. All you need to do is define a proper PVC, which can look as follows:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: webserver-content-claim
spec:
  accessModes: [ReadOnlyMany]
  resources:
    requests:
      storage: 5Gi

Keep in mind that you cannot define access modes in a PVC in the sense of putting any specific constraints there. What you are basically doing is simply requesting storage that supports this particular access mode. Note that it comes in the form of a list, which means you may provide many different access modes that you want your PV to support. I explained it in more detail in this answer. But the key point here is that by setting the ReadOnlyMany access mode in the PVC definition you only request a volume which supports this type of access; it doesn't mean the volume doesn't support other modes.
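
Just for illustration, a claim that requests a volume supporting more than one access mode could look like this (a sketch only; it is not needed for the setup described below):

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: example-multi-mode-claim   # hypothetical name
spec:
  accessModes: [ReadWriteOnce, ReadOnlyMany]
  resources:
    requests:
      storage: 5Gi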

If you don't specify readOnly: true in the volumes section of your Pod template, as @Ievgen Goichuk suggested in his answer, it is mounted in rw mode by default. Since a GCE Persistent Disk doesn't support the ReadWriteMany access mode, such a volume cannot be mounted by other Pods scheduled on different nodes once it is already mounted in rw mode by one Pod on one particular node. Mounting it in rw mode by this Pod is possible because a GCE Persistent Disk also supports the ReadWriteOnce access mode, which according to the official docs means "the volume can be mounted as read-write by a single node". That's why Pods scheduled on other nodes are unable to mount it.

But let's move on to the actual solution.

Once you create the above-mentioned PVC, you'll see that the corresponding PV has also been created (kubectl get pv) and that its STATUS is Bound.

Now we only need to pre-populate it somehow before we start using it in ReadOnlyMany access mode. I will share what works best for me.

If you've already uploaded your data to one of the Compute Engine instances that form the node pool of your worker nodes, you can skip the next step.

I assume you have gcloud installed on your local machine.

gcloud compute scp /local/path instance:/remote/path

is the correct way to achieve that. @Nikolaos Paschos, if you get the permission denied error, it probably means the /remote/path you defined is a restricted directory that you don't have access to as a non-root user. You'll see this error if you try to copy something from your local filesystem to, for example, the /etc directory on the remote machine. The safest way is to copy your files to your home directory, to which you do have access:

gcloud compute scp --recurse /home/<username>/data/* <instance-name>:~ --zone <zone-name>

Use the --recurse option if you want to copy all files and directories from the source directory, including their contents.

Once our data is uploaded to one of our worker nodes, we need to copy it to our newly created PersistentVolume. It can be done in a few different ways.

I decided to use a temporary Pod with a local volume for it.

To make our data, already present on one of the GKE worker nodes, also available to our temporary Pod, let's create the following:

storage-class-local.yaml:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer

pv-local.yaml:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /home/<username>
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - <gke-node-name>

and pvc-local.yaml:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 10Gi
  storageClassName: local-storage

In the next step, let's create our temporary Pod, which will enable us to copy our data from the node (mounted into the Pod as a local volume) to the PV based on the GCE Persistent Disk. Its definition may look as follows:

apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
      - mountPath: "/mnt/source"
        name: local-volume
      - mountPath: "/mnt/destination"
        name: gce-pd-volume
  volumes:
    - name: local-volume
      persistentVolumeClaim:
        claimName: myclaim
    - name: gce-pd-volume
      persistentVolumeClaim:
        claimName: webserver-content-claim

When the Pod is up and running, we can attach to it with:

kubectl exec -ti mypod -- /bin/bash

And copy our files:

cp -a /mnt/source/* /mnt/destination/

Now we can delete our temporary pod, local pv and pvc (the cleanup commands are sketched below). Our PersistentVolume is already pre-populated with data and can be mounted in ro mode.
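
A short sketch of the cleanup, assuming the names used above:

kubectl delete pod mypod
kubectl delete pvc myclaim
kubectl delete pv local-pv
kubectl delete storageclass local-storage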

In order to test it, we can run the following Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: webserver-content
      volumes:
      - name: webserver-content
        persistentVolumeClaim:
          claimName: webserver-content-claim
          readOnly: true ### don't forget about setting this option
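
To verify that all replicas can mount the volume even when they land on different nodes, something like the following quick check should work (the extra scale step is optional):

kubectl get pods -l app=nginx -o wide
kubectl scale deployment nginx-deployment --replicas=5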