I am trying to use Velero as a backup and disaster recovery tool in Google Cloud Platform across multiple GCP regions (for example, europe-north1 and europe-west4) for GKE private clusters. I can back up and restore with Velero within a single region (backing up a GKE cluster in europe-north1 and restoring to another GKE cluster in europe-north1) without any issues. This works because the snapshots are stored in the same region (europe-north1) for both clusters.

However, I would like to use Velero for disaster recovery, so that I can back up GKE clusters in europe-north1 and restore them to europe-west4. On further research, I found that enabling CSI plugin support for Velero should make this possible. I followed the guidelines for using the CSI plugin with Velero, but I am still unable to restore the persistent disk PVCs to another region. The snapshots are taken as multi-regional (for example, eu), but after I run the velero restore command, the pods (I am using wordpress and mysql as examples) stay in the 'Pending' state.

kubectl describe pod (mysql and wordpress) gives the following error:

   Normal   NotTriggerScaleUp  72s (x31 over 6m12s)  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added):
  Warning  FailedScheduling   60s (x11 over 6m14s)  default-scheduler   0/4 nodes are available: 4 node(s) had volume node affinity conflict.

This error occurs because the Google persistent disks created for the PVCs are in a different region from the GKE cluster. Checking the disks, I can see that the restore created two disks, but they were created in europe-north1 (the primary GKE cluster's region) instead of europe-west4, where the secondary GKE cluster resides.
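To make the failure mode concrete, here is a minimal shell sketch of the check behind "volume node affinity conflict". It is a simplified model, not the scheduler's actual implementation: a pod can only be placed on a node whose zone matches the zone of the disk backing its PV. The function names and the zone values are hypothetical, chosen to mirror the two regions in this question.

```shell
#!/bin/sh
# Simplified model of the volume node affinity check (illustration only).

# zone_to_region ZONE
# Strips the trailing zone letter, e.g. europe-north1-a -> europe-north1.
zone_to_region() {
  printf '%s\n' "${1%-*}"
}

# has_affinity_conflict PV_ZONE NODE_ZONE...
# Returns 0 (conflict) when no node sits in the PV's zone, 1 otherwise.
has_affinity_conflict() {
  pv_zone=$1
  shift
  for node_zone in "$@"; do
    if [ "$node_zone" = "$pv_zone" ]; then
      return 1
    fi
  done
  return 0
}

# Hypothetical values: a disk restored into europe-north1-a,
# while every node of the DR cluster lives in europe-west4.
if has_affinity_conflict europe-north1-a europe-west4-a europe-west4-b europe-west4-c; then
  echo "conflict: disk region $(zone_to_region europe-north1-a) has no matching node"
fi
```

In a real cluster, the zone values would have to come from the actual resources, for example from `gcloud compute disks list` for the disks and `kubectl get nodes --show-labels` for the nodes.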

As the CSI plugin is a new feature in Velero, I couldn't find any documentation for using it in GCP (there is a document showing a CSI implementation with Azure Disks).

Minimum requirements for the CSI plugin to work with Velero backups:

kubernetes version : 1.17
velero version: 1.4.2

Velero Version

velero version
Client:
        Version: v1.4.2
        Git commit: 56a08a4d695d893f0863f697c2f926e27d70c0c5
Server:
        Version: v1.4.2

GKE Cluster kubernetes version (GKE cluster created with GcePersistentDiskCsiDriver=ENABLED addon) :

v1.17.9-gke.600 

Primary Region:

europe-north1

Secondary (DR) Region:

europe-west4

Command used to install velero server (with CSI plugin enabled):

velero install \
--features=EnableCSI \
--provider=gcp \
--image=gcr.io/$project/velero:v1.4.2 \
--plugins=gcr.io/$project/velero-plugin-for-gcp:v1.1.0,gcr.io/$project/velero-plugin-for-csi:v0.1.0 \
--bucket=$storagebucket \
--secret-file=$HOME/./velero-backup-storage-sa-key.json

Other documents that I have referred to:

https://velero.io/docs/v1.4/csi/#installing-velero-with-csi-support

https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/gce-pd-csi-driver

Any help would be much appreciated.

I am pretty sure you will have to copy the volume snapshots to the target region and update VolumeSnapshotContent accordingly. I haven't tried it out though. Some ref here: github.com/kubernetes-sigs/… - Faheem
@Faheem as I am using velero with csi plugin, my understanding is that velero should be able to automate this process (and it is creating multi-regional snapshots which means that it can be used as PV in the secondary region) - srsn
Can you share some docs that refer to Velero creating snapshots in different regions? It's my understanding that it will create the snapshot in the same region as the original volume. - Faheem
This is mentioned in the github issue of velero that it is possible with new version of velero with CSI plugin in GCP. But I'm not able to find proper documentation for this. github.com/vmware-tanzu/velero/issues/… - srsn
That would be a capability of GCP and not Velero, however, that shouldn't matter. Thanks for letting me know. - Faheem

1 Answer

In GCP, VolumeSnapshots are multi-regional by default, but confined to a single geography (us, europe, or asia). I have successfully tested a cross-region restore of a StatefulSet from us-central1 to us-east4 in GCP. One caveat: I used regional disks and created region- and zone-specific StorageClasses with allowedTopologies configured. Here is my restore SC:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: regional-pd-ssd-csi-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-east4-b
    - us-east4-c

And here is the SC for the backup region:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: regional-pd-ssd-csi-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  #replication-type is one of none or regional-pd, defaults to none (zonal PD)
  replication-type: regional-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
    - us-central1-a
    - us-central1-b

It may be possible to use allowedTopologies without regional disks, but that parameter is not supported by the EBS CSI driver (as far as I know).
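Since the two StorageClasses above differ only in the zones listed under allowedTopologies, one way to keep them in sync is to render them from a single template. The following is a minimal sketch under that assumption; `render_storageclass` is a hypothetical helper, not part of Velero or gcloud, and the manifest body is copied from the restore SC above.

```shell
#!/bin/sh
# render_storageclass ZONE...
# Prints the regional-PD StorageClass shown above, restricted to the given zones.
render_storageclass() {
  cat <<'EOF'
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
  name: regional-pd-ssd-csi-storageclass
provisioner: pd.csi.storage.gke.io
parameters:
  type: pd-ssd
  replication-type: regional-pd
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowedTopologies:
- matchLabelExpressions:
  - key: topology.gke.io/zone
    values:
EOF
  # Append one list entry per zone passed on the command line.
  for zone in "$@"; do
    printf '    - %s\n' "$zone"
  done
}

# Restore-side StorageClass for the DR cluster's zones.
render_storageclass us-east4-b us-east4-c
```

The output could then be piped to `kubectl apply -f -` against the target cluster, with `us-central1-a us-central1-b` passed instead for the backup region.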