1 vote

I am running a GKE cluster in which multiple pods need to access a shared volume. Since GCE persistent disks do not allow ReadWriteMany access, I set up an NFS server in the cluster (following the same approach as many samples like this) to allow it. I am running both a production and a development environment on this cluster in different namespaces, and since both environments run the same application, each needs its own file system.

Currently, the solution has been to set up 2 NFS servers in the same way (one for prod and one for dev). It seems that when the pods that mount the volume via the NFS server land on the same node as the NFS server itself, they are unable to mount (the error is "Unable to attach or mount volumes [...]: timed out waiting for the condition"). However, this only seems to be occurring in the dev environment; the prod environment has had no problems. Both NFS servers are currently scheduled on the same node, which may also be contributing to the problem, but I'm not sure.

I've been trying to figure out whether there is a problem with running 2 NFS servers this way, or with connecting pods to an NFS server running on the same node, but to no avail so far. Any ideas what could cause the problem?
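For what it's worth, the full error shows up in the pod events; these are the commands I've been using to inspect the failing pods (the pod name is a placeholder):

kubectl describe pod <POD_NAME> -n dev
kubectl get events -n dev --sort-by=.lastTimestamp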

Logs from the NFS server pods (the same for both dev and prod):

nfs-dev-server  Oct 30, 2020, 3:57:23 PM    NFS started 
nfs-dev-server  Oct 30, 2020, 3:57:22 PM    exportfs: / does not support NFS export 
nfs-dev-server  Oct 30, 2020, 3:57:22 PM    Starting rpcbind    
nfs-dev-server  Oct 30, 2020, 3:57:22 PM    rpcinfo: can't contact rpcbind: : RPC: Unable to receive; errno = Connection refused    
nfs-dev-server  Oct 30, 2020, 3:57:21 PM    Serving /   
nfs-dev-server  Oct 30, 2020, 3:57:21 PM    Serving /exports
Are there any logs in the NFS server pods? Could you share the output from both pods if anything appears, or the logs from the faulty NFS pod? – Mariusz K.
I've added the logs for the NFS server pods, but they generally seem to have been working fine despite these errors (at least in terms of NFS, just not the issue above). The NFS pods are not the ones failing; it's just that nothing on the same node is able to mount from them. – Gordie

1 Answer

0 votes

I reproduced your issue by following the tutorial you linked and hit the same problem; the reason was that I hadn't changed the server IP when creating the 2nd PersistentVolume.

This is how I deployed 2 NFS servers in 2 separate namespaces on GKE version 1.17.12-gke.2502.

Create 2 disks:

gcloud compute disks create --size=10GB --zone=us-east1-b gce-nfs-disk
gcloud compute disks create --size=10GB --zone=us-east1-b gce-nfs-disk2

Create the NFS Deployment and Service in the dev namespace:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-server
  namespace: dev
spec:
  replicas: 1
  selector:
    matchLabels:
      role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
          - name: nfs
            containerPort: 2049
          - name: mountd
            containerPort: 20048
          - name: rpcbind
            containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
          - mountPath: /exports
            name: mypvc
      volumes:
        - name: mypvc
          gcePersistentDisk:
            pdName: gce-nfs-disk
            fsType: ext4
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server
  namespace: dev
spec:
  # clusterIP: 10.3.240.20
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111
  selector:
    role: nfs-server
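Assuming the manifest above is saved as nfs-server-dev.yaml (the file name is my choice), deploy it with:

kubectl apply -f nfs-server-dev.yaml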

After the NFS Deployment is created successfully, check the ClusterIP of the Service with kubectl get svc -n dev and put it into nfs: server: <CLUSTER_IP> in the PersistentVolume manifest below.
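If you want just the IP, a jsonpath query against the Service defined above also works:

kubectl get svc nfs-server -n dev -o jsonpath='{.spec.clusterIP}'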

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <CLUSTER_IP_OF_SVC>
    path: "/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: nfs
  namespace: dev
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-nginx
  namespace: dev
spec:
  replicas: 6
  selector:
    matchLabels:
      name: nfs-nginx
  template:
    metadata:
      labels:
        name: nfs-nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        volumeMounts:
          # name must match the volume name below
          - name: nfs
            mountPath: "/mnt"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs

Check and confirm everything is up and running:

kubectl get pods -n dev -w
NAME                          READY   STATUS              RESTARTS   AGE
nfs-nginx-587f8bd757-4gm8z    1/1     Running             0          6s
nfs-nginx-587f8bd757-6lh4l    1/1     Running             0          6s
nfs-nginx-587f8bd757-czr4r    0/1     Running             0          6s
nfs-nginx-587f8bd757-m5vph    1/1     Running             0          6s
nfs-nginx-587f8bd757-wqcff    1/1     Running             0          6s
nfs-nginx-587f8bd757-xqnf9    1/1     Running             0          6s
nfs-server-5f58f8d764-gjjnf   1/1     Running             0          3m14s
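At this point you can sanity-check that the volume really is shared by writing a file from one nginx pod and reading it from another (pod names taken from the listing above):

kubectl exec -n dev nfs-nginx-587f8bd757-4gm8z -- sh -c 'echo hello > /mnt/test.txt'
kubectl exec -n dev nfs-nginx-587f8bd757-6lh4l -- cat /mnt/test.txt

The second command should print hello if the NFS share is working.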

Repeat the same steps for the prod namespace, but remember to check the ClusterIP of the nfs-server Service in the prod namespace and set it accordingly in the prod PV manifest. The result after deploying:

$ kubectl get pods -n prod
NAME                           READY   STATUS    RESTARTS   AGE
nfs-nginx2-5d75567b95-7n7gk    1/1     Running   0          6m25s
nfs-nginx2-5d75567b95-gkqww    1/1     Running   0          6m25s
nfs-nginx2-5d75567b95-gt96p    1/1     Running   0          6m25s
nfs-nginx2-5d75567b95-hf9j7    1/1     Running   0          6m25s
nfs-nginx2-5d75567b95-k2jdv    1/1     Running   0          6m25s
nfs-nginx2-5d75567b95-q457q    1/1     Running   0          6m25s
nfs-server2-8654b89f48-bp9lv   1/1     Running   0          7m19s
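For completeness, the only functional differences in the prod PersistentVolume are the name (PVs are cluster-scoped, so it cannot also be called nfs; nfs2 below is my choice) and the server IP, which must be the ClusterIP reported by kubectl get svc -n prod:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs2
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: <CLUSTER_IP_OF_PROD_SVC>   # ClusterIP of the nfs-server Service in prod
    path: "/"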