
I have a Prometheus server pod that uses an 8Gi persistent block volume. The volume provisioner is rook-ceph.

The pod was in CrashLoopBackOff state because there was no more space available on the volume:

[root@node4 ~]# df -h | grep rbd

/dev/rbd0   8.0G  8.0G   36K 100% /var/lib/kubelet/plugins/ceph.rook.io/rook-ceph/mounts/pvc-80f98193-deae-11e9-a240-0025b50a01df

The pod needs more space, so I decided to resize the volume to 20Gi.

Following the documentation (https://kubernetes.io/blog/2018/07/12/resizing-persistent-volumes-using-kubernetes/), I edited resources.requests.storage to 20Gi in the PersistentVolumeClaim and upgraded the Helm release.
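For reference, the equivalent change can also be applied directly with kubectl; a minimal sketch, assuming the PVC name shown in the outputs below (in this setup the size would normally come from the Helm chart values instead):

$ kubectl -n prometheus patch pvc prometheus-server --type merge -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'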

Now I can see that the PV has been resized to 20Gi, but the PVC still reports a capacity of 8Gi.

$ kubectl get pvc -n prometheus

NAME                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
prometheus-alertmanager   Bound    pvc-80f5eb1a-deae-11e9-a240-0025b50a01df   2Gi        RWO            rook-ceph-block   22d
prometheus-server         Bound    pvc-80f98193-deae-11e9-a240-0025b50a01df   8Gi        RWO            rook-ceph-block   22d

$ kubectl get pv

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                STORAGECLASS      REASON   AGE
pvc-80f5eb1a-deae-11e9-a240-0025b50a01df   2Gi        RWO            Delete           Bound    prometheus/prometheus-alertmanager   rook-ceph-block            22d
pvc-80f98193-deae-11e9-a240-0025b50a01df   20Gi       RWO            Delete           Bound    prometheus/prometheus-server         rook-ceph-block            22d
pvc-fb73b383-deb2-11e9-a240-0025b50a01df   10Gi       RWO            Delete           Bound    grafana/grafana                      rook-ceph-block            22d

Describing the PVC shows:

Conditions:
  Type                      Status  LastProbeTime                     LastTransitionTime                Reason  Message
  ----                      ------  -----------------                 ------------------                ------  -------
  FileSystemResizePending   True    Mon, 01 Jan 0001 00:00:00 +0000   Thu, 17 Oct 2019 15:49:05 +0530           Waiting for user to (re-)start a pod to finish file system resize of volume on node.
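While waiting for the resize to finish, the condition and the reported capacity can be polled directly; a small sketch using the names from the outputs above:

$ kubectl -n prometheus get pvc prometheus-server -o jsonpath='{.status.conditions[*].type}{"\n"}'
$ kubectl -n prometheus get pvc prometheus-server -o jsonpath='{.status.capacity.storage}{"\n"}'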

Then I deleted the pod to restart it.

But the pod is still in CrashLoopBackOff state. The pod description shows:

 Warning  FailedMount  2m17s (x2 over 2m17s)  kubelet, node4     MountVolume.SetUp failed for volume "pvc-80f98193-deae-11e9-a240-0025b50a01df" : mount command failed, status: Failure, reason: Rook: Mount volume failed: failed to attach volume pvc-80f98193-deae-11e9-a240-0025b50a01df for pod prometheus/prometheus-server-756c8495ff-wtx84. Volume is already attached by pod prometheus/prometheus-server-756c8495ff-hcd85. Status Running

When I list the pods, I can see only the new one, prometheus-server-756c8495ff-wtx84 (not the old pod prometheus-server-756c8495ff-hcd85):

$ kubectl get pods -n prometheus

NAME                                            READY   STATUS             RESTARTS   AGE
prometheus-alertmanager-6f756695d5-wvgr7        2/2     Running            0          22d
prometheus-kube-state-metrics-67cfbbd9d-bwx4w   1/1     Running            0          22d
prometheus-node-exporter-444bz                  1/1     Running            0          22d
prometheus-node-exporter-4hjr9                  1/1     Running            0          22d
prometheus-node-exporter-8plk7                  1/1     Running            0          22d
prometheus-node-exporter-pftf6                  1/1     Running            0          22d
prometheus-node-exporter-prndk                  1/1     Running            0          22d
prometheus-node-exporter-rchtg                  1/1     Running            0          22d
prometheus-node-exporter-xgmzs                  1/1     Running            0          22d
prometheus-pushgateway-77744d999c-5ndlm         1/1     Running            0          22d
prometheus-server-756c8495ff-wtx84              1/2     CrashLoopBackOff   5          4m31s

How can I solve this issue?

EDIT:

The strategy of the deployment is:

StrategyType:           RollingUpdate
RollingUpdateStrategy:  1 max unavailable, 1 max surge
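Since the volume is RWO and a RollingUpdate with max surge 1 starts the replacement pod while the old one still holds the attachment, the new pod can hit the "already attached" error above. A possible workaround, sketched under the assumption that prometheus-server is a single-replica Deployment: scale it to zero so the attachment is released, scale it back up, and optionally switch the strategy to Recreate so future rollouts stop the old pod before starting the new one.

$ kubectl -n prometheus scale deployment prometheus-server --replicas=0
$ kubectl -n prometheus scale deployment prometheus-server --replicas=1
$ kubectl -n prometheus patch deployment prometheus-server -p '{"spec":{"strategy":{"type":"Recreate","rollingUpdate":null}}}'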

I can see that even though kubectl get pv shows the PV with a capacity of 20Gi, the actual RBD image in rook-ceph is still only 8GiB:

[root@rook-ceph-operator-775cf575c5-dfpql /]# rbd info replicated-metadata-pool/pvc-80f98193-deae-11e9-a240-0025b50a01df

rbd image 'pvc-80f98193-deae-11e9-a240-0025b50a01df':
        size 8 GiB in 2048 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 434b1922b4b40a
        data_pool: ec-data-pool
        block_name_prefix: rbd_data.1.434b1922b4b40a
        format: 2
        features: layering, data-pool
        op_features:
        flags:
        create_timestamp: Tue Sep 24 09:34:28 2019
        access_timestamp: Tue Sep 24 09:34:28 2019
        modify_timestamp: Tue Sep 24 09:34:28 2019
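If the provisioner never grew the backing image, it can in principle be resized by hand from the same toolbox/operator shell; a hedged sketch using the pool and image name from the rbd info output above (verify the target size before running anything like this):

[root@rook-ceph-operator-775cf575c5-dfpql /]# rbd resize --size 20G replicated-metadata-pool/pvc-80f98193-deae-11e9-a240-0025b50a01df
[root@rook-ceph-operator-775cf575c5-dfpql /]# rbd info replicated-metadata-pool/pvc-80f98193-deae-11e9-a240-0025b50a01df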

The storageclass.yaml is:

$ kubectl get sc -n prometheus -o yaml

apiVersion: v1
items:
- allowVolumeExpansion: true
  apiVersion: storage.k8s.io/v1
  kind: StorageClass
  metadata:
    creationTimestamp: "2019-08-01T11:27:31Z"
    name: rook-ceph-block
    resourceVersion: "15172025"
    selfLink: /apis/storage.k8s.io/v1/storageclasses/rook-ceph-block
    uid: 59e3b081-b44f-11e9-a240-0025b50a01df
  parameters:
    blockPool: replicated-metadata-pool
    clusterNamespace: rook-ceph
    dataBlockPool: ec-data-pool
    fstype: xfs
  provisioner: ceph.rook.io/block
  reclaimPolicy: Delete
  volumeBindingMode: Immediate
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
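allowVolumeExpansion is already true here; a quick way to confirm that on any StorageClass (StorageClasses are cluster-scoped, so no namespace is needed):

$ kubectl get sc rook-ceph-block -o jsonpath='{.allowVolumeExpansion}{"\n"}'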
Comment from Crou: Can you post the YAML for your StorageClass?
Comment from AnjanaDyna: @Crou I edited the question. Please have a look.

1 Answer


You can try to resize the filesystem manually (note that this StorageClass uses fstype: xfs, so the grow tool is xfs_growfs rather than resize2fs for ext4). This is a known open issue: https://github.com/rook/rook/issues/3133
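A minimal sketch of the manual steps, assuming the RBD image has already been resized to 20G (as in the rbd resize example in the question) and that the mount path from the original df output is still current:

[root@node4 ~]# xfs_growfs /var/lib/kubelet/plugins/ceph.rook.io/rook-ceph/mounts/pvc-80f98193-deae-11e9-a240-0025b50a01df
[root@node4 ~]# df -h | grep rbd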