
I'm trying to install a CockroachDB Helm chart on a 2 node Kubernetes cluster using this command:

helm install my-release --set statefulset.replicas=2 stable/cockroachdb

I have already created 2 persistent volumes:

NAME      CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                          STORAGECLASS   REASON   AGE
pv00001   100Gi      RWO            Recycle          Bound    default/datadir-my-release-cockroachdb-0                           11m
pv00002   100Gi      RWO            Recycle          Bound    default/datadir-my-release-cockroachdb-1                           11m

I'm getting a weird error, and since I'm new to Kubernetes I'm not sure what I'm doing wrong. I've tried creating a StorageClass and using it with my PVs, but then the CockroachDB PVCs won't bind to them. I suspect there may be something wrong with my PV setup.
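For reference, here is roughly how each PV is defined (the storageClassName and hostPath below are illustrative, not my exact values; as I understand it, the storageClassName has to match whatever the chart's PVCs request, e.g. via --set storage.persistentVolume.storageClass=<name> if the chart supports it):

```yaml
# Illustrative PV manifest -- class name and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv00001
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Recycle
  storageClassName: local-storage   # hypothetical class name; must match the PVC's class
  hostPath:
    path: /mnt/data/pv00001         # assumes a hostPath-backed volume
```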

I've tried using kubectl logs but the only error I'm seeing is this:

standard_init_linux.go:211: exec user process caused "exec format error"

and the pods are crashing over and over:

NAME                                READY   STATUS             RESTARTS   AGE
my-release-cockroachdb-0            0/1     Pending            0          11m
my-release-cockroachdb-1            0/1     CrashLoopBackOff   7          11m
my-release-cockroachdb-init-tfcks   0/1     CrashLoopBackOff   5          5m29s

Any idea why the pods are crashing?

Here's kubectl describe for the init pod:

Name:         my-release-cockroachdb-init-tfcks
Namespace:    default
Priority:     0
Node:         axon/192.168.1.7
Start Time:   Sat, 04 Apr 2020 00:22:19 +0100
Labels:       app.kubernetes.io/component=init
              app.kubernetes.io/instance=my-release
              app.kubernetes.io/name=cockroachdb
              controller-uid=54c7c15d-eb1c-4392-930a-d9b8e9225a45
              job-name=my-release-cockroachdb-init
Annotations:  <none>
Status:       Running
IP:           10.44.0.1
IPs:
  IP:           10.44.0.1
Controlled By:  Job/my-release-cockroachdb-init
Containers:
  cluster-init:
    Container ID:  docker://82a062c6862a9fd5047236feafe6e2654ec1f6e3064fd0513341a1e7f36eaed3
    Image:         cockroachdb/cockroach:v19.2.4
    Image ID:      docker-pullable://cockroachdb/cockroach@sha256:511b6d09d5bc42c7566477811a4e774d85d5689f8ba7a87a114b96d115b6149b
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      while true; do initOUT=$(set -x; /cockroach/cockroach init --insecure --host=my-release-cockroachdb-0.my-release-cockroachdb:26257 2>&1); initRC="$?"; echo $initOUT; [[ "$initRC" == "0" ]] && exit 0; [[ "$initOUT" == *"cluster has already been initialized"* ]] && exit 0; sleep 5; done
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Sat, 04 Apr 2020 00:28:04 +0100
      Finished:     Sat, 04 Apr 2020 00:28:04 +0100
    Ready:          False
    Restart Count:  6
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cz2sn (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-cz2sn:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cz2sn
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  <unknown>             default-scheduler  Successfully assigned default/my-release-cockroachdb-init-tfcks to axon
  Normal   Pulled     5m9s (x5 over 6m45s)  kubelet, axon      Container image "cockroachdb/cockroach:v19.2.4" already present on machine
  Normal   Created    5m8s (x5 over 6m45s)  kubelet, axon      Created container cluster-init
  Normal   Started    5m8s (x5 over 6m44s)  kubelet, axon      Started container cluster-init
  Warning  BackOff    92s (x26 over 6m42s)  kubelet, axon      Back-off restarting failed container
The most important message is in the Pod logs: it shows that the architecture of the cockroach image doesn't match that of the nodes. Run kubectl get po -o wide to get the nodes where cockroach runs and check their arch. - kitt
@kitt Please add as an answer with details and I will accept - Alex W

3 Answers

2
votes

When Pods crash, the most important things to check are their descriptions (kubectl describe) and their logs.

The logs of the failed Pod show that the architecture of the cockroach image doesn't match that of the nodes.

Run kubectl get po -o wide to see which nodes cockroach runs on, then check their architecture.
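A quick sketch of that check (map_arch is a hypothetical helper, not part of kubectl; it translates `uname -m` output into the platform names Docker image manifests use):

```shell
# Translate `uname -m` output to Docker platform names (hypothetical helper).
map_arch() {
  case "$1" in
    x86_64)  echo amd64 ;;
    aarch64) echo arm64 ;;
    armv7l)  echo arm ;;
    *)       echo "$1" ;;
  esac
}

# On the cluster, compare the node architectures against the platforms
# the image is published for:
#   kubectl get nodes -L kubernetes.io/arch            # arch label per node
#   docker manifest inspect cockroachdb/cockroach:v19.2.4   # platforms in the manifest

# Locally, on the node itself:
map_arch "$(uname -m)"
```

If the node reports arm/arm64 but the image only ships amd64 layers, you get exactly the "exec format error" seen in the question.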

1
votes

A 2-node CockroachDB cluster is an anti-pattern: you need 3 or more nodes to avoid data unavailability or cluster-wide unavailability when a single node fails. Consider checking out these videos explaining how data in CockroachDB is organized, and then how the nodes in a cluster work together to keep data available in the face of node failure.
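The arithmetic behind the 3-node minimum is majority quorum (quorum and tolerated below are just illustrative helpers): a cluster of n nodes needs floor(n/2)+1 nodes up to make progress, so it tolerates n minus that many failures.

```shell
# Majority-quorum arithmetic (illustrative; CockroachDB uses Raft-style majorities).
quorum()    { echo $(( $1 / 2 + 1 )); }      # nodes that must stay up
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }  # failures survivable

tolerated 2   # -> 0  (any single failure stalls the cluster)
tolerated 3   # -> 1
tolerated 5   # -> 2
```

This is why 2 nodes buy you replication but no fault tolerance: losing either node loses quorum.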

0
votes

Only with 3 nodes (or more) do you avoid the risk of losing data when one of the nodes gets corrupted. That aside, it's easier to explain how to do it right than to find out what went wrong; and to find out what went wrong, one must go through the logs.

If you attach the log, I can take a look.

I also wrote a detailed guide that may address the "doing it right" part of my answer; I elaborate even more on the entire process here.