There was a pod named n404-neo4j-core-1 running on k8s-slave2. After k8s-slave2 was powered off, the pod got stuck in the Terminating state.
I expected the pod to be deleted and a new pod to be created on another node. As long as the pod stays stuck in Terminating, the Neo4j cluster cannot maintain HA.
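As far as I know, the stuck pod should be clearable by hand with a force delete, for example:

kubectl delete pod n404-neo4j-core-1 --grace-period=0 --force

but I want the failover to happen without manual intervention.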
kubectl get pods -o wide
NAME                READY   STATUS        RESTARTS   AGE     IP    NODE         NOMINATED NODE   READINESS GATES
n404-neo4j-core-0   1/1     Running       0          3d19h   ***   k8s-node1    <none>           <none>
n404-neo4j-core-1   1/1     Terminating   0          78m     ***   k8s-slave2   <none>           <none>
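The node itself has been unreachable since it was powered off; its status can be checked with:

kubectl get node k8s-slave2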
kubectl describe pod n404-neo4j-core-1
Name:                      n404-neo4j-core-1
Namespace:                 default
Priority:                  0
PriorityClassName:         <none>
Node:                      k8s-slave2/10.176.6.67
Start Time:                Mon, 01 Jun 2020 23:53:13 -0700
Labels:                    app.kubernetes.io/component=core
                           app.kubernetes.io/instance=n404
                           app.kubernetes.io/managed-by=Helm
                           app.kubernetes.io/name=neo4j
                           controller-revision-hash=n404-neo4j-core-67484bd88
                           helm.sh/chart=neo4j-4.0.4-1
                           statefulset.kubernetes.io/pod-name=n404-neo4j-core-1
Annotations:               <none>
Status:                    Terminating (lasts 21m)
Termination Grace Period:  30s
IP:                        10.36.0.1
Controlled By:             StatefulSet/n404-neo4j-core
Containers:
  n404-neo4j:
    Container ID:   docker://a045d7747678ca62734800d153d01f634b9972b527289541d357cbc27456bf7b
    Image:          neo4j:4.0.4-enterprise
    Image ID:       docker-pullable://neo4j@sha256:714d83e56a5db61eb44d65c114720f8cb94b06cd044669e16957aac1bd1b5c34
    Ports:          5000/TCP, 7000/TCP, 6000/TCP, 7474/TCP, 7687/TCP, 3637/TCP
    Host Ports:     0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      /bin/bash
      -c
      export core_idx=$(hostname | sed 's|.*-||')
      # Processes key configuration elements and exports env vars we need.
      . /helm-init/init.sh
      # We advertise the discovery-lb addresses (see discovery-lb.yaml) because
      # it is for internal cluster comms and is limited to private ports.
      export DISCOVERY_HOST="discovery-n404-neo4j-${core_idx}.default.svc.cluster.local"
      export NEO4J_causal__clustering_discovery__advertised__address="$DISCOVERY_HOST:5000"
      export NEO4J_causal__clustering_transaction__advertised__address="$DISCOVERY_HOST:6000"
      export NEO4J_causal__clustering_raft__advertised__address="$DISCOVERY_HOST:7000"
      echo "Starting Neo4j CORE $core_idx on $HOST"
      exec /docker-entrypoint.sh "neo4j"
    State:          Running
      Started:      Mon, 01 Jun 2020 23:53:14 -0700
    Ready:          True
    Restart Count:  0
    Liveness:       tcp-socket :7687 delay=300s timeout=2s period=10s #success=1 #failure=3
    Readiness:      tcp-socket :7687 delay=120s timeout=2s period=10s #success=1 #failure=3
    Environment Variables from:
      n404-neo4j-common-config  ConfigMap  Optional: false
      n404-neo4j-core-config    ConfigMap  Optional: false
    Environment:
      NEO4J_SECRETS_PASSWORD:  <set to the key 'neo4j-password' in secret 'n404-neo4j-secrets'>  Optional: false
    Mounts:
      /data from datadir (rw)
      /helm-init from init-script (rw)
      /plugins from plugins (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from n404-neo4j-sa-token-jp7g9 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   True
  PodScheduled      True
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-n404-neo4j-core-1
    ReadOnly:   false
  init-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      n404-init-script
    Optional:  false
  plugins:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  n404-neo4j-sa-token-jp7g9:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  n404-neo4j-sa-token-jp7g9
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  svc=neo4j
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
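If I read the tolerations above correctly, the pod tolerates node.kubernetes.io/unreachable:NoExecute for 300s, so I expected taint-based eviction to remove it about five minutes after the node went down. My understanding is that for a pod on an unreachable node the API server only marks it for deletion, and the StatefulSet controller will not create a replacement until the pod is confirmed deleted (to preserve its at-most-one-pod-per-identity guarantee), which only happens once the dead node object is removed, for example:

kubectl delete node k8s-slave2

Is that what is happening here, and how can I make the Neo4j core pod fail over to another node automatically?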