elasticsearch failed to resolve host [elasticsearch-discovery]

Question

I deploy elastichsearch to my AWS EKS for logging purpose, using stable/elasticsearch chart, with this command:

helm install stable/elasticsearch --namespace logging --name elasticsearch --set data.terminationGracePeriodSeconds=0

once installed, all pods running under logging are Running but not in ready status

NAME                                    READY   STATUS    RESTARTS   AGE
elasticsearch-client-64bb574bff-85lp9   0/1     Running   0          41s
elasticsearch-client-64bb574bff-t4h6r   0/1     Running   0          44m
elasticsearch-data-0                    0/1     Pending   0          44m
elasticsearch-master-0                  0/1     Pending   0          44m

And I got this warnings from elasticsearch pod logs

[elasticsearch-client-647c67f49d-npjp4] not enough master nodes discovered during pinging (found [[]], but needed [2]), pinging again
[2018-11-27T22:53:55,009][WARN ][o.e.d.z.UnicastZenPing   ] 
[elasticsearch-client-647c67f49d-npjp4] failed to resolve host 
[elasticsearch-discovery] java.net.UnknownHostException: elasticsearch-discovery: Name or service not known

UPDATE

helm template ./charts/stable/elasticsearch --namespace logging --name elasticsearch --set data.terminationGracePeriodSeconds=0 > deployment.yaml

this is the template that helm install

---
# Source: elasticsearch/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elasticsearch
  labels:
    app: elasticsearch
    chart: "elasticsearch-1.14.1"
    release: "elasticsearch"
    heritage: "Tiller"
data:
  elasticsearch.yml: |-
    cluster.name: elasticsearch

    node.data: ${NODE_DATA:true}
    node.master: ${NODE_MASTER:true}
    node.ingest: ${NODE_INGEST:true}
    node.name: ${HOSTNAME}

    network.host: 0.0.0.0
    # see https://github.com/kubernetes/kubernetes/issues/3595
    bootstrap.memory_lock: ${BOOTSTRAP_MEMORY_LOCK:false}

    discovery:
      zen:
        ping.unicast.hosts: ${DISCOVERY_SERVICE:}
        minimum_master_nodes: ${MINIMUM_MASTER_NODES:2}

    # see https://github.com/elastic/elasticsearch-definitive-guide/pull/679
    processors: ${PROCESSORS:}

    # avoid split-brain w/ a minimum consensus of two masters plus a data node
    gateway.expected_master_nodes: ${EXPECTED_MASTER_NODES:2}
    gateway.expected_data_nodes: ${EXPECTED_DATA_NODES:1}
    gateway.recover_after_time: ${RECOVER_AFTER_TIME:5m}
    gateway.recover_after_master_nodes: ${RECOVER_AFTER_MASTER_NODES:2}
    gateway.recover_after_data_nodes: ${RECOVER_AFTER_DATA_NODES:1}
  log4j2.properties: |-
    status = error
    appender.console.type = Console
    appender.console.name = console
    appender.console.layout.type = PatternLayout
    appender.console.layout.pattern = [%d{ISO8601}][%-5p][%-25c{1.}] %marker%m%n
    rootLogger.level = info
    rootLogger.appenderRef.console.ref = console
    logger.searchguard.name = com.floragunn
    logger.searchguard.level = info
  pre-stop-hook.sh: |-
    #!/bin/bash
    exec &> >(tee -a "/var/log/elasticsearch-hooks.log")
    NODE_NAME=${HOSTNAME}
    echo "Prepare to migrate data of the node ${NODE_NAME}"
    echo "Move all data from node ${NODE_NAME}"
    curl -s -XPUT -H 'Content-Type: application/json' 'elasticsearch-client:9200/_cluster/settings' -d "{
      \"transient\" :{
          \"cluster.routing.allocation.exclude._name\" : \"${NODE_NAME}\"
      }
    }"
    echo ""

    while true ; do
      echo -e "Wait for node ${NODE_NAME} to become empty"
      SHARDS_ALLOCATION=$(curl -s -XGET 'http://elasticsearch-client:9200/_cat/shards')
      if ! echo "${SHARDS_ALLOCATION}" | grep -E "${NODE_NAME}"; then
        break
      fi
      sleep 1
    done
    echo "Node ${NODE_NAME} is ready to shutdown"
  post-start-hook.sh: |-
    #!/bin/bash
    exec &> >(tee -a "/var/log/elasticsearch-hooks.log")
    NODE_NAME=${HOSTNAME}
    CLUSTER_SETTINGS=$(curl -s -XGET "http://elasticsearch-client:9200/_cluster/settings")
    if echo "${CLUSTER_SETTINGS}" | grep -E "${NODE_NAME}"; then
      echo "Activate node ${NODE_NAME}"
      curl -s -XPUT -H 'Content-Type: application/json' "http://elasticsearch-client:9200/_cluster/settings" -d "{
        \"transient\" :{
            \"cluster.routing.allocation.exclude._name\" : null
        }
      }"
    fi
    echo "Node ${NODE_NAME} is ready to be used"

---
# Source: elasticsearch/templates/client-serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "client"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-client

---
# Source: elasticsearch/templates/data-serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "data"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-data

---
# Source: elasticsearch/templates/master-serviceaccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "master"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-master

---
# Source: elasticsearch/templates/client-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "client"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-client

spec:
  ports:
    - name: http
      port: 9200
      targetPort: http
  selector:
    app: elasticsearch
    component: "client"
    release: elasticsearch
  type: ClusterIP

---
# Source: elasticsearch/templates/master-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "master"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-discovery
spec:
  clusterIP: None
  ports:
    - port: 9300
      targetPort: transport
  selector:
    app: elasticsearch
    component: "master"
    release: elasticsearch

---
# Source: elasticsearch/templates/client-deployment.yaml
apiVersion: apps/v1beta1
kind: Deployment
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "client"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-client
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: elasticsearch
        component: "client"
        release: elasticsearch
    spec:
      serviceAccountName: elasticsearch-client
      securityContext:
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: "elasticsearch"
                  release: "elasticsearch"
                  component: "client"
      initContainers:
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
      # and https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html#mlockall
      - name: "sysctl"
        image: "busybox:latest"
        imagePullPolicy: "Always"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      containers:
      - name: elasticsearch
        env:
        - name: NODE_DATA
          value: "false"
        - name: NODE_MASTER
          value: "false"
        - name: DISCOVERY_SERVICE
          value: elasticsearch-discovery
        - name: PROCESSORS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: ES_JAVA_OPTS
          value: "-Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m "
        - name: MINIMUM_MASTER_NODES
          value: "2"
        resources:
            limits:
              cpu: "1"
            requests:
              cpu: 25m
              memory: 512Mi

        readinessProbe:
          httpGet:
            path: /_cluster/health
            port: 9200
          initialDelaySeconds: 5
        livenessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 90
        image: "docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.0"
        imagePullPolicy: "IfNotPresent"
        ports:
        - containerPort: 9200
          name: http
        - containerPort: 9300
          name: transport
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          name: config
          subPath: elasticsearch.yml
      volumes:
      - name: config
        configMap:
          name: elasticsearch

---
# Source: elasticsearch/templates/data-statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "data"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-data
spec:
  serviceName: elasticsearch-data
  replicas: 2
  template:
    metadata:
      labels:
        app: elasticsearch
        component: "data"
        release: elasticsearch
    spec:
      serviceAccountName: elasticsearch-data
      securityContext:
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: "elasticsearch"
                  release: "elasticsearch"
                  component: "data"
      initContainers:
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
      # and https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html#mlockall
      - name: "sysctl"
        image: "busybox:latest"
        imagePullPolicy: "Always"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: "chown"
        image: "docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.0"
        imagePullPolicy: "IfNotPresent"
        command:
        - /bin/bash
        - -c
        - chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data &&
          chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/logs
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
      containers:
      - name: elasticsearch
        env:
        - name: DISCOVERY_SERVICE
          value: elasticsearch-discovery
        - name: NODE_MASTER
          value: "false"
        - name: PROCESSORS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: ES_JAVA_OPTS
          value: "-Djava.net.preferIPv4Stack=true -Xms1536m -Xmx1536m "
        - name: MINIMUM_MASTER_NODES
          value: "2"
        image: "docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.0"
        imagePullPolicy: "IfNotPresent"
        ports:
        - containerPort: 9300
          name: transport

        resources:
            limits:
              cpu: "1"
            requests:
              cpu: 25m
              memory: 1536Mi

        readinessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 5
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
        - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          name: config
          subPath: elasticsearch.yml
        - name: config
          mountPath: /pre-stop-hook.sh
          subPath: pre-stop-hook.sh
        - name: config
          mountPath: /post-start-hook.sh
          subPath: post-start-hook.sh
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash","/pre-stop-hook.sh"]
          postStart:
            exec:
              command: ["/bin/bash","/post-start-hook.sh"]
      terminationGracePeriodSeconds: 0
      volumes:
      - name: config
        configMap:
          name: elasticsearch
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: "30Gi"

---
# Source: elasticsearch/templates/master-statefulset.yaml
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  labels:
    app: elasticsearch
    chart: elasticsearch-1.14.1
    component: "master"
    heritage: Tiller
    release: elasticsearch
  name: elasticsearch-master
spec:
  serviceName: elasticsearch-master
  replicas: 3
  template:
    metadata:
      labels:
        app: elasticsearch
        component: "master"
        release: elasticsearch
    spec:
      serviceAccountName: elasticsearch-master
      securityContext:
        fsGroup: 1000
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchLabels:
                  app: "elasticsearch"
                  release: "elasticsearch"
                  component: "master"
      initContainers:
      # see https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
      # and https://www.elastic.co/guide/en/elasticsearch/reference/current/setup-configuration-memory.html#mlockall
      - name: "sysctl"
        image: "busybox:latest"
        imagePullPolicy: "Always"
        command: ["sysctl", "-w", "vm.max_map_count=262144"]
        securityContext:
          privileged: true
      - name: "chown"
        image: "docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.0"
        imagePullPolicy: "IfNotPresent"
        command:
        - /bin/bash
        - -c
        - chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/data &&
          chown -R elasticsearch:elasticsearch /usr/share/elasticsearch/logs
        securityContext:
          runAsUser: 0
        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
      containers:
      - name: elasticsearch
        env:
        - name: NODE_DATA
          value: "false"
        - name: DISCOVERY_SERVICE
          value: elasticsearch-discovery
        - name: PROCESSORS
          valueFrom:
            resourceFieldRef:
              resource: limits.cpu
        - name: ES_JAVA_OPTS
          value: "-Djava.net.preferIPv4Stack=true -Xms512m -Xmx512m "
        - name: MINIMUM_MASTER_NODES
          value: "2"
        resources:
            limits:
              cpu: "1"
            requests:
              cpu: 25m
              memory: 512Mi

        readinessProbe:
          httpGet:
            path: /_cluster/health?local=true
            port: 9200
          initialDelaySeconds: 5
        image: "docker.elastic.co/elasticsearch/elasticsearch-oss:6.5.0"
        imagePullPolicy: "IfNotPresent"
        ports:
        - containerPort: 9300
          name: transport

        volumeMounts:
        - mountPath: /usr/share/elasticsearch/data
          name: data
        - mountPath: /usr/share/elasticsearch/config/elasticsearch.yml
          name: config
          subPath: elasticsearch.yml
      volumes:
      - name: config
        configMap:
          name: elasticsearch
  updateStrategy:
    type: OnDelete
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
        - "ReadWriteOnce"
      resources:
        requests:
          storage: "4Gi"


---
# Source: elasticsearch/templates/client-pdb.yaml


---
# Source: elasticsearch/templates/data-pdb.yaml


---
# Source: elasticsearch/templates/master-pdb.yaml


---
# Source: elasticsearch/templates/podsecuritypolicy.yaml


---
# Source: elasticsearch/templates/role.yaml


---
# Source: elasticsearch/templates/rolebinding.yaml

How have you configured each node, i.e. what is content of elasticsearch.yml file on each node? — Nishant
What discovery type are u using? use the discovery type supported by aws env — Ijaz Ahmad
check output of kubectl describe elasticsearch-master-0. it might give you any clue why the pods are in pending state. — Emruz Hossain
@IjazAhmadKhan I'm not sure im just learning all this stuff but I think that it is zen. ; @EmruzHossain I have one events, pod has unbound PersistentVolumeClaims ; @NishantSaini I update the post with elasticsearch.yaml — framled
I'd remove the YAML from here as it's well described and reproducible with the Helm chart. What would be helpful is if you post the full output of kubectl describe for both master-0 and data-0. — Clorichel

framled framled · Accepted Answer · 2018-12-03T14:13:47

I fixed adding a PVC and a StorageClass, because the events logs from master-0 and data-0 was PVC is not bound

elasticsearch failed to resolve host [elasticsearch-discovery]

1 Answers