
I’ve done some due diligence, even digging into the Akka code, but can’t figure this out: my nodes are discovered via kubernetes-api, one node self-joins and is promoted to leader, but the other fails to join the cluster because the leader is unable to resolve the requester’s address. I’ll post the application config and logs separately.

This is a 2-node (for now) Akka Cluster using Akka Management and Akka Discovery, with the kubernetes-api method discovering the nodes across 2 pods. I’m using a custom label selector of “application=vcsvc,environment=multibox,akka-cluster=vcsvc”.

Can anyone help identify where I’ve misconfigured things?

Snippet from application.conf:

akka: {
  management: {
    cluster: {
      bootstrap: {
        # setting this to off prohibits creating a new cluster, forcing a
        # member to wait until an existing cluster has formed
        new-cluster-enabled: on

        contact-point-discovery: {
          # pick the discovery method you'd like to use:
          discovery-method: "kubernetes-api"

          # the exact number of nodes for the initial startup of the cluster
          required-contact-point-nr: 2
        }
      }

      health-check: {
        # Ready health check returns 200 when cluster membership is in the following states.
        # Intended to indicate this node is ready for user traffic, hence Up/WeaklyUp
        # Valid values: "Joining", "WeaklyUp", "Up", "Leaving", "Exiting", "Down", "Removed"
        ready-states: ["Up", "WeaklyUp"]

        readiness-path: "health/ready"
        liveness-path: "health/alive"
      }
    }
  }

  discovery: {
    # Set the following in your application.conf if you want to use this discovery mechanism:
    method: kubernetes-api

    kubernetes-api: {
      class = akka.discovery.kubernetes.KubernetesApiServiceDiscovery

      # API server, cert and token information. Currently these are present on K8s versions: 1.6, 1.7, 1.8, and perhaps more
      api-ca-path = "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
      api-token-path = "/var/run/secrets/kubernetes.io/serviceaccount/token"
      api-service-host-env-name = "KUBERNETES_SERVICE_HOST"
      api-service-port-env-name = "KUBERNETES_SERVICE_PORT"

      # Namespace to query for pods.
      #
      # Set this value to a specific string to override discovering the namespace using pod-namespace-path.
      pod-namespace = ${com.apptio.vcsvc.namespace}

      # Selector value to query pod API with.
      # `%s` will be replaced with the configured effective name, which defaults to the actor system name
      pod-label-selector: "application="${com.apptio.vcsvc.namespace}",environment="${com.apptio.vcsvc.environment}",akka-cluster="${com.apptio.vcsvc.cluster.name}
    }
  }
}

I’ll add more upon request. I’m questioning the method line and the api-* lines, since the behavior didn’t change when I added them.
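Since the failure is the leader being unable to resolve the joiner’s address, I suspect the advertised addresses matter. A minimal sketch of pinning them explicitly, assuming Artery and a POD_IP environment variable injected via the Kubernetes downward API (neither appears in my config above):

akka: {
  remote.artery.canonical: {
    # advertise an address other nodes can actually resolve;
    # POD_IP is assumed to come from the downward API (status.podIP)
    hostname: ${?POD_IP}
    port: 2552
  }

  management.http: {
    # Akka Management should likewise bind to a reachable address
    hostname: ${?POD_IP}
    port: 8558
  }
}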

I was able to post logs here: discuss.lightbend.com/t/… – Jack Pines

1 Answer


So, it turns out that the reason the second node was unable to respond to the join-success event (we saw that happening in the logs) was a problem with the akka.remote configuration. This was not obvious to us because Lightbend’s/Akka’s documentation and sample code consistently direct you to use akka.remote.artery. For reasons I’ve yet to identify, Artery works in a local Discovery situation but not in our K8s environment; there I needed to use classic remoting via akka.remote.netty.tcp instead. I’ll put the full configuration below for clarity.

Failing:

akka: {
  remote: {
    artery: {
      enabled: on
      transport: tcp
      canonical.port: 2552
    }
  }
}

Working:

akka: {
  remote: {
    netty.tcp: {
      hostname: ${HOSTNAME}
      port: 2552
    }
  }
}
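For completeness, here is a sketch of the surrounding settings the working block assumes (this reflects Akka 2.5.x, where classic Netty remoting is available; the provider setting is required for clustering with either transport):

akka: {
  actor: {
    # clustering requires the cluster actor provider
    provider: cluster
  }

  remote: {
    # when using classic remoting, make sure Artery is switched off
    artery.enabled: off

    netty.tcp: {
      hostname: ${HOSTNAME}
      port: 2552
    }
  }
}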

Hope that helps someone else avoid a week and a half of lost productivity someday.