13
votes
  1. I have Kafka deployed and running in Kubernetes cluster. I am using this image from docker hub - https://hub.docker.com/r/cloudtrackinc/kubernetes-kafka/
  2. I have 3 kube-nodes in my kubernetes cluster. I have 3 Kafka and 3 zookeeper applications running and I have services zoo1,zoo2,zoo3 and kafka-1, kafka-2 and kafka-3 running corresponding to them. I am able to publish/consume from inside kubernetes cluster but I am not able to publish/consume from outside of kubernetes cluster i.e., from external machine not part of kubernetes cluster.
  3. I am able to reach the kube-nodes from the external machine - I can ping them by name/IP.
  4. I am not using any external load balancer but I have a DNS that can resolve both my external machine and kube-nodes.
  5. Using NodePort or ExternalIP to expose the Kafka service does not work in this case.
  6. Setting KAFKA_ADVERTISED_HOST_NAME or KAFKA_ADVERTISED_LISTENERS in the Kafka RC YML (which ultimately sets the advertised.host.name/advertised.listeners properties in server.properties) does not help with accessing Kafka from outside of the Kubernetes cluster either.

Please suggest how I can publish/consume from outside of the Kubernetes cluster. Thanks much!


4 Answers

16
votes

I had the same problem with accessing Kafka from outside of a k8s cluster on AWS. I managed to solve it by using Kafka's listeners feature, which since version 0.10.2 supports multiple interfaces.

Here is how I configured the Kafka container:

    ports:
    - containerPort: 9092
    - containerPort: 9093
    env:
    - name: KAFKA_ZOOKEEPER_CONNECT
      value: "zookeeper:2181"
    - name: KAFKA_LISTENER_SECURITY_PROTOCOL_MAP
      value: "INTERNAL_PLAINTEXT:PLAINTEXT,EXTERNAL_PLAINTEXT:PLAINTEXT"
    - name: KAFKA_ADVERTISED_LISTENERS
      value: "INTERNAL_PLAINTEXT://kafka-internal-service:9092,EXTERNAL_PLAINTEXT://123.us-east-2.elb.amazonaws.com:9093"
    - name: KAFKA_LISTENERS
      value: "INTERNAL_PLAINTEXT://0.0.0.0:9092,EXTERNAL_PLAINTEXT://0.0.0.0:9093"
    - name: KAFKA_INTER_BROKER_LISTENER_NAME
      value: "INTERNAL_PLAINTEXT"

Apart from that, I configured two Services: one for internal (headless) and one for external (LoadBalancer) communication.
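A minimal sketch of those two Services might look like this (the names, labels, and AWS LoadBalancer setup below are assumptions; adapt them to your cluster):

```yaml
# Internal, headless Service for broker-to-broker and in-cluster clients (assumed names)
apiVersion: v1
kind: Service
metadata:
  name: kafka-internal-service
spec:
  clusterIP: None          # headless
  selector:
    app: kafka
  ports:
    - name: internal
      port: 9092
---
# External LoadBalancer Service exposing the EXTERNAL_PLAINTEXT listener on 9093
apiVersion: v1
kind: Service
metadata:
  name: kafka-external-service
spec:
  type: LoadBalancer
  selector:
    app: kafka
  ports:
    - name: external
      port: 9093
```

The headless Service name must match the host used in the INTERNAL_PLAINTEXT advertised listener, and the LoadBalancer's external DNS name must match the EXTERNAL_PLAINTEXT one.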

Hopefully this will save people's time.

7
votes

I was able to solve my problem by making the following changes:

  1. Using a nodeSelector in the YML to make the Kafka pod run on a particular node of the kube cluster.

  2. Setting KAFKA_ADVERTISED_HOST_NAME to the hostname of the kube node this Kafka pod has been configured to run on (as configured in step 1).

  3. Exposing the Kafka Service using NodePort, with the service port set to the same value as the exposed nodePort, as shown below:

    spec:
      ports:
        - name: broker-2
          port: 30031
          targetPort: 9092
          nodePort: 30031
          protocol: TCP
      selector:
        app: kafka-2
        broker_id: "2"
      type: NodePort
    

Now you can access the Kafka brokers from outside of the kube cluster using host:exposedPort.
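As a quick check from the external machine, the stock console clients can be pointed at the node's hostname and the exposed NodePort (the hostname and topic below are just examples):

```shell
# Produce to the broker exposed on NodePort 30031 (node hostname assumed)
kafka-console-producer.sh --broker-list kube-node-2:30031 --topic test

# Consume the same topic from outside the cluster
kafka-console-consumer.sh --bootstrap-server kube-node-2:30031 --topic test --from-beginning
```

If this hangs, the usual cause is the advertised hostname: the broker must advertise a name the external machine can resolve, which is exactly what steps 1 and 2 arrange.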

4
votes

I solved this problem by using Confluent's Kafka REST proxy image.

https://hub.docker.com/r/confluentinc/cp-kafka-rest/

Documentation of the REST Proxy is here:

http://docs.confluent.io/3.1.2/kafka-rest/docs/index.html

Step A: Build a Kafka broker docker image using the latest Kafka version

I used a custom-built Kafka broker image based on the same image you used. You basically just need to update cloudtrackinc's image to use Kafka version 0.10.1.0, or it won't work otherwise. Just update the Dockerfile from cloudtrackinc's image to use the latest wurstmeister kafka image and rebuild the docker image.

- FROM wurstmeister/kafka:0.10.1.0

I set ADVERTISED_HOST_NAME for each Kafka broker to the pod's IP so each broker gets a unique URL:

- name: ADVERTISED_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: status.podIP

Step B: Setup cp-kafka-rest proxy to use your Kafka broker cluster

The Kafka REST Proxy must be running within the same cluster as your Kafka broker cluster.

You need to provide at least two environment variables to the cp-kafka-rest image for it to run: KAFKA_REST_HOST_NAME and KAFKA_REST_ZOOKEEPER_CONNECT. You can set KAFKA_REST_HOST_NAME to the pod's IP.

- name: KAFKA_REST_HOST_NAME
  valueFrom:
    fieldRef:
      fieldPath: status.podIP
- name: KAFKA_REST_ZOOKEEPER_CONNECT
  value: "zookeeper-svc-1:2181,zookeeper-svc-2:2181,zookeeper-svc-3:2181"

Step C: Expose the Kafka REST proxy as a service

spec:
  type: NodePort        # or LoadBalancer
  ports:
    - name: kafka-rest-port
      port: 8082
      protocol: TCP

You can use NodePort or LoadBalancer with a single Kafka REST Proxy pod or with multiple ones.
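Once the proxy Service is exposed, any HTTP client can produce messages. A minimal sketch against the v1 REST Proxy API (the node hostname and NodePort below are assumptions):

```shell
# Produce a JSON record to topic "test" through the REST proxy
curl -X POST \
  -H "Content-Type: application/vnd.kafka.json.v1+json" \
  --data '{"records":[{"value":{"foo":"bar"}}]}' \
  http://kube-node-1:30082/topics/test
```

A successful request returns the partition and offset of each written record as JSON.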

Pros and Cons of using Kafka REST proxy

Pros:

  1. You can scale the Kafka broker cluster easily.
  2. You do not have to expose the Kafka brokers outside of the cluster.
  3. You can use a load balancer with the proxy.
  4. You can use any type of HTTP client to access the Kafka cluster (e.g. curl). Very lightweight.

Cons:

  1. Another component/layer on top of the Kafka cluster.
  2. Consumers are created within the proxy pod, so your REST client has to keep track of them.
  3. Performance is not ideal: REST instead of the native Kafka protocol. Deploying multiple proxies may help a little, but I would not use this setup for high-volume traffic. For low-volume message traffic it should be okay.

So if you can live with the issues above, give the Kafka REST Proxy a try.

1
votes

This does not seem to be possible at the moment; Kafka's network architecture is pretty poor in this regard. The new consumer uses a list of brokers, which return the hostnames registered in ZooKeeper, but unfortunately these are in a different network, so it is not possible to reach them from your local client. The weak part of Kafka is that it is not possible to specify the brokers AND the ZooKeeper servers separately. This prevents clients from accessing the system from outside.

For the moment we worked around this using a busybox pod, in which we installed tools to interact with Kafka; in our case, plunger.
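For reference, a throwaway in-cluster pod like the one described can be started with something like this (the pod name is just an example):

```shell
# Start an interactive busybox pod inside the cluster; install/run Kafka tooling from there
kubectl run -i --tty kafka-debug --image=busybox --restart=Never -- sh
```

Since the pod runs inside the cluster network, it can reach the brokers via their internal Service names without any of the advertised-listener issues above.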