6
votes

I have a Kafka cluster with 2 brokers, each on its own (AWS) server (I set up the cluster using the instructions here). I am using SASL (but no encryption). On server 2, which runs broker 2, I created a topic:

KAFKA_OPTS="-Djava.security.auth.login.config=/home/kafka/kafka_2.11-1.0.0/config/jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf" \
    bin/kafka-topics.sh --create \
    --zookeeper zookeeper-server-01.eigenroute.com:2181,zookeeper-server-02.eigenroute.com:2181,zookeeper-server-03.eigenroute.com:2181/apps/kafka-cluster-demo \
    --replication-factor 2   --partitions 9   --topic another-test-topic

seemingly with success, because describing the topics shows that it was at least created:

KAFKA_OPTS="-Djava.security.auth.login.config=/home/kafka/kafka_2.11-1.0.0/config/jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf" \
    bin/kafka-topics.sh --describe \
    --zookeeper zookeeper-server-01.eigenroute.com:2181,zookeeper-server-02.eigenroute.com:2181,zookeeper-server-03.eigenroute.com:2181/apps/kafka-cluster-demo
Topic:another-test-topic    PartitionCount:9    ReplicationFactor:2 Configs:    MarkedForDeletion:true
    Topic: another-test-topic   Partition: 0    Leader: none    Replicas: 2,1   Isr:
    Topic: another-test-topic   Partition: 1    Leader: none    Replicas: 1,2   Isr:
    Topic: another-test-topic   Partition: 2    Leader: none    Replicas: 2,1   Isr:
    Topic: another-test-topic   Partition: 3    Leader: none    Replicas: 1,2   Isr:
    Topic: another-test-topic   Partition: 4    Leader: none    Replicas: 2,1   Isr:
    Topic: another-test-topic   Partition: 5    Leader: none    Replicas: 1,2   Isr:
    Topic: another-test-topic   Partition: 6    Leader: none    Replicas: 2,1   Isr:
    Topic: another-test-topic   Partition: 7    Leader: none    Replicas: 1,2   Isr:
    Topic: another-test-topic   Partition: 8    Leader: none    Replicas: 2,1   Isr:
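
Every partition row in the output above reports `Leader: none`. One way to pick out such rows without reading the table by eye is to filter the describe output. This is only a minimal sketch, not part of the original setup: `leaderless_partitions` is a hypothetical helper name, and it assumes the `Topic: ... Partition: ... Leader: ...` row layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print "<topic> <partition>" for every partition row whose leader is "none".
# Assumes the row layout shown above (Topic: <name> Partition: <n> Leader: <id>).
leaderless_partitions() {
    awk '
        # Partition rows have "Topic:" as a separate first field; the summary
        # row starts with "Topic:<name>" in one field and is therefore skipped.
        $1 == "Topic:" && $3 == "Partition:" && $5 == "Leader:" && $6 == "none" {
            print $2, $4
        }
    '
}
```

Piping the `--describe` command above into `leaderless_partitions` should list one line per affected partition; empty output would mean every partition has a leader.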

As you can see, this topic is assigned no leader, and has no in-sync replicas. I have assigned write permissions to a producer:

KAFKA_HEAP_OPTS="-Djava.security.auth.login.config=/home/kafka/kafka_2.11-1.0.0/config/jaas.conf -Dsun.security.krb5.debug=true -Djava.security.krb5.conf=/etc/krb5.conf -Xmx256M -Xms128M" \
    bin/kafka-acls.sh --authorizer-properties \
    zookeeper.connect=zookeeper-server-01.eigenroute.com:2181,zookeeper-server-02.eigenroute.com:2181,zookeeper-server-03.eigenroute.com:2181/apps/kafka-cluster-demo \
   --add --allow-principal User:producer1 --producer --topic another-test-topic
...
Current ACLs for resource `Topic:another-test-topic`:
    User:producer1 has Allow permission for operations: Describe from hosts: *
    User:producer1 has Allow permission for operations: Write from hosts: *

My producer is, however, unable to write to this topic:

KAFA_HEAP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Dsun.security.krb5.debug=true" \
    bin/kafka-console-producer.sh \
    --broker-list server-01.eigenroute.com:9092,server-02.eigenroute.com:9092 \
    --topic another-test-topic --producer.config config/sasl-producer.properties
>this is a test message
[2018-01-07 21:16:02,650] WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)

The ACL on the ZooKeeper node for this topic is:

[zk: zookeeper-server-03.eigenroute.com:2181(CONNECTED) 8] getAcl /apps/kafka-cluster-demo/brokers/topics/another-test-topic
'world,'anyone
: r
'sasl,'kafka/[email protected]
: cdrwa

Which I find strange... shouldn't kafka/[email protected] (the Kerberos principal for broker 1) have the same permissions as kafka/[email protected] (the Kerberos principal for broker 2)?

Can someone suggest why the producer cannot see the topic to which it is authorized to write?

UPDATE: Below are responses to the questions in the answer provided by @Vladimir Nabokov:

  1. I don't see any partition directories for the topic on either broker:

    kafka@server-02:/var/log/kafka$ ls -alhtr
    total 124K
    -rw-r--r-- 1 kafka kafka 0 Jan 7 23:27 .lock
    -rw-r--r-- 1 kafka kafka 0 Jan 7 23:27 cleaner-offset-checkpoint
    -rw-r--r-- 1 kafka kafka 54 Jan 7 23:27 meta.properties
    drwxr-xr-x 7 root root 4.0K Jan 9 06:25 ..
    drwxr-xr-x 2 kafka kafka 4.0K Jan 18 05:30 __consumer_offsets-29
    ...
    drwxr-xr-x 2 kafka kafka 4.0K Jan 18 05:30 __consumer_offsets-1
    -rw-r--r-- 1 kafka kafka 600 Jan 18 05:56 replication-offset-checkpoint
    -rw-r--r-- 1 kafka kafka 600 Jan 18 05:56 recovery-point-offset-checkpoint
    -rw-r--r-- 1 kafka kafka 4 Jan 18 05:56 log-start-offset-checkpoint
    drwxr-xr-x 27 kafka kafka 4.0K Jan 18 05:56 .

and

kafka@server-01:/var/log/kafka$ ls -alhtr
total 124K
-rw-r--r--  1 kafka kafka    0 Jan  7 23:26 .lock
-rw-r--r--  1 kafka kafka    0 Jan  7 23:26 cleaner-offset-checkpoint
-rw-r--r--  1 kafka kafka   54 Jan  7 23:26 meta.properties
drwxr-xr-x  7 root  root  4.0K Jan 17 06:25 ..
drwxr-xr-x  2 kafka kafka 4.0K Jan 18 05:30 __consumer_offsets-0
...
drwxr-xr-x  2 kafka kafka 4.0K Jan 18 05:30 __consumer_offsets-32
-rw-r--r--  1 kafka kafka  600 Jan 18 05:58 recovery-point-offset-checkpoint
-rw-r--r--  1 kafka kafka    4 Jan 18 05:58 log-start-offset-checkpoint
-rw-r--r--  1 kafka kafka  600 Jan 18 05:59 replication-offset-checkpoint
drwxr-xr-x 27 kafka kafka 4.0K Jan 18 05:59 .

  2. The user kafka, which is the user that runs the kafka server, is the owner of the /var/log/kafka/ directory on both brokers:

    kafka@server-02:~/kafka_2.11-1.0.0/config$ ll /var/log | grep kafka
    drwxr-xr-x 27 kafka kafka 4096 Jan 18 05:44 kafka

    kafka@server-01:/var/log$ ll /var/log | grep kafka
    drwxr-xr-x 27 kafka kafka 4096 Jan 18 05:49 kafka

  3. Looks like telnet is working, to both brokers:

    sjamal-> telnet server-01.eigenroute.com 9092
    Trying 54.175.56.39...
    Connected to server-01.eigenroute.com.
    Escape character is '^]'.
    ^CConnection closed by foreign host.
    [~/projects/microservices/kafka-tutorial/kafka_2.11-1.0.0]
    sjamal-> telnet server-02.eigenroute.com 9092
    Trying 18.221.32.34...
    Connected to server-02.eigenroute.com.
    Escape character is '^]'.
    ^CConnection closed by foreign host.

  4. Not necessary - they can see each other:

    kafka@server-02:~/kafka_2.11-1.0.0/config$ host server-01.eigenroute.com
    server-01.eigenroute.com has address 54.175.56.39

    kafka@server-01:/var/log$ host server-02.eigenroute.com
    server-02.eigenroute.com has address 18.221.32.34

  5. I tried this. The consumer does not receive any messages:

    [2018-01-18 00:45:31,931] WARN [Consumer clientId=consumer-1, groupId=console-consumer-95024] Error while fetching metadata with correlation id 7022 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
    [2018-01-18 00:45:32,063] WARN [Consumer clientId=consumer-1, groupId=console-consumer-95024] Error while fetching metadata with correlation id 7024 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
    [2018-01-18 00:45:32,194] WARN [Consumer clientId=consumer-1, groupId=console-consumer-95024] Error while fetching metadata with correlation id 7025 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
    [2018-01-18 00:45:32,327] WARN [Consumer clientId=consumer-1, groupId=console-consumer-95024] Error while fetching metadata with correlation id 7026 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)


UPDATE #2: In response to Vladimir Nabokov's comment in his answer, I am pasting the producer and consumer configs and commands that I am using:

# sasl-producer.properties
bootstrap.servers=server-01.eigenroute.com:9092
compression.type=none
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
        useKeyTab=true \
        storeKey=true  \
        keyTab="/path/to/producer1.whatever.keytab" \
        principal="producer1/[email protected]";

# sasl-consumer.properties
bootstrap.servers=server-01.eigenroute.com:9092
security.protocol=SASL_PLAINTEXT
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required \
        useKeyTab=true \
        storeKey=true  \
        keyTab="/path/to/consumer1.whatever.keytab" \
        principal="consumer1/[email protected]";

# producer command
KAFA_HEAP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Dsun.security.krb5.debug=true" \
    bin/kafka-console-producer.sh \
    --broker-list server-01.eigenroute.com:9092,server-02.eigenroute.com:9092 \
    --topic another-test-topic --producer.config config/sasl-producer.properties

# consumer command
KAFA_HEAP_OPTS="-Djava.security.krb5.conf=/etc/krb5.conf -Dsun.security.krb5.debug=true" \
    bin/kafka-console-consumer.sh \
    --bootstrap-server server-01.eigenroute.com:9092,server-02.eigenroute.com:9092 \
    --topic another-test-topic --consumer.config config/sasl-consumer.properties --from-beginning

2 Answers

1
votes

This error:

WARN [Producer clientId=console-producer] Error while fetching metadata with correlation id 1 : {another-test-topic=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)

does not necessarily mean that you failed to write to the topic.

It only means that the topic is "new".

1) Check whether data has been written to the data dir. You can see that directly: cd data_dir/topic/partition and look for files that grow. (Look on both servers, in all partitions.)

2) Check whether data_dir lacks write permission for your kafka user.

3) From the producer machine, run 'telnet kafka_host kafka_port'; the producer may not be able to reach your kafka server over the network.

4) On both the producer and the kafka brokers, edit /etc/hosts and map IP to hostname for all 3 machines. They need to know not only each other's IPs but also the hostname-to-IP mapping (a DNS service is an alternative).

5) Connect with a consumer and try to consume from your topic.
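
Check 1 can be scripted. The following is a minimal sketch, assuming the usual `<topic>-<partition>` directory naming inside the data directory (the same pattern as `__consumer_offsets-29`); `topic_partition_dirs` is a hypothetical helper name, and the data directory is whatever `log.dirs` points to on your brokers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list partition directories for a topic in a Kafka data dir.
# Usage: topic_partition_dirs <data_dir> <topic>
# Prints one directory name per line; returns 1 if none exist.
topic_partition_dirs() {
    local data_dir="$1" topic="$2" dir name found=1
    for dir in "$data_dir/$topic"-*/; do
        [ -d "$dir" ] || continue          # the glob matched nothing
        name="$(basename "$dir")"
        # Keep only "<topic>-<digits>", so "a-b-0" is not counted for topic "a".
        case "${name#"$topic"-}" in
            ''|*[!0-9]*) continue ;;
        esac
        printf '%s\n' "$name"
        found=0
    done
    return "$found"
}
```

For example, `topic_partition_dirs /var/log/kafka another-test-topic` run on each broker prints the topic's partition directories if any exist; no output and a non-zero exit status suggests the broker never created them.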

0
votes

Well, I don't know whether this qualifies as an answer, but it works. The solution was to create the new topic ("a-test-topic2") on the Broker 1 server (the original broker), and not the Broker 2 server (the broker added second).

Now if I take Broker 1 offline, then I can create a topic ("a-test-topic3") on the Broker 2 server - but then of course the replication factor must be one, and so I will not be able to have any replicas (replicae?) on Broker 1.

Next, I bring Broker 1 back up, and try to create yet another topic ("a-test-topic4") on the Broker 2 server with a replication factor of 2, and ...it works! For every partition of this new topic, Isr is 2,1 or 1,2 and Leader is 1 or 2.

Hmm... so let's try to create a topic ("a-test-topic5") on the Broker 1 server. What happens? I get the same problem I originally had. The new topic's partitions have no Leaders and no Isrs. But I found a way to fix this - with Broker 1 running, I stopped Broker 2, then started Broker 2 again, and voila - this topic eventually gets, for all partitions, a Leader of 1 or 2 and Isrs of 1,2 or 2,1.

So I guess new topics can be created only on the first broker that was activated, or else all other brokers must be restarted?