
Hi, I ran into a strange issue when increasing the replication factor of a Kafka topic by following the steps in this document: https://kafka.apache.org/documentation/#basic_ops_increase_replication_factor

The symptom is that the replication factor increase appears not to take effect at all.

Please help

My Kafka setup is

Kafka version: kafka_2.12-2.1.0

Server: hostname server-0 (192.168.0.1)

  • Kafka Broker Id: 0
  • Kafka Port: 9092
  • Zookeeper Port: 2181

Server: hostname server-1 (192.168.0.2)

  • Kafka Broker Id: 1
  • Kafka Port: 9092
  • No Zookeeper on server-1

Topics

  • Number of Topics: 1
  • Topic Name: DATA
  • Number of Partitions: 1

The DATA topic is first created with replication factor 1, from server-0 only:

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic DATA

The result looks like:

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic DATA
Topic:DATA PartitionCount:1 ReplicationFactor:1 Configs:
Topic: DATA Partition: 0 Leader: 0 Replicas: 0 Isr: 0

After creating the topic, I produced some test messages:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic DATA
message 1
message 2

Then I increased the replication factor of topic DATA to 2 by running commands on server-0 only.

The JSON file below (topics-to-expand.json) is used with kafka-reassign-partitions.sh to increase the replication factor:

{ "version":1, "partitions":[ {"topic":"DATA","partition":0,"replicas":[0,1]} ] }
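For a topic with more than one partition, writing this file by hand gets tedious. Below is a minimal sketch of a helper that prints the same structure for every partition. The function name make_reassignment_json is hypothetical (it is not part of Kafka), and it assumes every partition should get the same replica list, which is not necessarily the best broker placement for a larger cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): print a reassignment JSON that
# assigns the same replica list to every partition of one topic.
#   $1 = topic name, $2 = number of partitions, $3 = comma-separated broker ids
make_reassignment_json() {
  topic=$1
  partitions=$2
  replicas=$3
  printf '{"version":1,"partitions":['
  p=0
  while [ "$p" -lt "$partitions" ]; do
    # Separate entries with a comma, but not before the first one.
    if [ "$p" -gt 0 ]; then
      printf ','
    fi
    printf '{"topic":"%s","partition":%d,"replicas":[%s]}' "$topic" "$p" "$replicas"
    p=$((p + 1))
  done
  printf ']}\n'
}

# Same content as the file used in this question: topic DATA, 1 partition, brokers 0 and 1.
make_reassignment_json DATA 1 0,1
```

Redirecting the output to topics-to-expand.json gives a file that can be passed to --reassignment-json-file in the same way as the hand-written one.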

command line:

bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 --reassignment-json-file topics-to-expand.json --execute

On the surface, the result looks good when describing the topic:

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic DATA
Topic:DATA PartitionCount:1 ReplicationFactor:2 Configs:
Topic: DATA Partition: 0 Leader: 0 Replicas: 0,1 Isr: 0,1

I produced some more test messages here:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic DATA
message 3
message 4

However, the problem arises when I test from server-1.

First I killed the Kafka process on server-0 with

kill -9 [kafka-pid]

The problem shows up when I run the console consumer on server-1:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic DATA --from-beginning

No messages show up and the console just blocks on a blank screen.

According to the documentation, I should be able to see the messages, because the replica on server-1 is (or was) in sync. No?

Describing the topic shows

bin/kafka-topics.sh --zookeeper server-0:2181 --describe --topic DATA
Topic:DATA PartitionCount:1 ReplicationFactor:2 Configs:
Topic: DATA Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1

Then I restarted the Kafka process on server-0, and the consumer console all of a sudden showed every message in the history:

message 1
message 2
message 3
message 4

It looks like the consumer on server-1 didn't consume any data from server-1 locally, as if the topic data had not been replicated to server-1. Instead, it waited for server-0 to come back up and supply the data, even though server-1 was marked as leader.

Can anyone reproduce my problem? I want to attach my properties files, but I don't know how to attach files on Stack Overflow, sorry about that.

Where is server-1 running? It looks like you're trying to run both on localhost:9092? - William Hammond
server-0 and server-1 are two independent Amazon EC2 instances; you can think of them as 192.168.0.1 and 192.168.0.2. The Kafka processes run directly on the servers, no Docker containers involved. - Bob
You really should not immediately kill -9 any process... Just kill pid to gracefully shut it down, and if it doesn't stop, then -9 it. - OneCricketeer

1 Answer


I figured out why, inspired by this post:

Killing node with __consumer_offsets leads to no message consumption at consumers

The reason for the symptom above is that the default is offsets.topic.replication.factor=3, but I only have 2 brokers (nodes) in the cluster. When Kafka first created the __consumer_offsets topic, it silently fell back to a replication factor of 1 (yikes). So __consumer_offsets lived only on server-0, and with server-0 down the consumer could not reach its group coordinator, even though the DATA partition itself was replicated to server-1.

Setting offsets.topic.replication.factor=2 in the properties file solves the above problem (yes, tested!).
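To check whether a cluster is in the same situation, the replication factor of __consumer_offsets can be read from the describe output. The sketch below is a small filter for that; the function name replication_factor_of is hypothetical, and it assumes the header line contains ReplicationFactor:N as in the outputs shown in the question. Note also that the broker setting is only used when __consumer_offsets is first created, so a topic that already exists with one replica would likely need a partition reassignment as well.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kafka): read `kafka-topics.sh --describe`
# output on stdin and print the value that follows "ReplicationFactor:".
# Assumes the header-line format shown in the question; optional whitespace
# after the colon is tolerated.
replication_factor_of() {
  sed -n 's/.*ReplicationFactor:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Intended usage against a live cluster (left as a comment, needs Kafka):
#   bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
#     --topic __consumer_offsets | replication_factor_of

# Demo on the header line from the question's DATA topic after the increase.
printf 'Topic:DATA PartitionCount:1 ReplicationFactor:2 Configs:\n' | replication_factor_of
```

A result of 1 for __consumer_offsets would mean that stopping the single broker holding it blocks group-based consumers, which matches the symptom described in the question.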