0 votes

I am getting an error while migrating data between Kafka brokers.

I am using the kafka-reassign-partitions tool to reassign partitions to a different broker without any throttling (throttling didn't work with the command below). There are around 400 partitions across 50 topics.

Apache Kafka 1.1.0
Confluent Docker Image tag : 4.1.0

Command:

kafka-reassign-partitions --zookeeper IP:2181 --reassignment-json-file proposed.json --execute --throttle 100000000
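For context, a reassignment plan like proposed.json can be generated with the tool's --generate mode. A rough sketch of the input and the shape of the resulting file (the topic name and broker ids here are placeholders, not my real values):

cat > topics.json <<'EOF'
{"version": 1, "topics": [{"topic": "TOPIC"}]}
EOF

kafka-reassign-partitions --zookeeper IP:2181 --generate \
  --topics-to-move-json-file topics.json --broker-list "0,1,2,3,4"

# The proposed assignment it prints (saved here as proposed.json) has this shape:
# {"version":1,"partitions":[{"topic":"TOPIC","partition":0,"replicas":[4]}]}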

After some time, I see the error below continuously on the target broker.

[2019-09-21 11:24:07,625] INFO [ReplicaFetcher replicaId=4, leaderId=0, fetcherId=0] Error sending fetch request (sessionId=514675011, epoch=INITIAL) to node 0: java.io.IOException: Connection to 0 was disconnected before the response was read. (org.apache.kafka.clients.FetchSessionHandler)

[2019-09-21 11:24:07,626] WARN [ReplicaFetcher replicaId=4, leaderId=0, fetcherId=0] Error in response for fetch request (type=FetchRequest, replicaId=4, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={TOPIC-4=(offset=4624271, logStartOffset=4624271, maxBytes=1048576), TOPIC-2=(offset=1704819, logStartOffset=1704819, maxBytes=1048576), TOPIC-8=(offset=990485, logStartOffset=990485, maxBytes=1048576), TOPIC-1=(offset=1696764, logStartOffset=1696764, maxBytes=1048576), TOPIC-7=(offset=991507, logStartOffset=991507, maxBytes=1048576), TOPIC-5=(offset=988660, logStartOffset=988660, maxBytes=1048576)}, isolationLevel=READ_UNCOMMITTED, toForget=, metadata=(sessionId=514675011, epoch=INITIAL)) (kafka.server.ReplicaFetcherThread)

java.io.IOException: Connection to 0 was disconnected before the response was read
    at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:97)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:96)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:220)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:43)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:146)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:111)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:82)

Zookeeper status:

 ls /admin/reassign_partitions
[]

I am using t2.medium EC2 instances and 120 GB gp2 EBS volumes.

I am able to connect to ZooKeeper from all brokers.

[zk: localhost:2181(CONNECTED) 3] ls /brokers/ids
[0, 1, 2, 3]

I am using IP addresses for all brokers, so a DNS mismatch is not the issue either.

Also, I cannot see any topics scheduled for reassignment in ZooKeeper.

[zk: localhost:2181(CONNECTED) 2] ls /admin/reassign_partitions
[]
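The same can be cross-checked with the tool itself (this assumes the original proposed.json is still at hand):

kafka-reassign-partitions --zookeeper IP:2181 --reassignment-json-file proposed.json --verify
# prints, per partition, whether the reassignment completed, is still in progress, or failed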

Interestingly, I can see data piling up for the partitions that are not listed above, while the partitions listed in the error are not being migrated as of now.

I am using the Confluent Kafka Docker image.

Kafka Broker Setting: https://gist.github.com/ethicalmohit/cd44f580356ca02250760a307d90b54d

There's nothing special about Confluent here (in fact, there's nothing to migrate, as you can add all the other Confluent services on top of an existing Apache Kafka installation). Have you tried reducing the number of topics you're moving? Or providing multiple ZooKeepers so that the connection is retried and doesn't get disconnected while I/O is high on the broker? – OneCricketeer
I was not able to decrease the number of topics because it says that a reassignment is in progress. I have also added all ZooKeeper connection strings in the Kafka configuration. – m0hit
What error logs do you have on broker 0? – Alexandre Dupriez
@AlexandreDupriez I was not able to see any errors on brokers 0, 1, or 2. – m0hit

2 Answers

0 votes

If you can give us some more details about your topology, maybe we can understand the problem better.

Some thoughts:

- Can you connect via zookeeper-cli at kafka-0:2181? Does kafka-0 resolve to the correct host?
- If a reassignment is in progress, you either have to stop it manually by deleting the appropriate key in ZooKeeper (warning: this may leave some topics or partitions in a broken state), or you have to wait for the job to finish. Can you monitor the ongoing reassignment and share some info about it?
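For example, a manual cancellation would look roughly like the sketch below (using the zookeeper-shell script that ships with Kafka; the address is a placeholder, and as warned above this can leave replicas in an inconsistent state):

zookeeper-shell IP:2181
  ls /admin/reassign_partitions        # shows the pending plan, if any
  delete /admin/reassign_partitions    # removes the pending plan
  # A controller re-election (delete /controller) is sometimes also suggested so the
  # controller picks up the change, but that is disruptive in itself.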

0 votes

This was solved by increasing the value of replica.socket.receive.buffer.bytes on all destination brokers.

After changing the above parameter and restarting the brokers, I was able to see the data in the above-mentioned partitions.
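For completeness, a sketch of how that can be set; the value below (1 MiB, up from the 64 KB default) is only an illustration and should be tuned for your environment:

# server.properties on the destination brokers
replica.socket.receive.buffer.bytes=1048576

# Or, with the Confluent Docker image, the equivalent environment variable:
KAFKA_REPLICA_SOCKET_RECEIVE_BUFFER_BYTES=1048576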