
I have a topic that is compacted:

/opt/kafka/bin/kafka-topics.sh --zookeeper localhost --describe --topic myTopic
Topic:myTopic   PartitionCount:1    ReplicationFactor:1 Configs:cleanup.policy=compact

There are no messages on it:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTopic --from-beginning --property print.key=true
^CProcessed a total of 0 messages

Both the earliest and the latest offset on the only partition are 12, though.

/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic myTopic --time -2
myTopic:0:12

/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic myTopic --time -1
myTopic:0:12

I wonder what could have happened to these 12 messages. The count matches what I expected to be there, but for some reason the messages themselves are gone.

As far as I understand, even if all 12 messages had the same key, I should still see at least one of them: that's how compaction works.
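For example (a rough sketch, assuming a local broker on localhost:9092 and a hypothetical compacted topic called testCompaction), producing several records with the same key and then consuming from the beginning should still return at least the latest record per key:

# Produce three records that share the key "k1" (key and value separated by ":")
printf 'k1:v1\nk1:v2\nk1:v3\n' | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testCompaction --property parse.key=true --property key.separator=:

# Once compaction has run, at least the latest record per key (k1 v3) should remain
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testCompaction --from-beginning --property print.key=true --property key.separator=: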

The topic in question was created as compacted. The only unusual thing that happened in the meantime is that the Kafka instance lost its Zookeeper data completely. Could that also have caused the data loss?

To rephrase the last question: can something bad happen to the physical data on the Kafka broker if I remove all the Kafka-related znodes from Zookeeper?
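For reference, this is roughly how I would check what Kafka still has registered in Zookeeper (a sketch, assuming Zookeeper listens on localhost:2181 and the standard Kafka znode layout):

# List the topics Zookeeper knows about
/opt/kafka/bin/zookeeper-shell.sh localhost:2181 ls /brokers/topics

# Show the per-topic config znode that the NoNode error below complains about
/opt/kafka/bin/zookeeper-shell.sh localhost:2181 get /config/topics/myTopic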

In addition, here are some logs from Kafka startup.

[2019-04-30 12:02:16,510] WARN [Log partition=myTopic-0, dir=/var/lib/kafka] Found a corrupted index file corresponding to log file /var/lib/kafka/myTopic-0/00000000000000000000.log due to Corrupt index found, index file (/var/lib/kafka/myTopic-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no greater than the base offset 0.}, recovering segment and rebuilding index files... (kafka.log.Log)

[2019-04-30 12:02:16,524] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Completed load of log with 1 segments, log start offset 0 and log end offset 12 in 16 ms (kafka.log.Log)

[2019-04-30 12:35:34,530] INFO Got user-level KeeperException when processing sessionid:0x16a6e1ea2000001 type:setData cxid:0x1406 zxid:0xd11 txntype:-1 reqpath:n/a Error Path:/config/topics/myTopic Error:KeeperErrorCode = NoNode for /config/topics/myTopic (org.apache.zookeeper.server.PrepRequestProcessor)

[2019-04-30 12:35:34,535] INFO Topic creation Map(myTopic-0 -> ArrayBuffer(0)) (kafka.zk.AdminZkClient)

[2019-04-30 12:35:34,547] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions myTopic-0 (kafka.server.ReplicaFetcherManager)

[2019-04-30 12:35:34,580] INFO [Partition myTopic-0 broker=0] No checkpointed highwatermark is found for partition myTopic-0 (kafka.cluster.Partition)

[2019-04-30 12:35:34,580] INFO Replica loaded for partition myTopic-0 with initial high watermark 0 (kafka.cluster.Replica)

[2019-04-30 12:35:34,580] INFO [Partition myTopic-0 broker=0] myTopic-0 starts at Leader Epoch 0 from offset 12. Previous Leader Epoch was: -1 (kafka.cluster.Partition)

And the messages were indeed removed:

[2019-04-30 12:39:24,199] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Found deletable segments with base offsets [0] due to retention time 10800000ms breach (kafka.log.Log)

[2019-04-30 12:39:24,201] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Rolled new log segment at offset 12 in 2 ms. (kafka.log.Log)

Comments:

senseiwu: I find it a bit strange where this 10800000 ms retention (3 hours) is coming from.

OneCricketeer: @senseiwu server.properties, most likely.
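A quick way to check where that retention value comes from (a sketch, assuming Zookeeper on localhost:2181 and the broker config at /opt/kafka/config/server.properties; a topic-level override takes precedence over the broker defaults):

# Show topic-level overrides (retention.ms, cleanup.policy, ...)
/opt/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name myTopic --describe

# Show the broker-wide defaults from server.properties
grep -E 'log.retention|log.cleanup' /opt/kafka/config/server.properties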

1 Answer


NoNode for /config/topics/myTopic

Kafka no longer knows that this topic exists or that it should be compacted, which seems evident from the log cleaner output:

due to retention time 10800000ms breach

So yes, Zookeeper is very important. But so is gracefully shutting down a broker with kafka-server-stop; otherwise, forcibly killing the process or the host machine can leave you with corrupted partition segments.
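If the topic-level config really was lost, one option is to re-apply it and to make sure the broker is always stopped cleanly (a sketch, assuming Kafka 2.x tooling and Zookeeper on localhost:2181; newer versions of kafka-configs.sh take --bootstrap-server instead of --zookeeper):

# Re-apply compaction as a topic-level override
/opt/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --entity-type topics --entity-name myTopic --alter --add-config cleanup.policy=compact

# Stop the broker gracefully instead of killing the process, so segments and indexes are flushed and closed cleanly
/opt/kafka/bin/kafka-server-stop.sh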


I'm not entirely sure what conditions would lead to this:

the last offset is 0 which is no greater than the base offset 0

But assuming you had a full cluster and the topic had a replication factor higher than 1, you could hope that at least one replica was still healthy.

The way to recover a broker with a corrupted index/partition would be to stop the Kafka process, delete (or move aside) the corrupted partition folder on disk, restart Kafka on that machine, and then let it replicate the data back from a healthy replica.
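As a rough sketch (assuming the log dir is /var/lib/kafka, the broker config is /opt/kafka/config/server.properties, and healthy replicas exist on other brokers):

# Stop the broker gracefully
/opt/kafka/bin/kafka-server-stop.sh

# Move the corrupted partition directory aside (or delete it once you are sure)
mv /var/lib/kafka/myTopic-0 /var/lib/kafka/myTopic-0.corrupt

# Restart the broker; it will re-fetch the partition from the current leader
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties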