This problem also impacts ingesting data from the kafka using flume and sink the data to HDFS.
To fix the above issue:
- Stop Kafka brokers
- Connect to zookeeper cluster and remove /brokers z node
- Restart kafka brokers
There is no issue with respect to kafka client version and scala version that we are using the cluster. Zookeeper might have wrong information about broker hosts.
To verify the action:
Create topic in kafka.
$ kafka-console-consumer --bootstrap-server slavenode01.cdh.com:9092 --topic rkkrishnaa3210 --from-beginning
Open a producer channel and feed some messages to it.
$ kafka-console-producer --broker-list slavenode03.cdh.com:9092 --topic rkkrishnaa3210
Open a consumer channel to consume the message from a specific topic.
$ kafka-console-consumer --bootstrap-server slavenode01.cdh.com:9092 --topic rkkrishnaa3210 --from-beginning
To test this from flume:
Flume agent config:
rk.sources = source1
rk.channels = channel1
rk.sinks = sink1
rk.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
rk.sources.source1.zookeeperConnect = ip-20-0-21-161.ec2.internal:2181
rk.sources.source1.topic = rkkrishnaa321
rk.sources.source1.groupId = flume1
rk.sources.source1.channels = channel1
rk.sources.source1.interceptors = i1
rk.sources.source1.interceptors.i1.type = timestamp
rk.sources.source1.kafka.consumer.timeout.ms = 100
rk.channels.channel1.type = memory
rk.channels.channel1.capacity = 10000
rk.channels.channel1.transactionCapacity = 1000
rk.sinks.sink1.type = hdfs
rk.sinks.sink1.hdfs.path = /user/ce_rk/kafka/%{topic}/%y-%m-%d
rk.sinks.sink1.hdfs.rollInterval = 5
rk.sinks.sink1.hdfs.rollSize = 0
rk.sinks.sink1.hdfs.rollCount = 0
rk.sinks.sink1.hdfs.fileType = DataStream
rk.sinks.sink1.channel = channel1
Run flume agent:
flume-ng agent --conf . -f flume.conf -Dflume.root.logger=DEBUG,console -n rk
Observe logs from the consumer that the message from the topic is written in HDFS.
18/02/16 05:21:14 INFO internals.AbstractCoordinator: Successfully joined group flume1 with generation 1
18/02/16 05:21:14 INFO internals.ConsumerCoordinator: Setting newly assigned partitions [rkkrishnaa3210-0] for group flume1
18/02/16 05:21:14 INFO kafka.SourceRebalanceListener: topic rkkrishnaa3210 - partition 0 assigned.
18/02/16 05:21:14 INFO kafka.KafkaSource: Kafka source source1 started.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: source1: Successfully registered new MBean.
18/02/16 05:21:14 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: source1 started
18/02/16 05:21:41 INFO hdfs.HDFSDataStream: Serializer = TEXT, UseRawLocalFileSystem = false
18/02/16 05:21:42 INFO hdfs.BucketWriter: Creating /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Closing /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp
18/02/16 05:21:48 INFO hdfs.BucketWriter: Renaming /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920.tmp to /user/ce_rk/kafka/rkkrishnaa3210/18-02-16/FlumeData.1518758501920
18/02/16 05:21:48 INFO hdfs.HDFSEventSink: Writer callback called.