  1. Is there any way to sink only a specific event type from a Kafka topic to HDFS, filtering out the remaining types, using the Kafka Connect HDFS connector?
  2. Can we segregate the input events based on some key and write them to different partitions, so that the values for a specific key go to a specific partition?
  3. Can we use the keys stored in the Schema Registry to get the values in the topic specific to a particular key, for Avro-format data? Kindly let me know if my understanding needs clarity.

If Kafka Connect does not have these features, can they be implemented using Kafka Streams? Please point me to documentation if it is available.


1 Answer


Is there any way to sink only a specific event type from a Kafka topic to HDFS, filtering out the remaining types, using the Kafka Connect HDFS connector?

Kafka Connect has transforms for manipulating messages, but they are not meant for filtering. That is commonly done with Kafka Streams or KSQL.
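As a sketch of the Kafka Streams approach: filter the source topic into a second topic containing only the wanted event type, then point the HDFS connector at that topic. Topic names, the `eventType` field, and the string-based type check below are all assumptions for illustration; a real application would use a proper JSON or Avro deserializer.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EventTypeFilter {
    // Crude type check on a JSON string value; the "eventType" field name
    // is an assumption, not part of the original question.
    static boolean isWantedType(String value, String wantedType) {
        return value != null && value.contains("\"eventType\":\"" + wantedType + "\"");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-type-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> events = builder.stream("events");          // source topic (assumed name)
        events.filter((key, value) -> isWantedType(value, "ORDER_CREATED")) // keep only one event type
              .to("events-filtered");                                       // point the HDFS connector here
        new KafkaStreams(builder.build(), props).start();
    }
}
```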

Can we segregate the input events based on some key and write them to different partitions, so that the values for a specific key go to a specific partition?

The FieldPartitioner class mentioned in the Confluent documentation does this (caveat: I believe it only handles top-level fields, not nested JSON or Avro record fields).
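A minimal connector config using it might look like the sketch below. The connector name, topic, HDFS URL, and `customerId` field are placeholders, and the exact `partitioner.class` package can vary between connector versions, so check the documentation for the version you run:

```properties
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
topics=events
hdfs.url=hdfs://namenode:8020

# Partition output directories by a top-level field of the record value,
# e.g. /topics/events/customerId=42/...
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
partition.field.name=customerId
```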

Can we use the keys stored in the Schema Registry to get the values in the topic specific to a particular key, for Avro-format data?

I don't understand the question, but the HDFS connector, by default, ignores the Kafka message key when writing the data, so I'm going to say no.

Kafka data isn't indexed by key; it's partitioned by it. That means if you used the DefaultPartitioner rather than the FieldPartitioner, each Kafka partition's records would land in a single filesystem path, and you could then query by partition (for example, using Spark or Hive), but not by key. Again, that's the default behavior; you can use a transform, as mentioned previously, to add the Kafka key into the data, and then query by it.
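One hedged way to get the key into the data without a custom transform is to do the enrichment in Kafka Streams before the connector ever sees the record: wrap the value so it carries the key, and sink the enriched topic instead. The topic names and the `kafka_key`/`payload` field names below are illustrative assumptions, not standard names.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KStream;

public class KeyEmbedder {
    // Wrap the original JSON value so the message key survives into HDFS.
    // Field names ("kafka_key", "payload") are illustrative, not standard.
    static String embedKey(String key, String jsonValue) {
        return "{\"kafka_key\":\"" + key + "\",\"payload\":" + jsonValue + "}";
    }

    static void buildTopology(StreamsBuilder builder) {
        KStream<String, String> source = builder.stream("events");
        source.mapValues((key, value) -> embedKey(key, value))
              .to("events-with-key"); // point the HDFS connector at this topic
    }
}
```

With the key embedded in each record, a Hive or Spark query over the HDFS files can then filter on the `kafka_key` column.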