Is there any way to sink only a specific event type from a Kafka topic to HDFS, filtering out the remaining types, using the Kafka Connect HDFS connector?
Kafka Connect has Single Message Transforms (SMTs) for manipulating messages, but they are not meant for filtering. That's commonly done with Kafka Streams or KSQL.
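For illustration, here is a minimal Kafka Streams sketch of that approach: read the raw topic, keep only one event type, and write the survivors to a second topic that the HDFS connector is then pointed at. The topic names ("events", "events-clicks") and the "type" field in the JSON values are assumptions for the example, not anything from your setup.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EventTypeFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "event-type-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("events")
               // Crude check on the serialized value; a real job would
               // deserialize the record and inspect a proper event-type field.
               .filter((key, value) -> value != null && value.contains("\"type\":\"click\""))
               .to("events-clicks");

        new KafkaStreams(builder.build(), props).start();
    }
}
```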
Can we segregate the input events based on some key and write them to different partitions, so that the values for a specific key go to a specific partition?
The FieldPartitioner class mentioned in the Confluent documentation does this (warning: I believe it only works on top-level fields, not nested JSON or Avro record fields).
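As a rough sketch, the relevant sink connector settings would look something like the following. The field name is a placeholder, and the partitioner package varies by connector version (older HDFS connector releases ship it under io.confluent.connect.hdfs.partitioner, newer ones under io.confluent.connect.storage.partitioner):

```properties
# Hypothetical excerpt of an HDFS sink connector config.
partitioner.class=io.confluent.connect.storage.partitioner.FieldPartitioner
# Top-level field of the message value to partition output paths by.
partition.field.name=event_type
```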
Can we use the keys stored in the Schema Registry to get the values in the topic specific to a particular key, for Avro-format data?
I don't understand the question, but HDFS Connect, by default, ignores the Kafka message key when writing the data, so I'm going to say no.
Kafka data isn't indexed by key, it's partitioned by it, which means that if you used the DefaultPartitioner rather than the FieldPartitioner, all keys would land in filesystem paths laid out by Kafka partition. You would then be able to query by partition, though, not by key, for example using Spark or Hive. Again, that's the default behavior - you can use a Transform, as mentioned previously, to copy the Kafka key into the message value, which you can then query by.
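As far as I know there is no built-in Apache Kafka SMT that copies the key into the value (InsertField can add the topic, partition, offset, or timestamp, but not the key), so this would typically be a small custom transform. A minimal sketch, assuming schemaless (Map) values and a hypothetical class name; Struct values with schemas would also need the value schema rebuilt:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.common.config.ConfigDef;
import org.apache.kafka.connect.connector.ConnectRecord;
import org.apache.kafka.connect.transforms.Transformation;

// Hypothetical SMT that copies the record key into a value field named
// "key", so the key survives into the files the HDFS connector writes.
public class InsertKeyIntoValue<R extends ConnectRecord<R>> implements Transformation<R> {

    @Override
    public R apply(R record) {
        if (!(record.value() instanceof Map)) {
            return record; // pass through anything we don't understand
        }
        @SuppressWarnings("unchecked")
        Map<String, Object> value = new HashMap<>((Map<String, Object>) record.value());
        value.put("key", record.key());
        return record.newRecord(record.topic(), record.kafkaPartition(),
                record.keySchema(), record.key(),
                null, value, record.timestamp());
    }

    @Override
    public ConfigDef config() {
        return new ConfigDef();
    }

    @Override
    public void configure(Map<String, ?> configs) { }

    @Override
    public void close() { }
}
```

You would then register it on the connector with something like transforms=insertKey and transforms.insertKey.type=com.example.InsertKeyIntoValue (package name hypothetical), after putting the jar on the Connect worker's plugin path.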