
My use case is to create a producer that reads Apache Hive table records and sends them to a Kafka topic.

I explored Confluent's Kafka Connect, but so far they only offer a demo going from a Kafka topic to Hive tables using the HDFS Connector.

Has anyone gone from source Hive tables to a Kafka topic using Kafka connectors?
Or are there other options, such as writing a custom Java package, that we could use?
I'm open to any alternatives.

Where does the data come from before it lands in HDFS? A more common pattern would be [data source] --> [Kafka] --> [HDFS] than [data source] --> [HDFS] --> [Kafka] - Robin Moffatt
It comes from many sources, different RDBMSes. It was not my decision, but the data was put into Hive (as the data store); there are some simple transformations before we want to send it to Kafka. - Nk.Pl

1 Answer


You were correct that the Kafka Connect HDFS connector is only a sink, not a source for Kafka. Edit: it seems there is now an HDFS 3 Source connector (under a trial license).

Personally, I would skip Hive entirely, and read from HDFS.

You can do this in pure Java (sketched below), use Spark or Flink for their Kafka integrations, or try more visual tools like Apache NiFi or StreamSets to pull HDFS data and send it to Kafka.
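For illustration, here is a minimal pure-Java sketch that streams lines from files under a Hive table's warehouse directory into Kafka. The broker address, topic name, and warehouse path are placeholders; it also assumes the table is stored as plain text files (ORC/Parquet would need the corresponding reader instead).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class HdfsToKafka {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // placeholder broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        try (FileSystem fs = FileSystem.get(conf);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Placeholder path; assumes text-format table files under the warehouse dir
            for (FileStatus status : fs.listStatus(new Path("/user/hive/warehouse/mytable"))) {
                if (!status.isFile()) continue;
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(status.getPath()), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // One Kafka record per line of the HDFS file
                        producer.send(new ProducerRecord<>("my-topic", line));
                    }
                }
            }
            producer.flush(); // make sure buffered records are sent before exiting
        }
    }
}
```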

Usually, HDFS is not a source for Kafka data anyway, from what I've seen. If you need to pull data out, Spark seems to be the most commonly used tool for that, and writing to a Kafka topic is an implementation detail; a minimal Spark sketch follows.
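For example, a batch Spark job (Java API) can read the Hive table through the metastore and write straight to Kafka. This is only a sketch: the database/table name, broker address, and topic are placeholders, and it assumes the spark-sql-kafka-0-10 package is on the classpath.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HiveToKafka {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("HiveToKafka")
                .enableHiveSupport() // lets Spark resolve tables via the Hive metastore
                .getOrCreate();

        // Placeholder table; any transformations can be done in SQL here
        Dataset<Row> df = spark.sql("SELECT * FROM mydb.mytable");

        // The Kafka sink expects a string/binary "value" column (and optionally "key");
        // here each row is serialized to a JSON string.
        df.selectExpr("to_json(struct(*)) AS value")
          .write()
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("topic", "my-topic")
          .save();

        spark.stop();
    }
}
```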