1 vote

I have a system pushing Avro data into multiple Kafka topics.
I want to push that data to HDFS. I came across Confluent, but I'm not sure how to send the data to HDFS without starting kafka-avro-console-producer.

Steps I performed:

  1. I have my own Kafka and ZooKeeper running, so I just started the Confluent Schema Registry.

  2. I started kafka-connect-hdfs after changing the topic name. This step was also successful; the connector was able to connect to HDFS.

After this I started pushing data to Kafka, but the messages were not being written to HDFS.
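For reference, a minimal sink configuration for kafka-connect-hdfs looks roughly like the sketch below (topic name, HDFS URL, and file name are assumptions based on the quickstart defaults; adjust them to your setup). Note in particular flush.size: the connector buffers records and only commits a file to HDFS once that many messages have accumulated, which can make it look like nothing is being written.

```properties
# quickstart-hdfs.properties (illustrative sketch, values assumed)
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
# Must exactly match the topic your producer writes to
topics=test_hdfs
hdfs.url=hdfs://localhost:9000
# No file appears in HDFS until this many records have arrived
flush.size=3
```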

Please help. I'm new to Confluent.

Comments:

Which version of Confluent are you using? The latest one, i.e. version 3.1.1? – Michael G. Noll
confluent-3.1.1 – chaitu

1 Answer

0 votes

You can avoid kafka-avro-console-producer and use your own producer to send messages to the topics, but we strongly encourage you to use the Confluent Schema Registry (https://github.com/confluentinc/schema-registry) to manage your schemas, and to use the Avro serializer that is bundled with the Schema Registry to keep your Avro data consistent. There's a nice write-up on the rationale for why this is a good idea here.
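As a sketch of what that means for your own producer, the key part is pointing the key and value serializers at the Schema Registry (host/port values here are assumptions matching the quickstart defaults):

```properties
# Producer configuration sketch (illustrative; endpoints assumed)
bootstrap.servers=localhost:9092
# Serialize keys and values as Avro, registering schemas with the registry
key.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
value.serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
schema.registry.url=http://localhost:8081
```

With these settings, any producer you write serializes records the same way kafka-avro-console-producer does, so the downstream connector can read them.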

If messages produced with kafka-avro-console-producer do make it to HDFS, then your problem is likely that the kafka-connect-hdfs connector cannot deserialize the data your own producer sends. I assume you are going through the quickstart guide. If you intend to write Avro to HDFS, you will get the best results by using the same serializer on both sides (into and out of Kafka). How this process works is described in this documentation.
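On the Connect side, the deserialization half is configured through the worker's converters; a sketch of the matching settings (registry URL assumed from the quickstart defaults):

```properties
# Connect worker configuration sketch (illustrative; URL assumed)
key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://localhost:8081
```

If your producer writes plain JSON or uses a different Avro serializer while the worker is configured with the AvroConverter above, the connector will fail to deserialize the records, which matches the symptom of nothing arriving in HDFS.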