0
votes

I am new to Kafka Connect.

Scenario: We want to export data stored in a large number of Kafka topics (more than 400) and load it into Elasticsearch indexes. Our firm runs Confluent Kafka, and it is Kerberized. I can write producers and consumers with the Kafka APIs, since we know the brokers and have a keytab file.

The suggestion was to use Kafka Connect, but since it's a multi-tenant cluster, the Ops team might not be able to give us direct access or even run plugin installation commands. We have our own managed VMs where we deploy our application instances.

Question: Is it possible to run Kafka Connect in distributed mode with the connector plugins on our VMs instead of installed on the Confluent Kafka cluster? Can we run the connector without any commands being run on Confluent Kafka? I am ready to put the Kafka Connect connector plugins on all my VM instances.

Update

We are not allowed to make a PUT request to the Kafka Connect cluster (to create a new connector instance), so is it still possible to use Kafka Connect? If yes, do we need to run our own Kafka and just specify the production Kafka cluster as the bootstrap servers? I can't find any video or article that does this.


1 Answer

2
votes

Yes, this is possible. In fact, in a production deployment you would usually not install Kafka Connect directly on a Kafka broker. You can see a reference architecture here for more information.

Kafka Connect runs as a process separate from the Kafka broker, known as a Kafka Connect worker. Each worker is a JVM process that you deploy and configure with the details of your Kafka cluster. It connects to that cluster and acts as a producer or consumer (a producer for a source connector, a consumer for a sink connector). This talk explains some of the basics of the runtime.

So you provision one or more machines on which to run Kafka Connect and give the workers the same group ID so that they form a cluster of Kafka Connect workers. The workers connect to your Kafka cluster just as any producer or consumer application running outside the cluster would, and you install the required plugins on the workers themselves.
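As a rough illustration of what each worker needs, here is a minimal Python sketch that renders a distributed-mode worker configuration. The function name, the internal topic naming scheme, the JSON converters, the SASL_PLAINTEXT protocol, and the "kafka" service name are all assumptions for illustration; check the real values for your cluster with your Ops team. Note also that the worker stores its state in three internal topics on the Kafka cluster, so on a locked-down cluster those topics (and the matching ACLs) may need to be created for you.

```python
def render_worker_properties(bootstrap_servers, group_id, plugin_path,
                             keytab_path, principal):
    """Render a distributed-mode Kafka Connect worker config as .properties text.

    Hypothetical helper: every worker in the same Connect cluster should get
    the same group_id and the same three internal topic names.
    """
    jaas = (
        "com.sun.security.auth.module.Krb5LoginModule required "
        f'useKeyTab=true storeKey=true keyTab="{keytab_path}" '
        f'principal="{principal}";'
    )
    # Kerberos client settings. Assumptions: the listener protocol may be
    # SASL_SSL on your cluster, and the service name may not be "kafka".
    security = {
        "security.protocol": "SASL_PLAINTEXT",
        "sasl.mechanism": "GSSAPI",
        "sasl.kerberos.service.name": "kafka",
        "sasl.jaas.config": jaas,
    }
    props = {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        # Directory on the worker VM that holds the connector plugins.
        "plugin.path": plugin_path,
        "key.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        # Internal topics on the Kafka cluster; the names are an assumed scheme.
        "config.storage.topic": f"{group_id}-configs",
        "offset.storage.topic": f"{group_id}-offsets",
        "status.storage.topic": f"{group_id}-status",
    }
    props.update(security)
    # The producers and consumers the worker creates for connectors read
    # their own prefixed copies of the security settings.
    for prefix in ("producer.", "consumer."):
        for key, value in security.items():
            props[prefix + key] = value
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"
```

You would write the returned text to a file on each VM and pass that file to the distributed worker start script (connect-distributed in Confluent Platform, bin/connect-distributed.sh in Apache Kafka).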

For loading data into Elasticsearch there is the Kafka Connect Elasticsearch sink connector plugin, and there's a tutorial you can watch here.
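Because the workers run on your own VMs, the Connect REST API you call to create a connector is your own, not anything on the shared Kafka cluster. The sketch below builds a PUT request that creates or updates an Elasticsearch sink connector. The helper names, the tasks.max default, and the key.ignore and schema.ignore choices are assumptions for illustration; using topics.regex rather than listing 400+ topics is one option, not a requirement.

```python
import json
import urllib.request

ES_SINK_CLASS = "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector"


def build_es_sink_config(topics_regex, es_url, tasks_max=4):
    """Build a connector config for the Elasticsearch sink (hypothetical helper)."""
    return {
        "connector.class": ES_SINK_CLASS,
        # A regex subscription avoids enumerating every topic by name.
        "topics.regex": topics_regex,
        "connection.url": es_url,
        "tasks.max": str(tasks_max),
        # Assumptions: the data has no usable keys and no Connect schema.
        "key.ignore": "true",
        "schema.ignore": "true",
    }


def build_connector_request(connect_url, name, config):
    """Build a PUT /connectors/<name>/config request without sending it."""
    url = f"{connect_url.rstrip('/')}/connectors/{name}/config"
    return urllib.request.Request(
        url,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def submit(request):
    """Send the request to a running Connect worker and return the parsed reply."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, build_connector_request("http://localhost:8083", "es-sink", build_es_sink_config("orders\\..*", "http://es-host:9200")) targets a worker on the default REST port 8083; the host names, connector name, and topic pattern here are placeholders.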