0
votes

We are currently on HDF (Hortonworks Dataflow) 3.3.1 which bundles Kafka 2.0.0 and are trying to use Kafka Connect in distributed mode to launch a Google Cloud PubSub Sink connector. We are planning on sending back some metadata into a Kafka Topic and need to integrate a Kafka producer into the flush() function of the Sink task java code.

Would this have a negative impact on the process where Kafka Connect commits back the offsets to Kafka (as we would be adding a overhead of running a Kafka producer before the flush).

Also, how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source? I need to use the same Bootstrap server list to start the producer. Currently I am changing the config for the sink connector, adding bootstrap server list as a property and parsing it in the Java code for the connector. I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible.

Kindly help on this.

Thanks in advance.

1

1 Answers

0
votes

need to integrate a Kafka producer into the flush() function of the Sink task java code

There is no producer instance exposed in the SinkTask API...

Would this have a negative impact on the process where Kafka Connect commits back the offsets to Kafka (as we would be adding a overhead of running a Kafka producer before the flush).

I mean, you can add whatever code you want. As far as negative impacts go, that's up to you to benchmark on your own infrastructure. Obviously adding more blocking code makes the other processes slower overall

how does Kafka Connect get the Bootstrap servers list from the configuration when it is not specified in the Connector Properties for either the sink or the source?

Sinks and sources are not workers. Look at connect-distributed.properties

I would like to use bootstrap server list from the Kafka Connect worker properties if that is possible

It's not possible. Adding extra properties to the sink/source configs are the only way. (Feel free to make a Kafka JIRA requesting such a feature of exposing the worker configs, though)