Can the Kafka Connect mongo source run as a cluster (tasks.max > 1)?

Question

I'm using the MongoDB source connector (mongo-source), which is supported by Kafka Connect. I found that one of the configuration options of the mongo source is tasks.max.

This means I can set the connector's tasks.max to a value greater than 1, but I don't understand what that does behind the scenes.

If it creates multiple connectors that each listen to the MongoDB change stream, I will end up with duplicate messages. So does mongo-source really support parallelism and work as a cluster? What does it do when tasks.max is greater than 1?

Bartosz Wardziński · Accepted Answer · 2019-12-18T13:40:38

Mongo-source doesn't support tasks.max > 1. Even if you set it to a value greater than 1, only one task will pull data from MongoDB into Kafka.

How many tasks are created depends on the particular connector. The method List<Map<String, String>> Connector::taskConfigs(int maxTasks), which should be overridden when you implement a connector, returns a list whose size determines the number of tasks. If you check the mongo-kafka source connector, you will see that it returns a singletonList.

https://github.com/mongodb/mongo-kafka/blob/master/src/main/java/com/mongodb/kafka/connect/MongoSourceConnector.java#L47
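To make the effect of returning a singletonList concrete, here is a minimal standalone sketch. It does not depend on Kafka Connect and is not the connector's actual code: the class TaskConfigSketch, both method names, and the "partitions" key are hypothetical, chosen only to mirror the shape of taskConfigs(int maxTasks). The first method ignores maxTasks the way a singletonList implementation does; the second shows, by contrast, how a connector with splittable input could return several task configurations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration only: this class does NOT extend Kafka Connect's Connector.
final class TaskConfigSketch {

    // Mirrors the singletonList approach: maxTasks is ignored,
    // so the returned list always has exactly one element (one task).
    static List<Map<String, String>> singleTaskConfigs(Map<String, String> settings, int maxTasks) {
        return Collections.singletonList(new HashMap<>(settings));
    }

    // Hypothetical contrast: a connector whose input can be split into
    // independent partitions returns up to maxTasks configurations.
    static List<Map<String, String>> partitionedTaskConfigs(
            Map<String, String> settings, List<String> partitions, int maxTasks) {
        int taskCount = Math.min(maxTasks, partitions.size());
        List<Map<String, String>> configs = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            Map<String, String> config = new HashMap<>(settings);
            // Round-robin assignment: task i gets partitions i, i + taskCount, ...
            List<String> assigned = new ArrayList<>();
            for (int p = i; p < partitions.size(); p += taskCount) {
                assigned.add(partitions.get(p));
            }
            config.put("partitions", String.join(",", assigned));
            configs.add(config);
        }
        return configs;
    }
}
```

With the first method, passing maxTasks = 4 still yields a list of size 1, which is why only one task is started no matter what tasks.max is set to.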
