Kafka Connect MongoDB Source Connector failure scenario

Question

I need to use Kafka Connect to monitor changes to a MongoDB cluster with one primary and 2 replicas.

I see there is an official MongoDB connector, and I want to understand how it behaves if the primary fails. Will it automatically read from whichever secondary becomes the new primary? I couldn't find this covered in the official docs.

I've seen this post related to the tasks.max configuration, which I thought might be related to this scenario, but the answer implies that it always defaults to 1.

I've also looked at Debezium's implementation of the connector, which seems to support this scenario automatically:

The MongoDB connector is also quite tolerant of changes in membership and leadership of the replica sets, of additions or removals of shards within a sharded cluster, and network problems that might cause communication failures. The connector always uses the replica set’s primary node to stream changes, so when the replica set undergoes an election and a different node becomes primary, the connector will immediately stop streaming changes, connect to the new primary, and start streaming changes using the new primary node.

Also, Debezium's version of the tasks.max configuration property states that:

The maximum number of tasks that should be created for this connector. The MongoDB connector will attempt to use a separate task for each replica set, [...] so that the work for each replica set can be distributed by Kafka Connect.

The question is: can I get the same behaviour from the official connector as is advertised for the Debezium one? For external reasons, I can't use the Debezium connector for now.

D. SM · Accepted Answer · 2020-12-15T02:36:08

In a PSS (primary-secondary-secondary) deployment:

If one node is not available, the other two nodes can elect a primary.
If two nodes are not available, there can be no primary.
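
The two cases above follow from the majority rule for replica set elections: a primary can only be elected while a strict majority of the voting members is reachable. A minimal sketch of that arithmetic, assuming all three nodes are voting members (the function name is made up for illustration and is not part of any MongoDB API):

```python
def can_elect_primary(total_voting_members: int, available_members: int) -> bool:
    """Return True when the available members form a strict majority.

    Illustrative helper only; it models the majority rule, not the
    actual election protocol.
    """
    majority = total_voting_members // 2 + 1
    return available_members >= majority


# Walk through the PSS (3-node) cases: 0, 1 or 2 nodes down.
for down in range(3):
    up = 3 - down
    print(f"{down} node(s) down -> primary possible: {can_elect_primary(3, up)}")
```

With three voting members the majority is two, so a single surviving node cannot become (or remain) primary.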

The quote you referenced suggests the connector may be using primary read preference. If so, it will keep working as long as two nodes are up, and it will not retrieve any data if only one node is up.

To verify this, bring down two of the three nodes and observe whether you are still able to query.
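
One way to run that check from a client is sketched below with the pymongo driver. This is only an illustration under assumptions: the host names, the replica set name `rs0`, and the `test.probe` namespace are placeholders, and the probe exercises the driver's primary read preference rather than the Kafka connector itself.

```python
from typing import Sequence


def build_replica_set_uri(hosts: Sequence[str], replica_set: str) -> str:
    """Build a connection string that lists all members and reads from the primary.

    Hypothetical helper for this sketch; hosts and replica set name are
    whatever your deployment uses.
    """
    return f"mongodb://{','.join(hosts)}/?replicaSet={replica_set}&readPreference=primary"


def can_read_from_primary(uri: str, timeout_ms: int = 5000) -> bool:
    """Return True if a read with primary read preference succeeds within the timeout."""
    # Imported lazily so the URI helper above works without pymongo installed.
    from pymongo import MongoClient
    from pymongo.errors import ConnectionFailure

    client = MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)
    try:
        # find_one() returns None for an empty collection; it raises only
        # when no primary can be selected or the connection fails.
        client["test"]["probe"].find_one()
        return True
    except ConnectionFailure:
        return False
    finally:
        client.close()


# Example (placeholder hosts): run once with all nodes up, then again
# after stopping two of the three nodes.
# uri = build_replica_set_uri(["mongo1:27017", "mongo2:27017", "mongo3:27017"], "rs0")
# print(can_read_from_primary(uri))
```

If the reasoning above holds, the probe should succeed with one node down and fail once two nodes are down.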
