We are building a Kafka Connect application using the JDBC source connector in `timestamp+incrementing` mode. We tried standalone mode and it works as expected. Now we would like to switch to distributed mode.
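For reference, our connector configuration looks roughly like this (connection URL, table, and column names below are placeholders, not our actual values):

```json
{
  "name": "jdbc-hive-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:hive2://hive-host:10000/default",
    "mode": "timestamp+incrementing",
    "incrementing.column.name": "id",
    "timestamp.column.name": "updated_at",
    "table.whitelist": "my_table",
    "topic.prefix": "hive-",
    "tasks.max": "1"
  }
}
```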
When we have a single Hive table as a source, how will the tasks be distributed among the workers?
The problem we faced is that when we run multiple instances of the application, each instance queries the table independently and fetches the same rows again. Will parallelism work in this case? If so,
how will the tasks coordinate with each other on the current state of the table?