Kafka Connect | Cannot complete request because of a conflicting operation

Question

1) We have a 3-node Kafka and Kafka Connect cluster.

2) We are running Kafka Connect in distributed mode, on the Kafka broker nodes themselves.

3) I am trying to create a connector using the configuration below:

    {
      "name": "connector-state-0",
      "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.user": "user",
        "database.server.id": "5023",
        "database.hostname": "hostname",
        "database.password": "password",
        "database.history.kafka.bootstrap.servers": "ip:9092",
        "database.history.kafka.topic": "topicname",
        "database.server.name": "prod",
        "database.port": "3306",
        "snapshot.mode": "when_needed",
        "include.schema.changes": "false",
        "table.whitelist": "country.state"
      }
    }

When I send the request to create the connector, 2 of the 3 nodes return the following error:

{"error_code":409,"message":"Cannot complete request because of a conflicting operation (e.g. worker rebalance)"}

On the remaining node I am able to create the connector, but its task does not start, and I can see the following error in the logs:

[2019-01-23 10:50:06,455] INFO 127.0.0.1 - - [23/Jan/2019:10:50:06 +0000] "POST /connectors/birdeye-connector-state-0/tasks?forward=true HTTP/1.1" 409 113  8 (org.apache.kafka.connect.runtime.rest.RestServer:60)
[2019-01-23 10:50:06,462] INFO 127.0.0.1 - - [23/Jan/2019:10:50:06 +0000] "POST /connectors/birdeye-connector-state-0/tasks HTTP/1.1" 409 113  21 (org.apache.kafka.connect.runtime.rest.RestServer:60)
[2019-01-23 10:50:06,466] ERROR Request to leader to reconfigure connector tasks failed (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1020)
org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Cannot complete request because of a conflicting operation (e.g. worker rebalance)
    at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:97)
    at org.apache.kafka.connect.runtime.distributed.DistributedHerder$18.run(DistributedHerder.java:1017)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I am not able to figure out what is causing the issue.

Note that running Kafka Connect on the same nodes as Kafka brokers is not recommended. — Robin Moffatt
When you successfully run it on the one node, and see that error in the log, was anything else happening at the same time? e.g. task rebalance? — Robin Moffatt
@RobinMoffatt: No. What could be the possible reasons for this? — Sahil Gupta
@RobinMoffatt: I see the following log line very frequently on the node where the error occurs: Added READ_UNCOMMITTED fetch request for partition connect-configs-0 at offset 233 to node prod-paid-kafka-node-api-1.birdeye.com:9092 (id: 0 rack: null) (org.apache.kafka.clients.consumer.internals.Fetcher:843) — Sahil Gupta
Can you check whether it is actually starting your connector or not (irrespective of the error message)? Try creating the connector from the leader worker. Also make sure that the port set in "rest.advertised.port" in your worker.config is not used by any other process on any of the nodes. — suraj_fale

Robin Moffatt · Accepted Answer · 2019-11-22T12:16:17

You need to set rest.advertised.host.name to the host or IP that the other Kafka Connect workers can resolve and connect to. This is because it is used for the internal communication between workers.

If your REST request hits a worker that is not the current leader of the cluster, that worker will try to forward the request to the leader. It does this using the rest.advertised.host.name. But if rest.advertised.host.name is localhost then the worker will simply be forwarding the request to itself and hence things won't work. Of your three workers one will be the leader, which is why you've found that this fails for two out of three.
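
As a rough illustration of the setting described above, one worker's properties file might contain something like the following. The host name `connect-worker-1.example.com` is a placeholder and not a value from the question, and the port is assumed to be the default Connect REST port; each worker would advertise its own address.

```properties
# Sketch of the relevant lines in ONE worker's distributed-mode config.
# Placeholder host name: replace with this worker's own host or IP.
# It must be resolvable and reachable from every other Connect worker,
# so it should not be localhost.
rest.advertised.host.name=connect-worker-1.example.com

# Assumption: the REST API listens on the default port 8083.
rest.advertised.port=8083
```

The other two workers would carry the same two settings with their own host names, so that a non-leader worker forwarding a request reaches the leader rather than itself.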

For more details see https://rmoff.net/2019/11/22/common-mistakes-made-when-configuring-multiple-kafka-connect-workers/
