1
votes

I'm trying to set up a NiFi Cluster on AWS using ECS with External Zookeeper.

I have created an ECS cluster for Zookeeper with 3 EC2 instances running a Zookeeper ensemble. This external Zookeeper is working fine: I tested it with my SolrCloud and also with my local NiFi. The local NiFi cluster was set up based on https://www.nifi.rocks/apache-nifi-docker-compose-cluster
Having confirmed that a NiFi cluster works with my external Zookeeper running on AWS, I created another ECS cluster with 2 EC2 instances for NiFi. I made sure all the variables are set properly for a NiFi cluster, checking them against the list of env variables given in https://github.com/apache/nifi/tree/master/nifi-docker/dockerhub

Although NiFi starts on both EC2 instances and both use my external Zookeeper, the two NiFi instances run as two separate clusters with one node each. I want them to run as 2 nodes in a single cluster.

I compared the nifi.properties files from the local NiFi cluster and the AWS cluster, and they all look good.

Am I missing some obvious step here?

The exception is:

WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to due to: java.net.UnknownHostException

Attempted to determine the node's information but failed to retrieve its information due to org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.UnknownHostException
1
You need to be careful when setting the Zookeeper properties in the nifi.properties file. The Zookeeper root node must be the same on every NiFi instance in the cluster. Curator elects one node as cluster coordinator as the nodes connect to Zookeeper. You can connect with zkCli and check how the nodes are connecting. - Thota Srinath
I made sure the Zookeeper root node is set to nifi.zookeeper.root.node=/root, which is the default node in Zookeeper. Also, docker-compose up --scale nifi=3 -d creates a 3-node cluster on my local machine and connects to this external Zookeeper just fine. - WinnieDaPooh

1 Answer

2
votes
WARN [main] o.a.nifi.controller.StandardFlowService Failed to connect to cluster due to: org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket to due to: java.net.UnknownHostException

Attempted to determine the node's information but failed to retrieve its information due to org.apache.nifi.cluster.protocol.ProtocolException: Failed to create socket due to: java.net.UnknownHostException

I was using "Default" for the "Network Mode" in my ECS "Task Definition", and the default network mode is "bridge". In bridge mode, each NiFi node registered its Docker container name in Zookeeper as its hostname. The other node could not resolve that name, because a container name is not known outside the container's own host, which produced the "java.net.UnknownHostException". In other words, each node was looking for a host whose name was the other node's container name.
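To check for this kind of failure, you can test whether the hostname a node advertises actually resolves from the other instance. The snippet below is only a minimal sketch using Python's standard library; the function name `can_resolve` is hypothetical and not part of NiFi or ECS.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if this machine can resolve `hostname` to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        # Name resolution failed; this is the condition that Java reports
        # as java.net.UnknownHostException.
        return False
```

Running `can_resolve("<hostname registered by the other node>")` on each EC2 instance should show whether the registered name is reachable by name; a container name from another host would be expected to return `False`.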

The fix was to set the Network Mode to "Host", which makes the container use the actual hostname of the EC2 instance instead of the container name. This resolved my issue.
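For reference, here is a rough sketch of what the relevant part of such a task definition might look like, built as a Python dict and printed as JSON. The family name, image tag, memory size, port numbers, and Zookeeper addresses are all placeholder assumptions, not values from my setup; the environment variable names are the ones I understand the apache/nifi Docker image to read, so check them against its README.

```python
import json

# Hypothetical values throughout; only networkMode="host" is the actual fix.
task_definition = {
    "family": "nifi-cluster-node",          # placeholder family name
    "networkMode": "host",                  # use the EC2 instance's network and hostname
    "containerDefinitions": [
        {
            "name": "nifi",
            "image": "apache/nifi:latest",  # placeholder image tag
            "memory": 4096,                  # placeholder, in MiB
            "essential": True,
            # In host mode the container port is bound directly on the instance,
            # so hostPort is either omitted or equal to containerPort.
            "portMappings": [
                {"containerPort": 8080, "protocol": "tcp"},
                {"containerPort": 8082, "protocol": "tcp"},
            ],
            "environment": [
                {"name": "NIFI_WEB_HTTP_PORT", "value": "8080"},
                {"name": "NIFI_CLUSTER_IS_NODE", "value": "true"},
                {"name": "NIFI_CLUSTER_NODE_PROTOCOL_PORT", "value": "8082"},
                # Placeholder Zookeeper ensemble addresses.
                {"name": "NIFI_ZK_CONNECT_STRING",
                 "value": "zk1.example.internal:2181,zk2.example.internal:2181,zk3.example.internal:2181"},
            ],
        }
    ],
}

task_definition_json = json.dumps(task_definition, indent=2)
print(task_definition_json)
```

The printed JSON can be pasted into the ECS console's JSON editor or adapted for whatever tooling you use to register task definitions.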