2
votes

If you have 10 datanodes in an existing Hadoop cluster, could you install NiFi on 4 or 6 of them?

The main purpose of NiFi would be high-volume daily loads from an RDBMS into HDFS.

The datanodes would be configured with a lot of RAM, let's say 100GB each. An external 3-node ZooKeeper cluster would be used.

  • Are there any major concerns with this approach?
  • Does it make more sense to just install NiFi on EVERY datanode, so 10?
  • Are there any issues with having a large cluster of 10 NiFi nodes?
  • Will some NiFi configuration best practices conflict with Hadoop config?

Edit: Currently using Hortonworks version 2.6.5 and open source NiFi 1.9.2


2 Answers

1
votes

Are there any major concerns with this approach?

Cloudera Data Platform is integrated with Cloudera DataFlow, which is based on Apache NiFi, so integration should not be a concern.

Does it make more sense to just install NiFi on EVERY datanode, so 10?

It depends on the traffic you expect, but I would treat NiFi as a standalone service, like Kafka or ZooKeeper, so a cluster of 3 nodes would be a good start, and you can add nodes if needed. Starting with all DataNodes is not required. It is fine for these services to share hosts with DataNodes; just make sure resources (cores, memory, storage...) are allocated correctly. This is easier with Cloudera.
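As a rough sketch of that resource check, the snippet below adds up memory reservations on one shared host and reports what is left. Only the 100 GB total comes from the question; the class name `NodeBudget` and every other figure are hypothetical placeholders, so substitute the heap and container sizes you actually configure.

```python
from dataclasses import dataclass


@dataclass
class NodeBudget:
    """Memory budget for one host shared by a DataNode and NiFi, in GB."""
    total_ram_gb: float
    os_reserve_gb: float        # OS and page cache (assumed value)
    datanode_heap_gb: float     # HDFS DataNode JVM heap (assumed value)
    yarn_containers_gb: float   # memory handed to YARN containers (assumed value)
    nifi_heap_gb: float         # NiFi JVM heap (assumed value)

    def headroom_gb(self) -> float:
        """RAM left after all reservations; negative means overcommitted."""
        reserved = (
            self.os_reserve_gb
            + self.datanode_heap_gb
            + self.yarn_containers_gb
            + self.nifi_heap_gb
        )
        return self.total_ram_gb - reserved


# Only total_ram_gb is taken from the question; the rest are illustrative.
budget = NodeBudget(
    total_ram_gb=100,
    os_reserve_gb=8,
    datanode_heap_gb=4,
    yarn_containers_gb=64,
    nifi_heap_gb=16,
)

if budget.headroom_gb() < 0:
    print("Overcommitted: shrink YARN or NiFi allocations")
else:
    print(f"Headroom: {budget.headroom_gb()} GB")
```

The same kind of tally applies to cores and disks. NiFi's repositories are I/O heavy, so it is worth checking whether they would share physical disks with HDFS data directories.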

Are there any issues with having a large cluster of 10 nifi nodes?

There is more info on scaling under "6) NiFi Clusters Scale Linearly". You would need a lot of traffic to go beyond 10 nodes.
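If you take the linear-scaling claim at face value, a back-of-the-envelope node count is simple arithmetic. The sketch below is only that: the function `nodes_needed`, the per-node throughput, the daily volume, and the load window are all hypothetical, and the throughput in particular should come from measuring your own flow on one node.

```python
import math


def nodes_needed(daily_volume_gb: float,
                 per_node_mb_per_s: float,
                 window_hours: float) -> int:
    """Smallest node count that moves daily_volume_gb within window_hours,
    assuming throughput scales linearly with the number of nodes."""
    # GB one node can move during the load window.
    per_node_gb = per_node_mb_per_s * 3600 * window_hours / 1024
    return max(1, math.ceil(daily_volume_gb / per_node_gb))


# Illustrative numbers only: 5 TB per day, 50 MB/s per node, 4-hour window.
print(nodes_needed(daily_volume_gb=5000, per_node_mb_per_s=50, window_hours=4))
```

In practice the source database or the network often becomes the bottleneck before NiFi does, so treat the result as an upper bound on useful nodes rather than a target.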

Will some NiFi configuration best practices conflict with Hadoop config?

That depends on how you configure it. I would advise using Cloudera for both, since its components are well tested together. You may not end up with the latest versions of your services, but you do get higher reliability.

0
votes

Whether you still run HDP 2.6.5 or have since upgraded to HDP 3 or its successor CDP, you can use the Hortonworks/Cloudera NiFi solution via your management console. So if you currently use Ambari (or its counterpart, Cloudera Manager), the recommended way to install NiFi is through that.

It will be called Hortonworks DataFlow or Cloudera DataFlow, respectively.

Regarding the other part of your question: the usual recommendation is to install NiFi on dedicated nodes, and 10 nodes is likely overkill if you are not sure you need them.

Here is some information on sizing your NiFi deployment. Cloudera and Hortonworks have merged, so although the site is called Cloudera, this page was actually written with an HDP cluster in mind. That does not affect the sizing guidance.

https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.1.1/bk_planning-your-deployment/content/ch_hardware-sizing.html

Full disclosure: I am an employee of Cloudera (formerly Hortonworks)