How to limit a NiFi processor to run on a single node in a cluster?

Question

We are building a data workflow with NiFi and want the final (custom) processor, which runs the deduplication logic, to run on only one of the NiFi cluster nodes instead of on all of them. I see that NiFi 1.7.0 (not yet released) has a PrimaryNodeOnly annotation to enforce single-node execution. Is there a way or workaround to enforce this behaviour in NiFi 1.6.0?

NOTE: In addition to @PrimaryNodeOnly, it would be better if NiFi provided a way to run a processor on any single node (i.e., some annotation like @SingleNodeOnly). The execution node would then not have to be the primary node, which would reduce the load on the primary node. This is just a request for the future and is not needed to solve the problem above.

Do you want only one instance of the processor running at a time, with no need to assign it to a particular node, or do you want it to always run on the same node? — Radhwane Chebaane
We want only one instance of the processor running at a time, with no need to assign it to a particular node. No, we don't need it to always run on the same node. — janeshs
In that case setting the processor to run on the Primary Node would be enough. BTW, the PrimaryNodeOnly annotation is for developers who write processors and want to restrict the processor's execution strategy to Primary. A dataflow designer/developer then can't change the strategy to All Nodes through the UI or API. — Sivaprasanna Sethuraman
@janeshs in that case, in the SCHEDULING tab of the processor configuration, if you keep the default Scheduling Strategy (Timer driven) and set "Concurrent Tasks" to 1, you will have only one instance of the processor running at once. — Radhwane Chebaane
@RadhwaneChebaane - "Concurrent task configuration is per node" as mentioned by Matt Clarke in https://community.hortonworks.com/questions/52112/nifi-load-distribution-in-getfile-processor.html — janeshs

mattyb · Accepted Answer · 2018-06-19T18:18:58

There is no specific workaround to enforce it in previous versions; it is up to the data flow designer to mark the intended processor(s) to run on the Primary Node only. You could write a script to query the NiFi API for processors of certain types or names, then check/set the execution strategy to Primary Node only.
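
A minimal sketch of such a script, in Python with only the standard library, might look like the following. It is an illustration under several assumptions, not a tested tool: the base URL assumes an unsecured NiFi on localhost, the helper names (select_processors, build_update, call, enforce_primary) are made up for this example, and the request and response shapes (a "processors" list of entities, each with "revision" and "component.config.executionNode") reflect my understanding of the NiFi REST API and should be checked against the API documentation for your version. It only looks at processors directly inside the given process group, and NiFi may reject the update while a processor is running.

```python
import json
import urllib.request

# Assumption: unsecured NiFi reachable at this base URL.
NIFI_API = "http://localhost:8080/nifi-api"


def select_processors(entities, name_or_type):
    """Return processor entities whose name or type contains the given text
    and that are not yet restricted to the primary node."""
    selected = []
    for entity in entities:
        component = entity["component"]
        matches = (name_or_type in component["name"]
                   or name_or_type in component["type"])
        if matches and component["config"].get("executionNode") != "PRIMARY":
            selected.append(entity)
    return selected


def build_update(entity):
    """Build the request body that switches one processor to the primary node.
    The current revision is echoed back so NiFi can detect concurrent edits."""
    return {
        "revision": entity["revision"],
        "component": {
            "id": entity["id"],
            "config": {"executionNode": "PRIMARY"},
        },
    }


def call(method, path, body=None):
    """Send a JSON request to the NiFi API and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        NIFI_API + path, data=data, method=method,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def enforce_primary(group_id, name_or_type):
    """Check every matching processor in one process group and set it to
    run on the primary node if it is not already."""
    listing = call("GET", f"/process-groups/{group_id}/processors")
    for entity in select_processors(listing["processors"], name_or_type):
        call("PUT", f"/processors/{entity['id']}", build_update(entity))
```

Calling enforce_primary("root", "Dedup") would then check the top-level process group for processors whose name or type contains "Dedup" (a placeholder for the custom deduplication processor). Run periodically, this approximates the enforcement that the annotation provides in later versions, although a processor can still run on all nodes between checks.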
