
Like the question states, is there some way to synchronize NiFi process groups or pipelines that don't/can't connect in the UI?

E.g. I have a process where I want to do GetFTP->PutHDFS->MoveHDFS (which actually ends up being GetFTP->PutHDFS->ListHDFS->MoveHDFS, see https://stackoverflow.com/a/50166151/8236733). However, ListHDFS does not seem to accept any incoming connections. Splitting the flow into process groups, like P1{GetFTP->PutHDFS->outport}->P2{inport->ListHDFS->MoveHDFS}, runs into the same problem (ListHDFS still can't take an incoming connection). We don't want MoveHDFS to run before anything has been retrieved by GetFTP, but given the above, I don't see how these actions can be synchronized to occur in the right order.

I'm new to NiFi, but I imagine this is a common use case and there must be some NiFi-ish way of doing this that I am missing. Advice on this would be appreciated. Thanks.


1 Answer


I'm not sure what requirement is preventing you from writing the file retrieved from FTP directly to the desired HDFS location, or if this is a "write n files to HDFS with a . starting the filename and then rename all when some certain threshold is reached" scenario.

ListHDFS does not accept any incoming connections because it is not meant to be triggered by an incoming event, but rather to run on a timer/CRON schedule. Every time it runs, it produces n flowfiles, each of which references an HDFS file that has been written to the filesystem since the last execution. To do this, the processor stores state between runs.
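To make the "only what is new since the last run" idea concrete, here is a minimal Python sketch of a stateful lister. It is a toy model, not NiFi code: `FileEntry`, `StatefulLister`, and the `/landing/...` paths are hypothetical names made up for illustration, and the real processor's handling of timestamps is more careful than a single "newest seen" value.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    path: str
    modified: int  # modification timestamp, arbitrary units (assumption)


@dataclass
class StatefulLister:
    """Toy model of a list-style processor that remembers the newest timestamp seen."""
    last_seen: int = -1

    def run_once(self, directory):
        # Keep only entries newer than anything emitted on earlier runs.
        new_entries = sorted(
            (e for e in directory if e.modified > self.last_seen),
            key=lambda e: e.modified,
        )
        if new_entries:
            # Persist the newest timestamp so the next run skips these files.
            self.last_seen = new_entries[-1].modified
        return [e.path for e in new_entries]


lister = StatefulLister()
directory = [FileEntry("/landing/a.csv", 10)]

first = lister.run_once(directory)    # a.csv is new, so it is emitted
second = lister.run_once(directory)   # nothing changed, so nothing is emitted
directory.append(FileEntry("/landing/b.csv", 20))
third = lister.run_once(directory)    # only b.csv is emitted
```

The second run returning an empty list is the "no-op" case: the lister was scheduled, found nothing new, and produced no flowfiles.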

Your flow segments do not need to be connected in this case. You'll have "flow segment A", which performs the FTP -> HDFS writing (GetFTP -> PutHDFS), and an independent "flow segment B", which lists the HDFS directory, reads the file metadata (but not the file content, unless you use FetchHDFS as well), and moves the files (ListHDFS -> MoveHDFS). The ListHDFS processor will keep running on its schedule, but if it does not detect any new files during a run, it will simply yield and perform a no-op. Once the PutHDFS processor finishes writing a file to HDFS, the next ListHDFS execution will detect that file and generate a flowfile describing it.
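As a rough illustration of why the two segments can stay unconnected, the following Python sketch simulates both on a shared clock. Everything here is a stand-in: `segment_a`, `segment_b`, the `landing` dict, and the tick loop are hypothetical and only mimic the scheduling behaviour, not the actual processors.

```python
def segment_a(ftp_queue, landing, clock):
    """Stand-in for GetFTP -> PutHDFS: write one pending file into the landing dir."""
    if ftp_queue:
        name = ftp_queue.pop(0)
        landing[name] = clock  # record the write time as the file's timestamp


def segment_b(landing, final, state):
    """Stand-in for ListHDFS -> MoveHDFS: move files not seen on earlier runs."""
    new = sorted(
        (n for n, ts in landing.items() if ts > state["last_seen"]),
        key=landing.get,
    )
    for name in new:
        state["last_seen"] = landing[name]
        final.append(name)
    for name in new:
        del landing[name]
    return len(new)  # 0 means this run was a no-op


ftp_queue = ["a.csv", "b.csv"]
landing, final, state = {}, [], {"last_seen": -1}
moved_per_tick = []

for tick in range(4):
    # Segment B deliberately runs first: with nothing written yet it does nothing.
    moved_per_tick.append(segment_b(landing, final, state))
    if tick >= 1:  # segment A only starts delivering files on tick 1
        segment_a(ftp_queue, landing, tick)
```

In this sketch segment B runs on every tick, including before segment A has written anything, and those early runs move nothing. The ordering falls out of the data: a file can only be moved after it exists in the landing directory.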

You can tune the scheduling to your liking, but in general this is a very common pattern in NiFi flows.