I'm evaluating NiFi for our ETL process. I want to build the following flow: fetch a large amount of data from a SQL database -> split it into chunks of 1000 records each -> count the error records in each chunk -> sum the error counts across all chunks -> if the total exceeds a threshold, fail the process -> otherwise save each chunk to the database.
The problem I can't solve is how to wait until all chunks have been validated. If, for example, I have 5 validation tasks running concurrently, I need some kind of barrier that waits until every chunk has been processed and only then runs the error-count processor, because I don't want to save invalid data and then have to delete it if the threshold is reached.
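To make the behavior I'm after concrete, here is a rough plain-Python sketch of the semantics (this is not NiFi code). The validation rule in `is_error`, the `ERROR_THRESHOLD` value, and the `save_chunk` callback are all made up for illustration; only the chunk size of 1000 and the 5 concurrent validators come from my actual requirements.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1000      # records per chunk, as in the flow described above
ERROR_THRESHOLD = 50   # hypothetical limit; the real value is a business rule


def split_into_chunks(records, size=CHUNK_SIZE):
    """Split the fetched records into consecutive chunks of at most `size`."""
    return [records[i:i + size] for i in range(0, len(records), size)]


def is_error(record):
    """Hypothetical validation rule: a record is invalid if its 'id' is missing."""
    return record.get("id") is None


def count_errors(chunk):
    """Validate one chunk and return the number of error records in it."""
    return sum(1 for record in chunk if is_error(record))


def run_flow(records, save_chunk, workers=5, threshold=ERROR_THRESHOLD):
    """Validate all chunks concurrently, then save them only if the total
    error count does not exceed the threshold."""
    chunks = split_into_chunks(records)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list(...) blocks until every chunk has been validated.
        # This is the barrier: nothing below runs before all counts are in.
        errors_per_chunk = list(pool.map(count_errors, chunks))

    total_errors = sum(errors_per_chunk)
    if total_errors > threshold:
        # Fail the whole process before anything has been written.
        raise RuntimeError(
            f"{total_errors} error records exceed threshold {threshold}"
        )

    # Only reached when the whole batch passed validation.
    for chunk in chunks:
        save_chunk(chunk)
    return total_errors
```

The important part is that no chunk is saved until every validator has reported its count, so a failed batch never touches the target database. What I'm looking for is the NiFi equivalent of that blocking step.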
My other question is whether the validation processor can run on multiple nodes in parallel while still being able to wait until all of them have completed.