The ListFile processor is not detecting changes to a previously processed file, so it does not reprocess it. FYI, I have already tried the following options to force reprocessing, and only the hack mentioned last works. This is on a single-node NiFi instance that I am running in my development environment.

  • Update Scenario: ListFile does not detect file content changes and is not triggered automatically after the update (i.e., the file is edited with the Vim editor).
  • Timestamp Modification Scenario: Changing the file timestamp with the touch -c command updates the timestamp, but this does not trigger ListFile automatically either.
  • Stop-Start Scenario: Stopping and starting the whole process group in NiFi after changing the file as described above also does not trigger ListFile.
  • Waiting Scenario: Waiting a long time after the file change does not help either (just in case it auto-triggers after some delay).
  • HACK: The only way I can get ListFile to reprocess the file is to change the regular expression in its "File Filter" property in a harmless, idempotent way, for example from .*test.*\.csv to test.*\.csv and back again later (i.e., toggling between the two for repeated reprocessing).
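The toggle is only harmless if both patterns select the same files. Assuming ListFile requires the File Filter to match the entire file name (as Java's Matcher.matches() does), a quick Python check with re.fullmatch shows that the two patterns agree for names starting with "test" but not for a name like "mytest.csv". The file names below are hypothetical.

```python
import re

# The two "File Filter" patterns the question toggles between.
ORIGINAL = r".*test.*\.csv"
TOGGLED = r"test.*\.csv"


def matches(pattern: str, file_name: str) -> bool:
    # re.fullmatch requires the whole name to match, like Java's Matcher.matches().
    return re.fullmatch(pattern, file_name) is not None


for name in ["test_2018.csv", "mytest.csv", "test.txt"]:
    print(f"{name:15} original={matches(ORIGINAL, name)} toggled={matches(TOGGLED, name)}")
```

If any of the files you need begin with something other than "test", the toggled pattern silently drops them, so it is worth checking the actual directory contents before relying on this hack.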

Reprocessing files with the same names but modified data is a requirement for us. Please help!

Sometimes forced reprocessing of even an unmodified file may also be required, in case of unanticipated data issues upstream or downstream.

UPDATE

I am still seeing this sporadic behavior. When the ListFile processor fails to respond to a file change, only restarting NiFi helps.

Try ensuring that the ListFile processor has this configuration: Minimum File Age=0 sec, Maximum File Age is empty, Minimum File Size=0B, Maximum File Size is empty. - Jagrut Sharma
Can you please show your ListFile configuration? And which version of NiFi are you using? - Sivaprasanna Sethuraman
What version of NiFi do you use? I have just checked this in 1.5.0 and it worked correctly. - mateharu
@JagrutSharma - The min/max properties are left at their defaults, which match what you mentioned (Minimum File Age=0 sec, Maximum File Age is empty, Minimum File Size=0B, Maximum File Size is empty). - janeshs
Maybe something is messed up with the timestamps maintained by the processor... Have you tried clearing its state? This option is disabled while ListFile is running: stop the processor -> right-click -> View state -> Clear state. All matching files should be picked up once the processor is started again, regardless of whether they were already processed. - mateharu
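The Clear state suggestion relies on the processor remembering a last-seen timestamp. The hypothetical TimestampTrackingLister class below is a minimal Python model of that idea, not NiFi's actual code: it lists only files newer than the cached timestamp, so an unchanged file (or one whose timestamp is not newer) is skipped until the state is cleared. File names and timestamps are made up.

```python
from dataclasses import dataclass


@dataclass
class TimestampTrackingLister:
    """Hypothetical model of timestamp-based listing, not NiFi's implementation."""

    last_seen_ms: int = -1  # the only value kept in "processor state"

    def list_files(self, files: dict[str, int]) -> list[str]:
        # files maps file name -> last-modified time in milliseconds.
        new = sorted(name for name, mtime in files.items() if mtime > self.last_seen_ms)
        if files:
            self.last_seen_ms = max(self.last_seen_ms, max(files.values()))
        return new

    def clear_state(self) -> None:
        # Equivalent of "View state -> Clear state": forget the cached timestamp.
        self.last_seen_ms = -1


lister = TimestampTrackingLister()
print(lister.list_files({"test.csv": 1000}))  # ['test.csv']
print(lister.list_files({"test.csv": 1000}))  # [] : not newer than the cached timestamp
lister.clear_state()
print(lister.list_files({"test.csv": 1000}))  # ['test.csv'] : picked up again
```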

ANSWER

This is probably a late answer. The older List processors (ListFile, ListFTP, ListSFTP, etc.) used only a timestamp tracking strategy to identify changed files: the processor cached the last-seen timestamp in its processor state and listed only files with a greater timestamp. However, this approach was very buggy, so a much better strategy called Entity Tracking was introduced (the "Tracking Entities" option of the processor's Listing Strategy property). It monitors file changes much more broadly by tracking the following attributes of each file in the specified directory:

  1. Name
  2. Size
  3. Last modified timestamp

Any change to a file is reflected in these key attributes. Because they are cached, any difference from the cached values is treated as a change, so changed files are routed to the success relationship.
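The entity tracking idea can be sketched in a few lines of Python. Entity and EntityTrackingLister are hypothetical names, and this is a simplified model rather than NiFi's implementation (which, for example, also has to decide where the cache is stored): it keys the cache by file name and reports any file whose size or last-modified timestamp differs from the cached entry.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    size: int         # file size in bytes
    modified_ms: int  # last-modified timestamp in milliseconds


class EntityTrackingLister:
    """Hypothetical model: cache (size, timestamp) per file name, emit any file that differs."""

    def __init__(self) -> None:
        self.cache: dict[str, Entity] = {}

    def list_files(self, files: dict[str, Entity]) -> list[str]:
        # A name missing from the cache, or any size/timestamp difference, counts as a change.
        changed = sorted(name for name, entity in files.items() if self.cache.get(name) != entity)
        self.cache.update(files)
        return changed


lister = EntityTrackingLister()
print(lister.list_files({"test.csv": Entity(10, 1000)}))  # ['test.csv'] : new file
print(lister.list_files({"test.csv": Entity(10, 1000)}))  # [] : unchanged
print(lister.list_files({"test.csv": Entity(12, 900)}))   # ['test.csv'] : size differs
```

Because the comparison is equality rather than "newer than", a file whose timestamp moves backwards is still reported, which a pure timestamp check would miss.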