According to the documentation linked below, it seems that if I restart the processor, it will reset the maximum column value I have provided and start fetching data from the beginning.

Document Link: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache.nifi.processors.standard.QueryDatabaseTable/index.html

A comma-separated list of column names. The processor will keep track of the maximum value for each column that has been returned since the processor started running.

  • However, I tested this behavior, and even if I restart the processor I still get only an incremental load. Is there a mistake in the documentation, or have I missed something?
  • What would happen if I re-deploy the job, that is, delete the job and re-create it from the template?
  • The code says the value is stored with Scope.CLUSTER. Could someone please explain what that means, and under which conditions the state is cleared?

@Stateful(scopes = Scope.CLUSTER, description = "After performing a query on the specified table, the maximum values for " + "the specified column(s) will be retained for use in future executions of the query. This allows the Processor " + "to fetch only those records that have max values greater than the retained values. This can be used for " + "incremental fetching, fetching of newly added rows, etc. To clear the maximum values, clear the state of the processor " + "per the State Management documentation")

1 Answer

Once the processor has been started for the first time, it will never reset its value unless you go into the "View State" menu of the processor and click "Clear State".

It would not make sense to clear the state whenever the processor is stopped and started, because then every NiFi restart, whether for maintenance or after a crash, would reset it, which is not what you want.

Where the state is stored depends on whether you are running a single node or a cluster. On a single node, it is stored in a local write-ahead log; in a cluster, it is stored in ZooKeeper so that all nodes can access it if necessary. In either case, it is stored under the UUID of the processor.
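
To make the "stored by UUID" point concrete, here is a toy model in plain Java. It is not NiFi's implementation and uses none of NiFi's APIs: the class ToyStateStore and its methods are hypothetical names invented for illustration. It only sketches the idea that the retained maximum lives in a store keyed by component ID, so stopping and starting the processor leaves it alone, while an explicit clear removes it. The last part assumes that a processor re-created from a template gets a new UUID, in which case it would find no retained value and start from the beginning.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Toy model only: a hypothetical stand-in for a state store, NOT NiFi code. */
public class ToyStateStore {
    // component ID -> (column name -> maximum value seen so far)
    private final Map<String, Map<String, Long>> stateByComponent = new HashMap<>();

    /** Returns the retained maximum for a column, or null if nothing is retained. */
    public Long getMax(String componentId, String column) {
        Map<String, Long> state = stateByComponent.get(componentId);
        return state == null ? null : state.get(column);
    }

    /** Records an observed value, keeping the larger of old and new. */
    public void updateMax(String componentId, String column, long observed) {
        stateByComponent
            .computeIfAbsent(componentId, id -> new HashMap<>())
            .merge(column, observed, Long::max);
    }

    /** Models an explicit "Clear State" action for one component. */
    public void clearState(String componentId) {
        stateByComponent.remove(componentId);
    }

    public static void main(String[] args) {
        ToyStateStore store = new ToyStateStore();

        // A processor with a fixed ID fetches rows up to id = 100.
        String original = UUID.randomUUID().toString();
        store.updateMax(original, "id", 100L);

        // Stopping and starting the processor never touches the store,
        // so the retained maximum is still there afterwards.
        System.out.println("after restart:  " + store.getMax(original, "id"));

        // Assumption: a processor re-created from a template has a new ID,
        // so it finds no retained value.
        String recreated = UUID.randomUUID().toString();
        System.out.println("re-created:     " + store.getMax(recreated, "id"));

        // Only an explicit clear removes the retained value.
        store.clearState(original);
        System.out.println("after clearing: " + store.getMax(original, "id"));
    }
}
```

Running main prints 100 for the restarted processor and null for both the re-created processor and the cleared one, which matches the behavior described above: a restart gives an incremental load, and only "Clear State" (or a different processor ID) gives a full reload.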