
We are receiving JSON messages from an upstream system via a Kafka topic. The requirement is to store these messages in HDFS at a certain interval. Since we are storing into HDFS, we want to merge a certain number of these records into a single file. As per the NiFi documentation, we are using the "MergeRecord" processor for that.

About the incoming records

  • These are multi-line JSON messages with a nested structure.
  • They are all based on the same schema (they are picked from a single Kafka topic).
  • They are validated messages, and the NiFi processor is able to parse them, so there are apparently no issues with the JSON messages from a schema point of view.

Present Configuration

Below is the snapshot of the Processor Configuration. NiFi version: 1.8

[Screenshot: MergeRecord processor configuration]

Expected behavior

With the above configuration, MergeRecord is expected to wait for one of the thresholds, i.e. maximum records (100000) or maximum bin size (100 KB).

Observed Behavior

However, the bin is being bundled well before either threshold is reached. Bin formation is triggered for only 2 records of 5 KB size.

Could you help with analysis and/or pointers as to why the MergeRecord processor is not behaving as per the configuration?

Are you getting more than 2 records in the Bin Age of 1 minute? - Bryan Bende
Yes @Bryan Bende, we tried sending at least 20 messages in 1 minute. - Para_Conscious

1 Answer


Perhaps it is not waiting for the maximum records (100000) or maximum bin size (100 KB) because it hits the Max Bin Age that you specified (1 minute) first.

Max Bin Age is defined in the docs as:

The maximum age of a Bin that will trigger a Bin to be complete.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.8.0/org.apache.nifi.processors.standard.MergeRecord/index.html
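
To make the reasoning concrete, here is a rough Python sketch of the "whichever limit is hit first closes the bin" idea. It is a simplified model for illustration only, not NiFi's actual implementation: the names `BinLimits` and `completion_reason` are hypothetical, it assumes 100 KB means 100 * 1024 bytes, and it ignores the processor's minimum thresholds and all other properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinLimits:
    """Hypothetical container for the limits mentioned in the question."""
    max_records: int = 100_000
    max_size_bytes: int = 100 * 1024   # assumption: 100 KB = 100 * 1024 bytes
    max_age_seconds: float = 60.0      # Max Bin Age of 1 minute


def completion_reason(record_count: int, size_bytes: int,
                      age_seconds: float, limits: BinLimits) -> Optional[str]:
    """Return the name of the limit that closes the bin, or None if it stays open.

    Simplified model: any single limit being reached is enough.
    """
    if record_count >= limits.max_records:
        return "max_records"
    if size_bytes >= limits.max_size_bytes:
        return "max_size"
    if age_seconds >= limits.max_age_seconds:
        return "max_age"
    return None


limits = BinLimits()

# A bin holding 2 records (about 10 KB in total here) that has been open for 61 s:
# neither the record limit nor the size limit is close, but the age limit is exceeded.
print(completion_reason(2, 10 * 1024, 61.0, limits))

# The same bin after only 5 s is still open in this model.
print(completion_reason(2, 10 * 1024, 5.0, limits))
```

In this model a small bin is only closed early if it has been open for at least the configured age, so it is worth checking how long the 2-record bins actually sat in the processor before being merged.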