0
votes

I have an issue with small files and HDFS.

Scenario: I am using NiFi to read messages from a Kafka topic, and the messages are all really small.

Requirement: store these raw messages in HDFS (for replay capability) before doing any further processing on them.

I was thinking of running Hadoop Archive (HAR) on them periodically. Is that something I can do through NiFi? The har command seems to be a command-line tool rather than something I could execute through NiFi. I would love to know a solution that meets my requirement without bringing down HDFS because of the small files.

Ginil


1 Answer

1
votes

You can run a command line from inside NiFi with the ExecuteProcess processor:

http://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache.nifi.processors.standard.ExecuteProcess/
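As a rough sketch of what that could look like for HAR: the snippet below only builds the `hadoop archive` invocation and splits it into the two values you would put in ExecuteProcess's Command and Command Arguments properties. The helper name, the archive name, and all HDFS paths are made up for illustration, and I am assuming the `hadoop` client is on the PATH of the NiFi host.

```python
def build_har_command(archive_name, parent_dir, sources, dest_dir):
    """Build the argv list for:
    hadoop archive -archiveName <name>.har -p <parent> <src>... <dest>
    (helper name and validation are illustrative, not part of Hadoop or NiFi)
    """
    if not archive_name.endswith(".har"):
        raise ValueError("archive name must end in .har")
    if not sources:
        raise ValueError("at least one source path is required")
    return ["hadoop", "archive", "-archiveName", archive_name,
            "-p", parent_dir, *sources, dest_dir]


# Hypothetical layout: one directory of raw messages per day under /data/raw.
argv = build_har_command("raw-2018-06-01.har", "/data/raw",
                         ["2018-06-01"], "/data/archive")

# ExecuteProcess takes the executable and its arguments as separate properties;
# this assumes the arguments are separated by spaces and contain none themselves.
command = argv[0]
command_arguments = " ".join(argv[1:])

print("Command:          ", command)
print("Command Arguments:", command_arguments)
```

Keep in mind that `hadoop archive` launches a MapReduce job and does not delete the source files, so you would still need a cleanup step after the archive has been created.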

You can also take a look at the Kafka Connect HDFS connector for writing Kafka records to HDFS.
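For the Kafka Connect route, here is a minimal sketch of a connector definition, written as Python that prints the JSON you would submit to the Connect REST API. The property names are the ones I recall from Confluent's HDFS sink connector, so check them against the documentation for your version. The connector name, topic, NameNode URL, and flush size are placeholders.

```python
import json


def hdfs_sink_config(name, topic, hdfs_url, flush_size):
    """Return a Kafka Connect connector definition for an HDFS sink.

    Assumption: Confluent's HDFS sink connector is installed on the workers.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "hdfs.url": hdfs_url,
            # Number of records written to a file before it is committed;
            # a larger value means fewer, bigger files in HDFS.
            "flush.size": str(flush_size),
        },
    }


# Placeholder values, not taken from the question.
definition = hdfs_sink_config("raw-messages-hdfs-sink", "raw-messages",
                              "hdfs://namenode:8020", 100000)
print(json.dumps(definition, indent=2))
```

The point of `flush.size` here is that many small Kafka messages are batched into one HDFS file, which is what you want for the small-files problem.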