
Currently I am using ConsumeKafkaRecord, MergeContent and PutHDFS to load data from Kafka into Hive. We need to automate this for multiple Kafka topics. Is there any way to do it in NiFi?

E.g.: if I give the topic names as abc,xyz, the data from the abc topic should be moved to the /abc folder and the data from xyz to the /xyz folder.

Please suggest.


2 Answers

4 votes

The ConsumeKafkaRecord processor writes an attribute named kafka.topic that contains the name of the topic the records came from.

And the Directory property of PutHDFS supports the NiFi Expression Language, so you could specify something like /${kafka.topic} there.
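
For illustration, the relevant property settings might look like the following sketch. Topic Name(s), Group ID and Directory are real properties of the standard processors; the processor version suffix (_2_6) depends on your NiFi/Kafka versions, the topic names come from the question, and the group id is just a placeholder:

    ConsumeKafkaRecord_2_6
        Topic Name(s) : abc,xyz           (comma-separated list of topics)
        Group ID      : nifi-ingest       (placeholder: any consumer group id)

    PutHDFS
        Directory     : /${kafka.topic}   (Expression Language, resolved per FlowFile)

Because kafka.topic is set on every FlowFile the consumer emits, each FlowFile lands in the folder named after its own topic, with no per-topic processors needed.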

0 votes

Regarding the second part of your question: you can merge files in MergeContent based on the kafka.topic attribute by using it as the Correlation Attribute Name. This way you can merge the content of different topics with a single processor, and the data of each topic will be merged only with data from that same topic. A property sketch follows below.
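
As a sketch, the MergeContent configuration might look like this. Merge Strategy, Correlation Attribute Name, Minimum Number of Entries and Max Bin Age are real MergeContent properties; the bin limits shown are placeholder values to tune for your batch size:

    MergeContent
        Merge Strategy             : Bin-Packing Algorithm
        Correlation Attribute Name : kafka.topic   (bins only same-topic FlowFiles together)
        Minimum Number of Entries  : 1000          (placeholder)
        Max Bin Age                : 5 min         (placeholder: flushes partially filled bins)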

You can then route the merged files to different directories in HDFS by setting the PutHDFS directory to something like "hdfs://${your-hdfs-path}/${kafka.topic}".
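
Putting both answers together, the overall flow would look roughly like this (the HDFS base path is kept as the placeholder above; "success" and "merged" are the standard outgoing relationships of these processors):

    ConsumeKafkaRecord  (Topic Name(s): abc,xyz)
        | success
        v
    MergeContent        (Correlation Attribute Name: kafka.topic)
        | merged
        v
    PutHDFS             (Directory: hdfs://${your-hdfs-path}/${kafka.topic})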

Let me know if you need more assistance!