
I need to ingest multiple CSV files, based on table names, into their respective Hive tables using Apache NiFi. The data for table_address present in the source JSON file should go to table_address in Hive, and similarly for the other tables. In short, records from the source JSON file need to be segregated into multiple CSV files (named tablename.csv) and loaded into their respective Hive tables.
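For illustration, one record in the source JSON might look like this (a hypothetical shape; the table and after field names follow GoldenGate's JSON output format but may differ in your trail):

{
  "table": "TABLE_ADDRESS",
  "op_type": "I",
  "op_ts": "2019-01-01 00:00:00.000000",
  "after": {
    "ID": 1,
    "CITY": "Mumbai"
  }
}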

Processors I am using: ConsumeKafka --> SplitJson --> EvaluateJsonPath --> UpdateAttribute --> ReplaceText --> PutFile
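For the extraction step, a minimal sketch of the EvaluateJsonPath and UpdateAttribute settings, assuming the table name sits in a top-level table field as in the sample record above (the tablename attribute name is illustrative):

EvaluateJsonPath
    Destination : flowfile-attribute
    tablename   = $.table              (dynamic property; pulls the table name into an attribute)

UpdateAttribute
    filename = ${tablename}.csv        (PutFile uses the filename attribute for the output file name)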

The records in the source JSON, consumed from Kafka and produced by GoldenGate trails, need to be segregated into per-table CSV files (tablename.csv) and loaded into their respective Hive tables with an Apache NiFi flow.


1 Answer


You can use the PartitionRecord processor in NiFi.

  • Configure a JSON Record Reader and a CSV Record Writer as controller services.

  • Each output flowfile will be in CSV format, and based on the partition column value you can store the data into the matching Hive table dynamically (see the sketch after this list).
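A minimal sketch of that configuration, assuming the table name is carried in a top-level table field (the dynamic property name and RecordPath are illustrative):

PartitionRecord
    Record Reader : JsonTreeReader         (parses the incoming JSON records)
    Record Writer : CSVRecordSetWriter     (writes each partition out as CSV)
    table         = /table                 (dynamic property; RecordPath to the partition field)

Each outgoing flowfile then carries a table attribute holding the partition value, so records belonging to different tables land in different flowfiles.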

Flow:

ConsumeKafka -->
PartitionRecord (specify the partition field as a dynamic property) -->
PutFile (or) PutHiveStreaming (or) PutHDFS (based on the value of the partition field)
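Downstream, you can use the table attribute set by PartitionRecord to name and place the files; the directory path here is hypothetical:

UpdateAttribute
    filename = ${table}.csv

PutHDFS
    Directory = /apps/hive/warehouse/${table}

The same attribute can also drive PutHiveStreaming's Table Name property via Expression Language.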