1 vote

I am using the PutHBaseJSON processor to fetch data from an HDFS location and put it into HBase. The data in the HDFS location is in the format below, all in a single file.

{"EMPID": "17", "EMPNAME": "b17", "DEPTID": "DNA"}            
{"EMPID": "18", "EMPNAME": "b18", "DEPTID": "DNA"}
{"EMPID": "19", "EMPNAME": "b19", "DEPTID": "DNA"}

When I execute the PutHBaseJSON processor, it only fetches the first row and puts it into the HBase table I created. Is it possible to fetch all the rows in that file using this processor? Or, how can I get all the records from the single file into HBase?


2 Answers

1 vote

PutHBaseJSON takes a single JSON document as input. After fetching the file from HDFS, you should be able to use the SplitText processor with a Line Split Count of 1 to get each of your JSON documents into its own flow file.

If you have millions of JSON records in a single HDFS file, then you should perform a two-phase split: the first SplitText should split with a line count of, say, 10,000, and then a second SplitText should split those down to 1 line each.
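As a concrete illustration, here is a sketch of one possible flow. Line Split Count is the actual SplitText property; the PutHBaseJSON values (table name, row identifier field, column family) are assumptions based on the sample records in the question, not confirmed settings:

    FetchHDFS
      -> SplitText    (Line Split Count = 10000)    <- first-phase split into chunks
      -> SplitText    (Line Split Count = 1)        <- one JSON record per flow file
      -> PutHBaseJSON (Table Name = emp,            <- assumed table name
                       Row Identifier Field Name = EMPID,
                       Column Family = cf)          <- assumed column family

With this layout, each flow file reaching PutHBaseJSON contains exactly one JSON document, so every record in the original file ends up as a row in HBase.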

-1 votes

You can make use of the SplitJson processor to split them into individual records; they will then be sent serially to PutHBaseJSON.
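Note that SplitJson expects its input to be a single well-formed JSON document (such as an array), so for the newline-delimited records shown in the question, the data would first need to be wrapped in an array. A minimal sketch, assuming that wrapped form:

    [{"EMPID": "17", "EMPNAME": "b17", "DEPTID": "DNA"},
     {"EMPID": "18", "EMPNAME": "b18", "DEPTID": "DNA"},
     {"EMPID": "19", "EMPNAME": "b19", "DEPTID": "DNA"}]

    SplitJson (JsonPath Expression = $.*) -> PutHBaseJSON

Here the JsonPath Expression $.* emits each array element as its own flow file, which PutHBaseJSON can then write as an individual row.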