I'm trying to write a pretty simple XML file stored in HDFS to HBase. I'd like to transform the XML file into JSON and create one row in HBase for each element of the JSON array. Here is the XML structure:
<?xml version="1.0" encoding="UTF-8"?>
<customers>
<customer customerid="1" name="John Doe"></customer>
<customer customerid="2" name="Tommy Mels"></customer>
</customers>
And these are the desired HBase output rows (row key, then value):
1 {"customerid":"1","name":"John Doe"}
2 {"customerid":"2","name":"Tommy Mels"}
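To make the goal concrete, here is a minimal Python sketch (outside NiFi, not part of my flow) of the transformation I'm after. It assumes the row key is simply the customerid attribute and the value is the element's attributes as compact JSON; the name xml_to_hbase_rows is just for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Sample input: the same customers.xml structure as above (as bytes,
# so the encoding declaration is handled by the XML parser).
XML_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<customers>
<customer customerid="1" name="John Doe"></customer>
<customer customerid="2" name="Tommy Mels"></customer>
</customers>"""


def xml_to_hbase_rows(xml_bytes):
    """Return one (row_key, json_value) pair per <customer> element.

    Hypothetical helper: the row key is the customerid attribute and the
    value is all of the element's attributes serialized as compact JSON.
    """
    root = ET.fromstring(xml_bytes)
    rows = []
    for customer in root.findall("customer"):
        attrs = dict(customer.attrib)
        row_key = attrs["customerid"]
        # sort_keys keeps the field order stable ("customerid" before "name").
        value = json.dumps(attrs, separators=(",", ":"), sort_keys=True)
        rows.append((row_key, value))
    return rows


if __name__ == "__main__":
    for row_key, value in xml_to_hbase_rows(XML_DOC):
        print(row_key, value)
```

Each pair corresponds to one HBase row; that per-element split is the step I can't reproduce in NiFi.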
I've tried many different processor combinations, but this is my current flow: GetHDFS -> ConvertRecord -> SplitJson -> PutHBaseCell. ConvertRecord works fine and converts the XML file to JSON properly, but I can't manage to split the JSON into two separate records. This is what I've managed to write to HBase so far (with a different combination of processors):
c5927a55-d217-4dc1-af04-0aff743cfe4e column=person:rowkey, timestamp=1574329272237, value={"customerid":"1","name":"John Doe"}\x0A{"customerid":"2","name":"Tommy Mels"}
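The stored value is not a single JSON document: it is two objects joined by a newline (\x0A), i.e. newline-delimited JSON, and the whole thing ended up under one generated row key. A minimal Python sketch (outside NiFi) of the split I expected, assuming that newline-delimited form; split_ndjson and to_rows are hypothetical names used only here:

```python
import json

# The value written to HBase above: two JSON objects separated by "\n" (\x0A).
STORED_VALUE = '{"customerid":"1","name":"John Doe"}\n{"customerid":"2","name":"Tommy Mels"}'


def split_ndjson(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def to_rows(records):
    """Pair each record with its customerid, the row key I want in HBase."""
    return [(rec["customerid"], json.dumps(rec, separators=(",", ":"))) for rec in records]


if __name__ == "__main__":
    try:
        json.loads(STORED_VALUE)
    except json.JSONDecodeError as err:
        # Parsing the whole value at once fails with "Extra data",
        # because it holds two documents rather than one array.
        print("not a single JSON document:", err.msg)
    for row_key, value in to_rows(split_ndjson(STORED_VALUE)):
        print(row_key, value)
```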
For the SplitJson processor I'm using the following JsonPath Expression: $.*
Currently, the PutHBaseCell processor throws an IllegalArgumentException stating "Row length is 0". These are my PutHBaseCell processor settings:
Any hints?
