I'm new to Kettle, but getting on well with it so far. However, I can't figure out how to do this.
I have a CSV which looks something like this:
a, col1, col2, col3
a, col1, col2, col3
a, col1, col2, col3
b, col1, col2, col3
b, col1, col2, col3
c, col1, col2, col3
c, col1, col2, col3
The first column starts with a key (a, b, c), and then the rest of the columns follow. What I want to do is read in the CSV (I've got that covered) and then split it based on the key, so I have three chunks/groups of data, and then convert each of those chunks into a separate JSON file, which I think I can manage.
What I can't get my head around is grouping the data and then performing a separate action (convert to JSON) on each of those groups. It's not creating the JSON that I have an issue with.
The data comes from a network of many environmental sensors, so there are hundreds of keys, and new ones get added over time. I've used MapReduce to process this data before, and partitioning is the concept I'm trying to replicate here, but without the Hadoop elements of Kettle, as the deployment is different. Once I've partitioned the data, each group needs to be loaded into a different place as separate records. The key is the unique ID (serial number) of a sensor.
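To make it concrete, this is roughly the behaviour I'm trying to reproduce in Kettle, sketched in plain Python (the input file name sensors.csv and the JSON layout are just placeholders I've made up):

import csv
import json
from collections import defaultdict

# Bucket the rows by the key in the first column (the sensor serial number).
groups = defaultdict(list)
with open("sensors.csv", newline="") as f:
    for row in csv.reader(f):
        key, *values = [field.strip() for field in row]
        groups[key].append(values)

# Emit one JSON file per key, named after the key.
for key, rows in groups.items():
    with open(f"{key}.json", "w") as out:
        json.dump({"sensor": key, "records": rows}, out, indent=2)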
Any ideas please?
Thanks