1
votes

I am writing a Kettle transformation.

My input file looks like the following:

sessionId=40936a7c-8af9|txId=40936a7d-8af9-11e|field3=val3|field4=val4|field5=myapp|field6=03/12/13 15:13:34|

How do I process this file? I am completely at a loss.

The first step is a CSV file input with | as the delimiter.

My analysis will be based on the "value" part of each name=value pair.

Has anyone processed such files before?

2
It may be useful to run the file through a simple command line tool to strip the "fieldname=" portion out. If your file's on a unix machine, a set of sed commands could do the trick. I don't know about kettle though, so I can't answer the immediate question. Is there a way to set multiple delimiters (| and =)? Then you could just look at fields 2, 4, 6, 8, etc... - N West
I know nothing about kettle, but example 2 in the documentation for the Field Splitter uses data that is effectively in the same format as yours. Otherwise, as @NWest suggested just use a script to pre-process the data in some way. But hopefully someone who actually knows something about kettle will be able to suggest a specific solution. - Pondlife
This worked. Ah!!! the bad habit of not going through examples. - Thoughtful Monkey

2 Answers

2
votes

Since you have already split the records into 'key=value' fields, you could use an expression transform to cut each string in two at the position of the = character, and create two out ports: one holding the key and the other the value.
From there it depends on what you want to do with the information. If you want to store the fields as key/value pairs, route them through a union; otherwise use a router transform to send them to different targets.

Here is an example of an expression to split the pairs: (image not available)
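The same idea, finding the = and cutting the string at that position, can be sketched in plain JavaScript. This is only an illustration under assumptions: `splitPair` is a hypothetical helper name, and treating a string without an = as a value with an empty key is a choice made here, not something the answer specifies.

```javascript
// Hypothetical helper: split one "key=value" string at the first "=".
function splitPair(pair) {
  var pos = pair.indexOf("=");
  if (pos < 0) {
    // Assumption: no "=" means the whole string is a value with an empty key.
    return { key: "", value: pair };
  }
  // Everything before the first "=" is the key, everything after is the value.
  return { key: pair.substring(0, pos), value: pair.substring(pos + 1) };
}

var result = splitPair("field6=03/12/13 15:13:34");
// result.key is "field6", result.value is "03/12/13 15:13:34"
```

Cutting at the first = rather than calling split("=") keeps any later = characters inside the value.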

1
votes

You could use the Modified Java Script Value step. Add it after the step that groups the data by pipes.

Then parse the value with some JavaScript like this:

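// Split the incoming field (here named sessionId) into its pipe-delimited parts.
var mainArr = [];
var sessionIdSplit = sessionId.toString().split("|");

for (var y = 0; y < sessionIdSplit.length; y++) {
    mainArr[y] = sessionIdSplit[y].toString();

    // here you can add another loop to parse again and split the key=value
}

// Show the collected parts in a dialog for debugging.
Alert("mainArr: " + mainArr);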
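The second pass that the comment in the loop mentions could look roughly like the sketch below. It is written as standalone JavaScript so it can be tried outside Kettle. `parseLine` is a hypothetical helper name, and skipping empty pieces and pieces without an = is an assumption about how malformed input should be handled.

```javascript
// Hypothetical helper: turn "k1=v1|k2=v2|" into { k1: "v1", k2: "v2" }.
function parseLine(line) {
  var fields = {};
  var parts = line.split("|");
  for (var i = 0; i < parts.length; i++) {
    var part = parts[i];
    if (part === "") continue;   // skip the empty piece after the trailing "|"
    var pos = part.indexOf("=");
    if (pos < 0) continue;       // assumption: ignore pieces that are not key=value
    fields[part.substring(0, pos)] = part.substring(pos + 1);
  }
  return fields;
}

var sample = "sessionId=40936a7c-8af9|txId=40936a7d-8af9-11e|field5=myapp|";
var parsed = parseLine(sample);
// parsed.sessionId is "40936a7c-8af9", parsed.field5 is "myapp"
```

Inside the step itself, the values in `parsed` would still have to be assigned to output fields. How that is wired up depends on the transformation and is not shown here.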