5
votes

Brand new to Pentaho (and a newbie SO poster so look out!)

I'd like to use Kettle/PDI to transform data coming in from an RDBMS from this (for example):

Question1   Question2   Question3   Question4
1/1/13      123.00      Test 1      Test 1.1
1/2/13      124.00      Test 2      Test 1.2
1/3/13      125.00      Test 3      Test 1.3
1/4/13      126.00      Test 4      Test 1.4
1/5/13      127.00      Test 5      Test 1.5

to this:

QuestionName AnswerDate AnswerNumber AnswerString
Question1    1/1/13     
Question1    1/2/13     
Question1    1/3/13     
Question1    1/4/13     
Question1    1/5/13     
Question2               123.00
Question2               124.00
Question2               125.00
Question2               126.00
Question2               127.00
Question3                             Test 1
Question3                             Test 2
Question3                             Test 3
Question3                             Test 4
Question3                             Test 5
Question4                             Test 1.1
Question4                             Test 1.2
Question4                             Test 1.3
Question4                             Test 1.4
Question4                             Test 1.5

As hopefully reflected above, there should be an "Answer<FieldDataType>" column for each available datatype in the original table. Is this possible with PDI? If so, can someone provide me with some pointers? I've tried using the Row Normaliser step to pivot the table and assign the new fields, but am probably not doing things quite right (or there is a bug [PDI 4.4]).

Check the samples for that step. It does work, but every time I have to use it, I have to check the sample because for some reason configuring it isn't intuitive! – Codek
Are you comfortable with (and allowed to use) a JavaScript step in PDI? – KLDavenport
Kevin D, yes to both. I guess with the JS step I could filter by type and add fields based on the type? If you have an example, I'd greatly appreciate it. I'm not opposed to using JS, but I wasn't sure if there was a more "native" way to do it. – jriggins
I came across this issue a while ago; I wonder if you are hitting the same thing. If so, vote for the JIRA! jira.pentaho.com/browse/PDI-9467 – Codek
I think that I am. At any rate, I voted! Thanks for the heads-up. – jriggins

3 Answers

0
votes

I accomplished this by using a scripting step to write one output row per column in the input row, containing the column name and its value. From there, I went to a Regex Evaluation step and used multiple capture groups to map the value types to the type-specific columns in the stream. I messed around with the Row Normaliser for a while, but couldn't get it to do exactly what I wanted. The performance loss of using a scripting step was negligible.
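For reference, a minimal sketch of that scripting step, assuming PDI's Modified Java Script Value step; the output field names (QuestionName, Answer) are illustrative and would be declared in the step's Fields tab:

trans_Status = SKIP_TRANSFORMATION;  // suppress the original wide row

// Emit one (column name, value) pair per input column.
var inputMeta = getInputRowMeta();
for (var i = 0; i < inputMeta.size(); i++) {
    var outRow = createRowCopy(2);                    // two output fields
    outRow[0] = inputMeta.getValueMeta(i).getName();  // column name
    outRow[1] = (row[i] == null) ? null : '' + row[i];  // value, stringified for the regex step
    putRow(outRow);
}

A downstream Regex Evaluation step can then match the stringified value (date pattern, number pattern, or anything else) with capture groups and populate the type-specific Answer columns.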

0
votes

Use a Modified Java Script Value step:

trans_Status = SKIP_TRANSFORMATION;  // suppress the original wide row

// Output fields (declare them in the Fields tab):
// [0] QuestionName, [1] AnswerDate, [2] AnswerNumber, [3] AnswerString
var row1 = createRowCopy(4);
var row2 = createRowCopy(4);
var row3 = createRowCopy(4);
var row4 = createRowCopy(4);

row1[0] = 'Question1';  row1[1] = Question1;  // date -> AnswerDate
row2[0] = 'Question2';  row2[2] = Question2;  // number -> AnswerNumber
row3[0] = 'Question3';  row3[3] = Question3;  // string -> AnswerString
row4[0] = 'Question4';  row4[3] = Question4;  // string -> AnswerString

// createRowCopy() copies the input values, so blank out the unused slots.
row1[2] = row1[3] = null;
row2[1] = row2[3] = null;
row3[1] = row3[2] = null;
row4[1] = row4[2] = null;

putRow(row1);
putRow(row2);
putRow(row3);
putRow(row4);

Don't forget to add the four output fields (QuestionName, AnswerDate, AnswerNumber, AnswerString) in the step's Fields tab.

0
votes

The Row Normalizer is very sensitive to the order in which you specify the de-normalization mappings.

I had a sparse matrix input and discovered the following rules:

  1. The Type values must be grouped together, like with like
  2. The "new field" assignments must appear in the same order within each Type grouping
  3. The Type groups must be arranged with the most populous first and the least populous last

Thus, in the example given, specifying

 Fieldname  Type   new field
 Question1  date   AnswerDate
 Question2  number AnswerNumber
 Question3  string AnswerString
 Question4  string AnswerString

will work better than the following, which splits the two string mappings apart and so violates rule 1:

 Fieldname  Type   new field
 Question1  date   AnswerDate
 Question3  string AnswerString
 Question2  number AnswerNumber
 Question4  string AnswerString