1 vote

I am pretty new to Pentaho, so my question might sound very basic.

I have written a transformation in which I am using a CSV file input step and a Table input step. These are the steps I followed:

  1. Initially, I created a parameter in the transformation properties. The parameter birthdate doesn't have any default value set.

  2. I have used this parameter in the PostgreSQL query in the Table input step in the following manner:

    select * from person where EXTRACT(YEAR FROM birthdate) > ${birthdate};
    
  3. I am reading the CSV file using a CSV file input step. How do I assign the birthdate value present in my CSV file to the parameter I created in the transformation?

    (OR)

  4. Could you guide me through the process of assigning the CSV field value directly to the SQL query used in the Table input step, without the use of a parameter?

2 Answers

-1 votes

TLDR;

I recommend using a "database join" step like in my third suggestion below.

See the last image for reference

First idea - Using Table Input as originally asked

Well, you don't need any parameter for that, unless you are going to provide the value for that parameter when asking the transformation to run. If you need to read the value from a CSV, you can do that with this approach.

  • First, read your CSV and make sure your rows are ok.
  • After that, use a Select values step to keep only the columns to be used as parameters.
  • In the Table input, use a placeholder (?) to mark where the data goes and ask it to run for each row it receives from the source step.

Just keep in mind that the order of the columns received by the Table input (the columns coming out of the Select values step) is the order in which they are used for the placeholders (?). This should not be a problem in your case, since the query uses only one placeholder, but keep it in mind as you ramp up with Pentaho.

(screenshot: first idea - csv > select values > table input)
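
If it helps, here is a minimal sketch of that Table input query with a positional placeholder (table and column names taken from the question; whether you need a cast such as ?::integer depends on how the CSV field is typed):

    -- Table input query sketch: "Execute for each row" checked;
    -- the single ? is bound to the first field arriving from the source step
    SELECT *
    FROM person
    WHERE EXTRACT(YEAR FROM birthdate) > ?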

Second idea, using a Database Lookup

This is another approach where you can't personalize the query sent to the database, but you may get better performance because you can set an "Enable cache" flag. If you don't need to use a function in your WHERE clause, this approach is really recommended.

(screenshot: second idea - csv > database lookup)
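
For intuition only: a Database lookup with birthdate as the key field issues something roughly like the query below for each incoming row (the returned columns first_name and last_name are hypothetical), which is why you cannot wrap the key column in a function such as EXTRACT:

    -- Rough equivalent of what the Database lookup runs per row (sketch only)
    SELECT first_name, last_name
    FROM person
    WHERE birthdate = ?   -- plain comparison on the raw column, no functions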

Third idea, using a Database Join

That is my recommended approach if you need a function in your WHERE clause. It looks a lot like the Table input approach, but you can skip the Select values step and choose which columns to use, repeat the same column several times, and enable an "Outer join" flag that also returns the rows for which the query produced no result.

(screenshot: third idea - csv > database join)
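
As a sketch, the SQL inside the Database join step could simply be the query from the question with a ? parameter; the step's parameter grid maps each ? to an incoming field, and the same field may be mapped more than once (the ::integer cast is an assumption about how the CSV field arrives):

    -- Database join SQL sketch: parameters come from the step's grid;
    -- the "Outer join?" flag keeps CSV rows even when this query returns nothing
    SELECT *
    FROM person
    WHERE EXTRACT(YEAR FROM birthdate) > ?::integer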

Pro tip: if you feel the transformation is running too slow, try using multiple copies of the step (documentation here) and, obviously, make sure the table has the appropriate indexes in place.
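
On the index side, a hedged example of a PostgreSQL expression index that matches the year-based filter (the index name is mine, and it assumes birthdate is a plain date column):

    -- Hypothetical expression index so the EXTRACT(...) filter can use an index
    CREATE INDEX person_birth_year_idx
        ON person ((EXTRACT(YEAR FROM birthdate)));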

-2 votes

Yes, there is a way of assigning the value directly without the use of a parameter. Do as follows.

Use a "Block this step until steps finish" step to halt the Table input step until the CSV file input step completes.

(screenshot: full transformation)

Following is how you configure each step.

Note:

  1. The Postgres query should be: select * from person where EXTRACT(YEAR FROM birthdate) > ?::integer

  2. Check Execute for each row and Replace variables in the Table input step.

  3. Select only the birthdate column in the CSV file input step.
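
For illustration, assuming the CSV's birthdate field carries a year value such as 1990 (a sample value, not from the question), each incoming row makes the Table input run something roughly equivalent to:

    -- Per-row execution, sketched with a sample value bound to the ? parameter
    select * from person where EXTRACT(YEAR FROM birthdate) > '1990'::integer;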
