0
votes

I'm new to PDI and Kettle, and what I thought was a simple experiment to teach myself some basics has turned into a lot of frustration.

I want to check a database to see if a particular record exists (i.e. vendor). I would like to get the name of the vendor from reading a flat file (.CSV).

My first hurdle selecting only the vendor name from 8 fields in the CSV

The second hurdle is how to use that vendor name as a variable in a database query.

My third issue is what type of step to use for the database lookup.

I tried a dynamic SQL query, but I couldn't determine how to build the query using a variable, then how to pass the desired value to the variable.

The database table (VendorRatings) has 30 fields, one of which is vendor. The CSV also has 8 fields, one of which is also vendor.

My best effort was to use a dynamic query using: SELECT * FROM VENDORRATINGS WHERE VENDOR = ?

How do I programmatically assign the desired value to "?" in the query? Specifically, how do I link the output of a specific field from Text File Input to the "vendor = ?" SQL query?

2
How many VENDORRATINGS per Vendor ? Only one or many? What happens when/if a Vendor has no VENDORRATINGS ?AlainD
Which step do you call "Dynamic query". For the Dynamic SQL row, you need to built the SELECT...WHERE Vendor="<value>" in a field. For the Input table you'll loose all the field which are not in the table.AlainD
@AlainD there is only 1 VENDORRATING per vendor in the table. If there is no rating for a given vendor present in the CSV file, it should lead to a new hop where all 8 fields in the CSV are inserted as a new row into VENDORRATING..lucholland
@AlainD yes, I meant Dynamic SQL Row. Based on some research it seemed like an option for a dynamic query. Looking like a bad path to follow (no pun intended).lucholland
Dynamic SQL Row could be an answer. If for example you need to SELECT SUM(field) but, generally speaking, it requires some work to built a String containing the query.AlainD

2 Answers

1
votes

Due to the fact that

  1. There is at most one vendorrating per vendor.
  2. You have to do something if there is no match.

I suggest the following flow:enter image description here

Read the CSV and for each row look up in the table (i.e.: the lookup table is the SQL table rather that the CSV file). And put default upon not matching. I suggest something really visible like "--- NO MATCH ---".

Then, in case of no match, the filter redirect the flow to the alternative action (here: insert into the SQL table). Then the two flows and merged into the downstream flow.

1
votes

The best practice is a Stream lookup. For each record in the main flow (VendorRating) lookup in the reference file (the CSV) for the vendor details (lookup fields), based on its identifier (possibly its number or name or firstname+lastname).

enter image description here

First "hurdle" : Once the path of the csv file defined, press the Get field button.

It will take the first line as header to know the field names and explore the first 100 (customizable) record to determine the field types.

If the name is not on the first line, uncheck the Header row present, press the Get field button, and then change the name on the panel.

If there is more than one header row or other complexities, use the Text file input.

The same is valid for the lookup step: use the Get lookup field button and delete the fields you do not need.