1
votes

I have a Kettle transformation that inserts data in a number of database tables. For each table there is a separate transformation (with an injection step) that makes some calculations, checks the data and finally inserts. These sub-transformations are called using a single threader step.

The main transformation looks something like this:

Input from -----> Dummy -----> Dummy -----> Dummy -----> Done
   file             |            |            |
                    |            |            |
                    v            v            v
                  Select       Select       Select
                  values       values       values
                    |            |            |
                    |            |            |
                    v            v            v
                  Single       Single       Single
                threader 1   threader 2   threader 3

My problem is that I want to make sure that Single threader 1 finishes for a specific row before Single threader 2 runs for that row, and so on. This is because the first single threader adds a post in one table that then should be referenced in later tables and the database will throw an error if a reference to a post that does not exist (yet) is inserted.

I can't put the single threaders in one line because I need to discard all but a few fields of about 50 in total to match the injectors of the single threaders. That is what the select values does.

How can I solve this?

1
Did u have a look on step "Block this step until steps finish" ? - simar
@simar Yes, but I am not sure how to use it. If I understand it right, it blocks until all rows has passed through a step. That means that all my rows (10k+) needs to buffer somewhere, plus I cant take advantage of parallellism. I just want to block until one row has passed through a certain step. - Anders
Well if u don't need extensive data transform operation when I don't think it's a problem (or u can prepare all data before to start perform DDL). Batch insert in JDBC very effective. In my personal experience over million record was written into DB in less then 10s - simar
@simar Ok, I will try that and see how the performance is. Thank you. - Anders

1 Answers

1
votes

Here is example of such case

create table a (id serial primary key, md5 text not null);
create table b (id serial primary key, md5 text not null, a_id int not  null);
create table c (id serial primary key, md5 text not null, a_id int not null, b_id int not null);

And example of tranformation

enter image description here

I assume that data is flat so that each row at the beginning has data for all tables A, B and C.