0 votes

The task is to run a defined number of transformations (.ktr) in parallel. Each transformation opens its own database connection to read data. However, the given user is limited to 5 parallel connections to the DB, and let's assume this cannot be changed. So when I start the job depicted below, only 5 transformations finish their work successfully, and the other 5 fail with a DB connection error.

[Image: job that launches 10 transformations in parallel]

I know that I could redraw the job scheme to have only 5 parallel sequences, but I don't like this approach, as it requires reimplementation whenever the number of threads changes.

Is it possible to configure some kind of pool of executors, so that the Pentaho job understands that even if 10 transformations are provided, only 5 of them (any 5) are processed in parallel at a time?
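
To make clear what I mean by a "pool of executors": in plain Java it would look roughly like the sketch below. This is not a PDI job configuration, just an illustration of the idea; it assumes the Kettle Java API (org.pentaho.di.*) is on the classpath, and the class name and .ktr file names are placeholders.

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class BoundedTransRunner {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();  // start the Kettle engine once

            // Hypothetical transformation files; in my job there are 10 of them.
            List<String> ktrFiles = List.of(
                    "trans_01.ktr", "trans_02.ktr", "trans_03.ktr", "trans_04.ktr", "trans_05.ktr",
                    "trans_06.ktr", "trans_07.ktr", "trans_08.ktr", "trans_09.ktr", "trans_10.ktr");

            // At most 5 transformations run concurrently, so at most 5 DB connections are open.
            ExecutorService pool = Executors.newFixedThreadPool(5);
            for (String file : ktrFiles) {
                pool.submit(() -> {
                    TransMeta meta = new TransMeta(file);   // parse the .ktr file
                    Trans trans = new Trans(meta);
                    trans.execute(null);                    // start the transformation
                    trans.waitUntilFinished();              // block until it completes
                    return trans.getErrors();               // 0 means success
                });
            }
            pool.shutdown();  // accept no new tasks; queued ones still run, 5 at a time
        }
    }

The point is that newFixedThreadPool(5) queues the remaining tasks instead of failing them. I am asking whether the same behaviour can be configured inside the .kjb itself.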


2 Answers

0 votes

I am assuming that you know the number of parallel database connections available. If you know this, you can use a Switch/Case step to branch into that number of parallel transformations. A second option is to use the Job Executor: there you can set a variable which in turn calls the corresponding job. For example, you call a job using the Job Executor with the value
c:/data-integrator/parallel_job_${5}.kjb, where 5 is the number of connections available,
or
c:/data-integrator/parallel_job_${7}.kjb, where 7 is the number of connections available.
Does this make sense to you?
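
As a rough illustration of that second option, a small driver outside Spoon could pick the job file from the known connection count and run it. This is only a sketch, not what the Job Executor step does internally: the file name pattern is taken verbatim from the answer above, the class name is made up, and the Kettle Job API usage (JobMeta/Job) is assumed to be available on the classpath.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.job.Job;
    import org.pentaho.di.job.JobMeta;

    public class RunParallelJob {
        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();

            int connections = 5;  // known limit of parallel DB connections
            // File name pattern from the answer; adjust to your actual naming scheme.
            String jobPath = "c:/data-integrator/parallel_job_${" + connections + "}.kjb";

            JobMeta jobMeta = new JobMeta(jobPath, null);  // load the .kjb (no repository)
            Job job = new Job(null, jobMeta);
            job.start();                                   // the job runs in its own thread
            job.waitUntilFinished();
            System.exit(job.getErrors() == 0 ? 0 : 1);     // non-zero errors mean failure
        }
    }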

0 votes

The concept is the following:

  1. Catch the database connection error when a transformation run attempt fails
  2. Wait a couple of seconds
  3. Retry the transformation

Look at the attached transformation picture; it works for me. A rough code sketch of the same retry loop is given below the picture.

Disadvantages:

  • A lot of connection errors in the logs, which could be confusing.
  • The given solution could turn into an infinite loop (but it could be amended to avoid this).

[Image: transformation with a retry loop on database connection error]
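
The same catch/wait/retry loop, expressed as a plain Java sketch rather than job hops, only to illustrate steps 1-3 above. The Kettle API calls, the class name, the file name, the 2-second wait and the retry limit are assumptions, not part of the original job; the retry limit is what prevents the infinite loop mentioned in the disadvantages.

    import org.pentaho.di.core.KettleEnvironment;
    import org.pentaho.di.trans.Trans;
    import org.pentaho.di.trans.TransMeta;

    public class RetryingTransRunner {
        // Run one transformation, retrying while it fails (e.g. "too many connections").
        static void runWithRetry(String ktrFile, int maxAttempts) throws Exception {
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                TransMeta meta = new TransMeta(ktrFile);
                Trans trans = new Trans(meta);
                trans.execute(null);
                trans.waitUntilFinished();
                if (trans.getErrors() == 0) {
                    return;                          // step 3 succeeded
                }
                // step 1: a DB connection failure shows up as a non-zero error count
                Thread.sleep(2000);                  // step 2: wait a couple of seconds
            }
            // bounded attempts guard against the infinite-loop disadvantage noted above
            throw new RuntimeException("Gave up on " + ktrFile + " after " + maxAttempts + " attempts");
        }

        public static void main(String[] args) throws Exception {
            KettleEnvironment.init();
            runWithRetry("some_transformation.ktr", 10);  // hypothetical file name
        }
    }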