0
votes

Pentaho Data Integration: CE 6.1.0.1-196

I am a newbie in Pentaho Data Integration. I need to run the same query against multiple databases. I created a table in the master database to store the connection information for the other databases that need to be queried. Below is the table structure.

SQL> desc database_connection;
Name          Type          Nullable Default Comments 
------------- ------------- -------- ------- -------- 
DATABASE_NAME VARCHAR2(32)  Y                         
JDBC_URL      VARCHAR2(512) Y                         
USERNAME      VARCHAR2(32)  Y                         
PASSWORD      VARCHAR2(32)  Y
ENABLED       VARCHAR2(1)   Y   

Sample Data

DATABASE_NAME: XPTO
JDBC_URL: (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = xptosrv.xyz.com)(PORT = 1521))(LOAD_BALANCE = ON)(FAILOVER = ON)(CONNECT_DATA = (SERVER = DEDICATED)(SERVICE_NAME = XPTO.XYZ.COM)(FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))))
USERNAME: SYSTEM
PASSWORD: blablabla
ENABLED: Y

My .ktr files:

(set_variables.ktr)

Table Input ---> Copy rows to result

The query associated with the Table Input step runs against the master database:

select database_name, jdbc_url, username, password from database_connection where enabled = 'Y'

(db_query.ktr)

Table Input ---> Table output

The query associated with the Table Input step runs on the remote (multiple) databases, and the results are stored through the Table Output step in the master database.
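The connection in db_query.ktr is driven by the variables listed below. As a minimal sketch, assuming a Generic database connection (the connection name and the jdbc:oracle:thin:@ prefix are illustrative, since the JDBC_URL column stores only the connect descriptor), the <connection> element inside db_query.ktr would look roughly like this:

<connection>
  <name>dynamic_oracle</name> <!-- illustrative name -->
  <type>GENERIC</type>
  <access>Native</access>
  <username>${USERNAME}</username>
  <password>${PASSWORD}</password>
  <attributes>
    <attribute>
      <code>CUSTOM_DRIVER_CLASS</code>
      <attribute>oracle.jdbc.driver.OracleDriver</attribute>
    </attribute>
    <attribute>
      <code>CUSTOM_URL</code>
      <!-- jdbc:oracle:thin:@ prefix assumed; JDBC_URL holds only the descriptor -->
      <attribute>jdbc:oracle:thin:@${JDBC_URL}</attribute>
    </attribute>
  </attributes>
</connection>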


My .kjb files:

(run_for_each_row.kjb)

Start ---> Transformation ---> Success

Transformation filename: ${Internal.Job.Filename.Directory}/db_query.ktr

Job Properties Parameters:

DATABASE_NAME, JDBC_URL, PASSWORD, USERNAME
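These declarations end up in the job XML as a <parameters> block; as a sketch, with empty defaults:

<parameters>
  <parameter><name>DATABASE_NAME</name><default_value/><description/></parameter>
  <parameter><name>JDBC_URL</name><default_value/><description/></parameter>
  <parameter><name>PASSWORD</name><default_value/><description/></parameter>
  <parameter><name>USERNAME</name><default_value/><description/></parameter>
</parameters>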

(master_job.kjb)

Start ---> Transformation ---> Job for each row ---> Success

Transformation filename: ${Internal.Job.Filename.Directory}/set_variables.ktr

Job for each row filename: ${Internal.Job.Filename.Directory}/run_for_each_row.kjb

Job for each row ... Advanced tab: "Copy previous results to parameters" -> checked; "Execute for every input row" -> checked

Job for each row ... Parameters: DATABASE_NAME, JDBC_URL, PASSWORD, USERNAME
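For reference, in the master_job.kjb XML the Parameters tab of this job entry should serialize to something like the fragment below, where each parameter is fed from the result-stream field of the same name (a sketch, assuming the stream names match the fields produced by set_variables.ktr):

<parameters>
  <pass_all_parameters>Y</pass_all_parameters>
  <parameter><name>DATABASE_NAME</name><stream_name>DATABASE_NAME</stream_name><value/></parameter>
  <parameter><name>JDBC_URL</name><stream_name>JDBC_URL</stream_name><value/></parameter>
  <parameter><name>PASSWORD</name><stream_name>PASSWORD</stream_name><value/></parameter>
  <parameter><name>USERNAME</name><stream_name>USERNAME</stream_name><value/></parameter>
</parameters>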

Execution log:

2016/10/06 10:36:15 - Spoon - Iniciando o job...
2016/10/06 10:36:15 - master_job - Início da execução do job
2016/10/06 10:36:15 - master_job - Starting entry [Transformation]
2016/10/06 10:36:15 - Transformation - Loading transformation from XML file [file:///D:/pdi/set_variables.ktr]
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - set_variables - Expedindo início para transformação [set_variables]
2016/10/06 10:36:15 - Table input.0 - Finished reading query, closing connection.
2016/10/06 10:36:15 - Copy rows to result.0 - Finished processing (I=0, O=0, R=6, W=6, U=0, E=0)
2016/10/06 10:36:15 - Table input.0 - Finished processing (I=6, O=0, R=0, W=6, U=0, E=0)
2016/10/06 10:36:15 - master_job - Starting entry [Job for each row]
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - slave_job - Starting entry [Transformation]
2016/10/06 10:36:15 - Transformation - Loading transformation from XML file [file:///D:/pdi/db_query.ktr]
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - cfgbuilder - Warning: The configuration parameter [org] is not supported by the default configuration builder for scheme: sftp
2016/10/06 10:36:15 - db_query - Expedindo início para transformação [db_query]
2016/10/06 10:36:15 - Table input.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : An error occurred, processing will be stopped: 
2016/10/06 10:36:15 - Table input.0 - Error occurred while trying to connect to the database
2016/10/06 10:36:15 - Table input.0 - 
2016/10/06 10:36:15 - Table input.0 - Error connecting to database: (using class oracle.jdbc.driver.OracleDriver)
2016/10/06 10:36:15 - Table input.0 - Erro de ES: Connect identifier was empty.
2016/10/06 10:36:15 - Table input.0 - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Erro inicializando step [Table input]
2016/10/06 10:36:15 - Table output.0 - Connected to database [REPORT] (commit=1000)
2016/10/06 10:36:15 - db_query - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Step [Table input.0] falhou durante inicialização!
2016/10/06 10:36:15 - Table input.0 - Finished reading query, closing connection.
2016/10/06 10:36:15 - Transformation - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : Unable to prepare for execution of the transformation
2016/10/06 10:36:15 - Transformation - ERROR (version 6.1.0.1-196, build 1 from 2016-04-07 12.08.49 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2016/10/06 10:36:15 - Transformation - Falhou a inicialização de pelo menos um step. A Execução não pode ser iniciada!
2016/10/06 10:36:15 - Transformation - 
2016/10/06 10:36:15 - Transformation - 
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.trans.Trans.prepareExecution(Trans.java:1142)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.trans.Trans.execute(Trans.java:612)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.job.entries.trans.JobEntryTrans.execute(JobEntryTrans.java:1097)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.job.Job.execute(Job.java:723)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.job.Job.execute(Job.java:864)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.job.Job.execute(Job.java:608)
2016/10/06 10:36:15 - Transformation -  at org.pentaho.di.job.entries.job.JobEntryJobRunner.run(JobEntryJobRunner.java:69)
2016/10/06 10:36:15 - Transformation -  at java.lang.Thread.run(Thread.java:745)
2016/10/06 10:36:15 - slave_job - Finished job entry [Transformation] (result=[false])
2016/10/06 10:36:15 - master_job - Finished job entry [Job for each row] (result=[false])
2016/10/06 10:36:15 - master_job - Finished job entry [Transformation] (result=[false])
2016/10/06 10:36:15 - master_job - Job execution finished
2016/10/06 10:36:15 - Spoon - O Job finalizou.

The data from the database_connection table is being read:

2016/10/06 10:36:15 - set_variables - Expedindo início para transformação [set_variables]
2016/10/06 10:36:15 - Table input.0 - Finished reading query, closing connection.
2016/10/06 10:36:15 - Copy rows to result.0 - Finished processing (I=0, O=0, R=6, W=6, U=0, E=0)
2016/10/06 10:36:15 - Table input.0 - Finished processing (I=6, O=0, R=0, W=6, U=0, E=0)

But I do not know what I am doing wrong such that these values are not being passed as parameters.

I would appreciate any help, because I have been stuck on this problem for a few days now.

The examples I found here on Stack Overflow and on the Pentaho forum did not help me much.

Project files (https://github.com/scarlosantos/pdi)

Thank you

2
As a first stab at it, I would re-configure the Job/Transform to write the contents of "Results" to a text file (or at least preview them) in order to see what's really in there. - Brian.D.Myers

2 Answers

0
votes

Use a Set Variables step instead of Copy rows to result in your "set_variables.ktr", and use variables in your connection properties. PDI will replace those variables at run time, and you will have a dynamic DB connection.
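A minimal sketch of such a Set Variables step in the .ktr XML (only JDBC_URL shown; the other three fields follow the same pattern, and the scope is an assumption):

<step>
  <name>Set Variables</name>
  <type>SetVariable</type>
  <fields>
    <field>
      <field_name>JDBC_URL</field_name>
      <variable_name>JDBC_URL</variable_name>
      <variable_type>ROOT_JOB</variable_type> <!-- assumed scope: visible in the whole root job -->
      <default_value/>
    </field>
  </fields>
</step>

Note that Set Variables accepts only one input row per execution, so this works when the loop feeds the transformation one connection row at a time.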

0
votes

This exact use case is nicely explained in the FAQ Beginner Section.

To make it short:

0) Check that you have all the drivers.

1) Do not forget to specify the names of these variables (right-click anywhere, Properties, Parameters) on both the transformations and the job, and make sure they are defined at job scope level.

2) IMPORTANT: go to the View tab (on the left pane; you are most probably on Design) and share the connection, so that PDI knows your connection in any transformation/job.

3) Edit the connection and, in the Host Name, Database Name, ... boxes, write ${HOST}, ${DATABASE_NAME}, ... or whatever names you gave to the variables. If you did step (1), just press Ctrl-Space and select from the drop-down menu.

4) Edit the file C:\Users\yourname\.kettle\shared.xml with a text editor. It is even fun if you keep a copy of the last working version. And, if you are brave enough, you can even produce this file with PDI.
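For example, a shared Oracle connection entry in shared.xml might look roughly like this (the connection name is illustrative, and the variable names are whatever you chose in step 3):

<sharedobjects>
  <connection>
    <name>shared_oracle</name> <!-- illustrative name -->
    <server>${HOST}</server>
    <type>ORACLE</type>
    <access>Native</access>
    <database>${DATABASE_NAME}</database>
    <port>${PORT}</port>
    <username>${USERNAME}</username>
    <password>${PASSWORD}</password>
  </connection>
</sharedobjects>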

Now you are raising an interesting question: you seem to connect with the JDBC URL, which you can do in PDI (with the Generic database connection); however, with that method PDI does not know which SQL dialect you are using. So if you get some odd error along the flow, make sure you SELECT *, do NOT use lazy conversion, and have a look at the field types with Right-click / Output Fields.