2 votes

I am trying to run a PDI transformation involving a database (any database, though NoSQL ones are preferred) from Java.

I've tried using MongoDB and CassandraDB and got missing-plugin errors. I've already asked here: Running PDI Kettle on Java - Mongodb Step Missing Plugins, but no one has replied yet.

I've tried switching to an SQL database, PostgreSQL, but it still doesn't work. From the research I did, I think it is because I didn't connect to the database from Java properly, yet I haven't found any tutorial or direction that works for me. I've tried following the directions from this blog: http://ameethpaatil.blogspot.co.id/2010/11/pentaho-data-integration-java-maven.html, but still got some problems about the repository (because I don't have one, and it seems to be required).

The transformations run fine from Spoon. They only fail when I run them from Java.

Can anyone help me run a PDI transformation involving a database? Where did I go wrong?

Has anyone ever succeeded in running a PDI transformation involving either a NoSQL or SQL database? What DB did you use?

I'm sorry if I asked too many questions; I am so desperate. Any kind of information will be much appreciated. Thank you.

What kind of error do you get when you run it from Java (provide the stack trace)? Are you sure the Postgres host is reachable and PostgreSQL is up and running? Is the PostgreSQL JDBC driver available (in the classpath of your Java program)? Are you sure the versions match between the PostgreSQL driver, the JVM, and Postgres itself? – simar
Yes, I am sure; I tried executing a simple query and it works fine. I'm not sure about the versions though; how do I check them? The error I got is "failed to initialize table output". – M.R. Murazza
Here jdbc.postgresql.org/download.html you can find a table with all the JDBC drivers and information about compatible JDK and Postgres versions. The latest Pentaho is JDK 7 compatible. – simar
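On the version question from the comments: the JVM version can be read from a system property, and (given a working connection) the JDBC `DatabaseMetaData` reports the driver and server versions. A small sketch; the connection URL and credentials are passed as arguments and are placeholders:

```java
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;

public class VersionCheck {
    public static void main(String[] args) throws Exception {
        // The running JVM version is available as a system property
        System.out.println("JVM: " + System.getProperty("java.version"));

        // With a JDBC URL, user, and password passed as arguments (and the
        // driver jar on the classpath), DatabaseMetaData reports the
        // driver and server versions
        if (args.length == 3) {
            try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
                DatabaseMetaData meta = conn.getMetaData();
                System.out.println("Driver: " + meta.getDriverVersion());
                System.out.println("Server: " + meta.getDatabaseProductVersion());
            }
        }
    }
}
```

Run it with no arguments to see only the JVM version, or with `<url> <user> <password>` to also query the database.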

4 Answers

3 votes

Executing PDI jobs from Java is pretty straightforward. You just need to import all the necessary jar files (including the database drivers) and then call the Kettle classes. The best way is obviously to use Maven to manage the dependencies. In the Maven pom.xml file, just declare the database drivers.

A sample Maven file would look something like the one below, assuming you are using Pentaho v5.0.0-GA and PostgreSQL as the database:

<dependencies>
    <!-- Pentaho Kettle Core dependencies development -->
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-core</artifactId>
        <version>5.0.0.1</version>
    </dependency>
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-dbdialog</artifactId>
        <version>5.0.0.1</version>
    </dependency>
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-engine</artifactId>
        <version>5.0.0.1</version>
    </dependency>
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle-ui-swt</artifactId>
        <version>5.0.0.1</version>
    </dependency>
    <dependency>
        <groupId>pentaho-kettle</groupId>
        <artifactId>kettle5-log4j-plugin</artifactId>
        <version>5.0.0.1</version>
    </dependency>

    <!-- The database dependency files. Use it if your kettle file involves database connectivity. -->
    <dependency>
        <groupId>postgresql</groupId>
        <artifactId>postgresql</artifactId>
        <version>9.1-902.jdbc4</version>
    </dependency>
</dependencies>
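With the dependencies in place, running a transformation boils down to initializing the Kettle environment and executing the .ktr file. A minimal sketch (the file name my_transformation.ktr is a placeholder, and the Pentaho jars above must be on the classpath):

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class RunTransformation {
    public static void main(String[] args) throws Exception {
        // Initialize the Kettle environment (loads plugins, settings, etc.)
        KettleEnvironment.init();

        // "my_transformation.ktr" is a placeholder for your own file
        TransMeta transMeta = new TransMeta("my_transformation.ktr");
        Trans trans = new Trans(transMeta);

        trans.execute(null); // null = no command-line arguments
        trans.waitUntilFinished();

        if (trans.getErrors() > 0) {
            throw new RuntimeException("There were errors during transformation execution.");
        }
    }
}
```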

You can check my blog for more details. It works for database connections.

Hope this helps :)

1 vote

I had the same problem in an application using the Pentaho libraries. I resolved it with this code:

The singleton to init Kettle:

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Initializes the Kettle environment variable configuration
 * 
 * @author Marcos Souza
 * @version 1.0
 *
 */
public class AtomInitKettle {

    private static final Logger LOGGER = LoggerFactory.getLogger(AtomInitKettle.class);

    private AtomInitKettle() throws KettleException {
        try {
            LOGGER.info("Starting Kettle");
            KettleJNDI.protectSystemProperty();
            KettleEnvironment.init();
            LOGGER.info("Kettle started successfully");
        } catch (Exception e) {
            LOGGER.error("Message: {} Cause {} ", e.getMessage(), e.getCause());
        }
    }
}

And the code that saved me:

import java.io.File;
import java.util.Properties;

import org.pentaho.di.core.Const;
import org.pentaho.di.core.exception.KettleException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KettleJNDI {

    private static final Logger LOGGER = LoggerFactory.getLogger(KettleJNDI.class);

    public static final String SYS_PROP_IC = "java.naming.factory.initial";

    private static boolean init = false;

    private KettleJNDI() {

    }

    public static void initJNDI() throws KettleException {
        String path = Const.JNDI_DIRECTORY;
        LOGGER.info("Kettle Const.JNDI_DIRECTORY= {}", path);

        if (path == null || path.equals("")) {
            try {
                File file = new File("simple-jndi");
                path = file.getCanonicalPath();
            } catch (Exception e) {
                throw new KettleException("Error initializing JNDI", e);
            }
            Const.JNDI_DIRECTORY = path;
            LOGGER.info("Kettle null > Const.JNDI_DIRECTORY= {}", path);
        }

        System.setProperty("java.naming.factory.initial", "org.osjava.sj.SimpleContextFactory");
        System.setProperty("org.osjava.sj.root", path);
        System.setProperty("org.osjava.sj.delimiter", "/");
    }

    public static void protectSystemProperty() {
        if (init) {
            return;
        }

        System.setProperties(new ProtectionProperties(SYS_PROP_IC, System.getProperties()));

        if (LOGGER.isInfoEnabled()) {
            LOGGER.info("Kettle System Property Protector: System.properties replaced by custom properties handler");
        }

        init = true;
    }

    public static class ProtectionProperties extends Properties {

        private static final long serialVersionUID = 1L;
        private final String protectedKey;

        public ProtectionProperties(String protectedKey, Properties prprts) {
            super(prprts);
            if (protectedKey == null) {
                throw new IllegalArgumentException("Properties protection was provided a null key");
            }
            this.protectedKey = protectedKey;
        }

        @Override
        public synchronized Object setProperty(String key, String value) {
            // We forbid changes in general, but do it silently ...
            if (protectedKey.equals(key)) {
                if (LOGGER.isDebugEnabled()) {
                    LOGGER.debug("Kettle System Property Protector: Protected change to '" + key + "' with value '" + value + "'");
                }

                return super.getProperty(protectedKey);
            }

            return super.setProperty(key, value);
        }
    }
}
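For context, the simple-jndi directory this code resolves is expected to contain a jdbc.properties file describing the JNDI connections. A sketch of the format, assuming a connection named myDatasource (the name must match the JNDI connection used in the transformation; the driver, URL, and credentials are placeholders):

```properties
myDatasource/type=javax.sql.DataSource
myDatasource/driver=org.postgresql.Driver
myDatasource/url=jdbc:postgresql://localhost:5432/mydb
myDatasource/user=dbuser
myDatasource/password=dbpassword
```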
1 vote

I think your problem is with the database connection. You can configure it in the transformation and do not need to use JNDI.

public class DatabaseMetaStep {

    private static final Logger LOGGER = LoggerFactory.getLogger(DatabaseMetaStep.class);

    /**
     * Adds the database access configuration
     *
     * @return the configured DatabaseMeta
     */
    public static DatabaseMeta createDatabaseMeta() {
        DatabaseMeta databaseMeta = new DatabaseMeta();

        LOGGER.info("Loading access information");
        databaseMeta.setHostname("localhost");
        databaseMeta.setName("stepName");
        databaseMeta.setUsername("user");
        databaseMeta.setPassword("password");
        databaseMeta.setDBPort("port");
        databaseMeta.setDBName("database");     
        databaseMeta.setDatabaseType("MonetDB"); // or another supported type, e.g. MySQL, PostgreSQL, ...
        databaseMeta.setAccessType(DatabaseMeta.TYPE_ACCESS_NATIVE);

        return databaseMeta;
    }
}

Then you need to set the databaseMeta on the TransMeta:

DatabaseMeta databaseMeta = DatabaseMetaStep.createDatabaseMeta();

TransMeta transMeta = new TransMeta();
transMeta.setUsingUniqueConnections(true);
transMeta.setName("transMetaName");

List<DatabaseMeta> databases = new ArrayList<>();
databases.add(databaseMeta);
transMeta.setDatabases(databases);
1 vote
  • I tried your code with a "transformation without JNDI" and it works!

But I needed to add this repository to my pom.xml:

<repositories>
    <repository>
        <id>pentaho-releases</id>
        <url>http://repository.pentaho.org/artifactory/repo/</url>
    </repository>
</repositories>
  • Also, when I try with a datasource, I get this error: Cannot instantiate class: org.osjava.sj.SimpleContextFactory [Root exception is java.lang.ClassNotFoundException: org.osjava.sj.SimpleContextFactory]

Complete log here: https://gist.github.com/eb15f8545e3382351e20.git

[FIX]: Add this dependency:

<dependency>
    <groupId>pentaho</groupId>
    <artifactId>simple-jndi</artifactId>
    <version>1.0.1</version>
</dependency>
  • After that a new error occurs:

    transformation_with_jndi - Dispatching started for transformation [transformation_with_jndi] Table input.0 - ERROR (version 5.0.0.1.19046, build 1 from 2013-09-11_13-51-13 by buildguy) : An error occurred, processing will be stopped: Table input.0 - Error occured while trying to connect to the database Table input.0 - java.io.File parameter must be a directory. [D:\opt\workspace-eclipse\invoke-ktr-jndi\simple-jndi]

Complete log : https://gist.github.com/jrichardsz/9d74c7263f3567ac4b45

[EXPLANATION] This happens because inside

KettleEnvironment.init(); 

https://github.com/jrichardsz/pentaho-pdi-spoon-usefull-templates/blob/master/running-etl-transformation-using-java/researching-pentaho-classes/KettleEnvironment.java

there is this initialization:

if (simpleJndi) {
    JndiUtil.initJNDI();
}

And in JndiUtil:

String path = Const.JNDI_DIRECTORY;
if ((path == null) || (path.equals("")))

https://github.com/jrichardsz/pentaho-pdi-spoon-usefull-templates/blob/master/running-etl-transformation-using-java/researching-pentaho-classes/JndiUtil.java

And in the Const class:

public static String JNDI_DIRECTORY = NVL(System.getProperty("KETTLE_JNDI_ROOT"), System.getProperty("org.osjava.sj.root"));

https://github.com/jrichardsz/pentaho-pdi-spoon-usefull-templates/blob/master/running-etl-transformation-using-java/researching-pentaho-classes/Const.java

So we need to set the KETTLE_JNDI_ROOT variable.

[FIX] A small change to your example: just add this

System.setProperty("KETTLE_JNDI_ROOT", jdbcPropertiesPath);

before

KettleEnvironment.init();

A complete example based on your code:

import java.io.File;
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.core.exception.KettleException;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class ExecuteSimpleTransformationWithJndiDatasource {    

    public static void main(String[] args) {

        String resourcesPath = (new File(".").getAbsolutePath())+"\\src\\main\\resources";
        String ktr_path = resourcesPath+"\\transformation_with_jndi.ktr";

        //KETTLE_JNDI_ROOT could be the simple-jndi folder in your pdi or spoon home.
        //in this example, is the resources folder
        String jdbcPropertiesPath = resourcesPath;

        try {
            /**
             * Initialize the Kettle environment
             */
            System.setProperty("KETTLE_JNDI_ROOT", jdbcPropertiesPath);
            KettleEnvironment.init();

            /**
             * Create a trans object to properly assign the ktr metadata.
             * 
             * @filedb: The ktr file path to be executed.
             * 
             */
            TransMeta metadata = new TransMeta(ktr_path);
            Trans trans = new Trans(metadata);

            // Execute the transformation
            trans.execute(null);
            trans.waitUntilFinished();

            // checking for errors
            if (trans.getErrors() > 0) {
                System.out.println("Error executing transformation");
            }

        } catch (KettleException e) {
            e.printStackTrace();
        }

    }

}

For a complete example, check my GitHub repository:

https://github.com/jrichardsz/pentaho-pdi-spoon-usefull-templates/tree/master/running-etl-transformation-using-java/invoke-transformation-from-java-jndi/src/main/resources