I have an ETL process written in Kettle. It transfers data from an operational data source (MS SQL Server on Windows) to a data warehouse (MySQL on Ubuntu).

I want to schedule the Kettle job for daily execution, populating the dimension tables and the fact table, so that my data stays current and ready for analysis and reporting.

How can I schedule the execution of Kettle jobs?

2 Answers

In your Kettle installation directory there are several batch files, among them spoon.bat, pan.bat, and kitchen.bat. Spoon is the UI you already know; Pan is a command-line tool that runs transformations (.ktr files), and Kitchen is a command-line tool that runs Kettle jobs (.kjb files).

For a simple schedule, create a batch file that calls pan.bat (to run a transformation) or kitchen.bat (to run a job). Then use the Windows Task Scheduler to run that batch file on whatever schedule you want.

This, for instance, would run a Kettle job with basic logging and redirect the log output to a log file:

kitchen.bat /file:"c:\etl\my_first_job.kjb" /level:Basic > c:\etl\logs\logging_for_my_first_job.log
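To automate this, one approach is a small wrapper batch file registered with the Windows Task Scheduler via schtasks. The file name, task name, and schedule below are illustrative assumptions, not values from the original setup (note `>>` so repeated runs append to the log instead of overwriting it):

```shell
:: run_etl.bat — hypothetical wrapper that runs the Kettle job via Kitchen
:: and appends stdout and stderr to a log file
kitchen.bat /file:"c:\etl\my_first_job.kjb" /level:Basic >> c:\etl\logs\logging_for_my_first_job.log 2>&1

:: Register the wrapper as a daily 02:00 task (run from an elevated prompt):
schtasks /create /tn "Kettle ETL" /tr "c:\etl\run_etl.bat" /sc daily /st 02:00
```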

This is, of course, for Windows. If you run Kettle on Linux, you can use cron together with the corresponding .sh files in the Kettle installation directory (pan.sh or kitchen.sh).
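On Linux, the equivalent is a crontab entry that invokes kitchen.sh. The installation path and log location here are assumptions; on Linux the options are written with a leading dash instead of a slash:

```shell
# Edit the crontab of the user that owns the Kettle database connections:
#   crontab -e
# Run the job daily at 02:00 and append all output to a log file
0 2 * * * /opt/kettle/kitchen.sh -file=/opt/etl/my_first_job.kjb -level=Basic >> /var/log/etl/my_first_job.log 2>&1
```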

Since Kettle stores shared database connections in the user profile, make sure the user account running the scheduled task has those connections in its profile; otherwise your transformations will fail.

Scheduling in Pentaho can also be done with the Carte server: http://wiki.pentaho.com/display/EAI/Carte+User+Documentation

Using the scheduling parameters of the job's Start step together with the Carte server, you can schedule this Kettle job to run whenever you want.
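As a minimal sketch: a Carte server is started from the Kettle installation directory with a hostname and port (both values below are examples, not from the original setup):

```shell
# Start a Carte instance listening on localhost, port 8081
sh carte.sh localhost 8081
```

The job can then be executed on that Carte server as a remote slave, and the repeat/interval settings of the job's Start entry determine how often it re-runs.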