
I have a Pentaho ETL Job/Transformation that reads a text file and inserts records into an MS SQL Server database table. I run it daily and it takes up to 10 minutes to finish. The problem happens when someone else executes it: the run time rises to 40 minutes. All executions happen on the same machine, with the same JRE version. The logs show nothing unusual, just larger time gaps between the steps.

System info:

  • Windows 8 Enterprise, 64-bit
  • JRE 1.7.0_79, 32-bit
  • Pentaho 5.3.0
  • MS SQL 2000 (8.0)

Command executed:

C:\SR\bin\data-integration>"C:\SR\bin\jre1.7.0_79\bin\java.exe"  "-Xmx512m" "-XX:MaxPermSize=256m" "-Djava.library.path=libswt\win32" "-DKETTLE_HOME=" "-DKETTLE_REPOSITORY=" "-DKETTLE_USER=" "-DKETTLE_PASSWORD=" "-DKETTLE_PLUGIN_PACKAGES=" "-DKETTLE_LOG_SIZE_LIMIT=" "-DKETTLE_JNDI_ROOT=" -jar launcher\pentaho-application-launcher-5.3.0.0-213.jar -lib ..\libswt\win32  -main org.pentaho.di.kitchen.Kitchen /file C:\SR\config\pentaho\visao.kjb /param:"dia=29" /param:"mes=09" /param:"ano=2016" /param:"arquivo=Realize2016" /param:"dia_util=28" /norep 

My log:

2016/09/27 11:26:03 - Reading of file MyFile.0 - Line number : 50000
2016/09/27 11:26:03 - Validate Records.0 - Linenr 50000
2016/09/27 11:26:03 - Discarded records.0 - Linenr 50000
2016/09/27 11:26:04 - Reading of file MyFile.0 - Line number : 100000
2016/09/27 11:26:04 - Validate Records.0 - Linenr 100000
2016/09/27 11:26:04 - Discarded records.0 - Linenr 100000
2016/09/27 11:26:05 - Reading of file MyFile.0 - Line number : 150000
2016/09/27 11:26:05 - Validate Records.0 - Linenr 150000
2016/09/27 11:26:05 - Discarded records.0 - Linenr 150000
2016/09/27 11:26:06 - Reading of file MyFile.0 - Line number : 200000
2016/09/27 11:26:06 - Validate Records.0 - Linenr 200000
2016/09/27 11:26:06 - Discarded records.0 - Linenr 200000
2016/09/27 11:26:07 - Reading of file MyFile.0 - Line number : 250000
2016/09/27 11:26:07 - Validate Records.0 - Linenr 250000
2016/09/27 11:26:08 - Discarded records.0 - Linenr 250000

My colleague's log:

2016/09/29 10:13:26 - Reading of file MyFile.0 - Line number : 50000
2016/09/29 10:13:32 - Validate Records.0 - Linenr 50000
2016/09/29 10:13:32 - Discarded records.0 - Linenr 50000
2016/09/29 10:13:40 - Reading of file MyFile.0 - Line number : 100000
2016/09/29 10:13:46 - Validate Records.0 - Linenr 100000
2016/09/29 10:13:47 - Discarded records.0 - Linenr 100000
2016/09/29 10:13:56 - Reading of file MyFile.0 - Line number : 150000
2016/09/29 10:14:01 - Validate Records.0 - Linenr 150000
2016/09/29 10:14:02 - Discarded records.0 - Linenr 150000
2016/09/29 10:14:10 - Reading of file MyFile.0 - Line number : 200000
2016/09/29 10:14:17 - Validate Records.0 - Linenr 200000
2016/09/29 10:14:18 - Discarded records.0 - Linenr 200000
2016/09/29 10:14:26 - Reading of file MyFile.0 - Line number : 250000
2016/09/29 10:14:31 - Validate Records.0 - Linenr 250000
2016/09/29 10:14:32 - Discarded records.0 - Linenr 250000

2 Answers


There has to be something different. Are you running it under the same account? In what environment, Windows or Linux?

Have you tried executing it with Pan/Kitchen? Perhaps that will standardize your environments.
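For example, the job from the question could be started through the Kitchen.bat wrapper instead of calling java.exe directly, so every user picks up the same classpath and JVM options. A minimal sketch, reusing the question's file and parameters (adjust the paths to your install):

C:\SR\bin\data-integration>Kitchen.bat /file C:\SR\config\pentaho\visao.kjb /param:"dia=29" /param:"mes=09" /param:"ano=2016" /param:"arquivo=Realize2016" /param:"dia_util=28" /norep /level:Basic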

If you upload the transformation etc., I'll take a look.


I finally found what was causing the poor performance when my co-worker executed the Job.

After comparing all the environment variables and configurations, I found that his profile was missing some Kettle/Pentaho config files. These files are created by Spoon, and my colleague had never run Spoon; he only ran the Job using Kitchen.

The files were created in %USERPROFILE%\.kettle\
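If you run into the same problem, one quick check is to compare the .kettle directories of the two profiles and copy over whatever is missing (kettle.properties, .spoonrc, shared.xml). A minimal sketch for a Windows command prompt, with placeholder user names:

rem Inspect the Kettle config directory of the current profile
dir "%USERPROFILE%\.kettle"

rem Copy the config files to the other profile (user names are placeholders)
xcopy "C:\Users\myuser\.kettle" "C:\Users\colleague\.kettle" /E /I /Y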

Another difference between my profile and his was the default locale set in Spoon: mine was set to en-US, while his used the system default (pt-BR).
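If keeping every profile's Spoon settings in sync is impractical, the locale can also be forced per run with the standard JVM properties user.language and user.country. A sketch based on the question's java.exe call (the middle arguments are elided with ...):

"C:\SR\bin\jre1.7.0_79\bin\java.exe" "-Xmx512m" "-XX:MaxPermSize=256m" "-Duser.language=en" "-Duser.country=US" ... -main org.pentaho.di.kitchen.Kitchen /file C:\SR\config\pentaho\visao.kjb /norep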

Once all settings were identical between the two profiles, the execution time decreased significantly: from 40 minutes (average) to 6 minutes (average).