Good day,
I have a Pentaho Kettle file that runs as a batch job.
Basically, this file consists of 2 main steps.
First step: read from an input file (a txt file) and store the data in table1.
Second step: same as the first step, read from the same input file and store the data in table2.
This batch was working fine until I fed it a 20MB input file. It then needs more than 7 hours to finish the job.
Below are the test cases I have run:
15360 records, 1.4MB, 2 minutes and 20 seconds (140 seconds total).
30720 records, 2.8MB, 7 minutes and 30 seconds (450 seconds total).
61440 records, 5.5MB, 26 minutes and 55 seconds (1615 seconds total).
250000 records, 20MB, 7 hours and 30 minutes (27000 seconds total).
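To put a number on how the run time grows, here is a small Python sketch (separate from the Kettle job itself) that estimates the scaling exponent between consecutive test cases. It uses the figures above, takes 7 hours and 30 minutes as 27000 seconds, and assumes the time behaves roughly like records**k. An exponent near 1 would mean linear growth; an exponent near 2 would mean the time grows roughly with the square of the record count.

```python
import math

# (records, seconds) taken from the test cases above;
# 7 hours 30 minutes is converted to 27000 seconds.
RUNS = [
    (15360, 140),
    (30720, 450),
    (61440, 1615),
    (250000, 27000),
]


def scaling_exponents(runs):
    """Return k for each consecutive pair, assuming time ~ records**k."""
    exponents = []
    for (n1, t1), (n2, t2) in zip(runs, runs[1:]):
        exponents.append(math.log(t2 / t1) / math.log(n2 / n1))
    return exponents


if __name__ == "__main__":
    for (n1, _), (n2, _), k in zip(RUNS, RUNS[1:], scaling_exponents(RUNS)):
        print(f"{n1} -> {n2} records: exponent ~ {k:.2f}")
```

The exponents come out between about 1.7 and 2.0, so the time is growing much faster than the input size. That would be consistent with something in the job doing more work per row as more rows are processed, rather than a fixed cost per row, but this is only an observation from four data points.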
In the log, I found that a few steps account for most of the time. They are as follows: 1. Text file input. 2. Select values. 3. Modified Java Script Value.
Both main steps contain these 3 Kettle steps. For the 20MB input file, the first step takes only around 7 minutes, but the second step takes more than 7 hours.
I have been looking at this for quite a long time and still can't find out what the problem is.
Kindly advise.