0 votes

I have a simple task: copying Excel data to SQL tables. I first execute a stored procedure to delete the table entries. Then I have an Excel input from which I copy data to the SQL tables using tMap.

I have 20 tables to copy data to, each with a relatively small number of entries (10-100). Still, when I execute my task it takes a very long time (5-10 mins), and after copying 12 tables' entries it runs out of memory.

My workflow is: stored procedure -> (on subjob ok) -> Excel input -> tMap -> tMSSqlOutput -> (on component ok) -> Excel input -> tMap -> tMSSqlOutput -> (on component ok) -> ... -> Excel input -> tMap -> tMSSqlOutput

My Excel sheet is on my local machine, whereas the SQL tables I am copying data to are on a server. I have set my Run/Debug settings to Xms1024M, Xmx8192M, but it is still not working.

What can I do to solve this issue?

I am running Talend on a VM (virtual machine). I have attached a screenshot of my job.


Can you post a screenshot of your job please? - ydaetskcoR
I have edited it in my question. - Quick-gun Morgan

3 Answers

2 votes

Use OnSubjobOk on the Excel input to connect it to the next Excel input. This changes the whole code generation.

The generated code is a function for every subjob. The difference in code generation between OnSubjobOk and OnComponentOk is that OnComponentOk calls the next function from within the current one, while OnSubjobOk waits for the current subjob/function to finish before the next one starts. The latter lets the garbage collector work better.
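To make the difference concrete, here is a hypothetical sketch of the two code shapes (all class, method, and variable names are invented, not actual Talend-generated identifiers). With OnComponentOk, the next step runs while the current step's stack frame, and any row buffers it holds, is still live; with OnSubjobOk, each step returns first, so its buffers become collectable:

```java
import java.util.ArrayList;
import java.util.List;

// Invented sketch of the two code shapes Talend roughly produces.
public class JobSketch {
    static final List<String> log = new ArrayList<>();

    // OnComponentOk style: step 2 is called from inside step 1, so step 1's
    // frame (and the row buffer it holds) stays reachable until step 2 returns.
    static void step1ThenStep2Nested() {
        byte[] excelRows = new byte[1024]; // still referenced while step 2 runs
        log.add("step1");
        step2();
    }

    static void step2() {
        log.add("step2");
    }

    // OnSubjobOk style: each subjob method returns before the next is called,
    // so its locals are unreachable and the GC can reclaim them in between.
    public static void runAsSubjobs() {
        subjobA();
        subjobB(); // subjobA's buffer is already collectable here
    }

    static void subjobA() { byte[] excelRows = new byte[1024]; log.add("A"); }
    static void subjobB() { log.add("B"); }
}
```

Both variants load the same data; the difference is only how long each step's memory stays reachable.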

If that doesn't solve the problem, create subjobs that each contain one Excel-to-DB output, then link these jobs with OnSubjobOk in a master job.

2 votes

You should run each of these steps in a separate subjob, linked with "on subjob ok", so that the Java garbage collector can better reclaim memory between steps.

If this still doesn't work, you could separate them into completely separate jobs, link them all using tRunJob components, and make sure to tick "Use an independent process to run subjob":

Use an independent process to run subjob option in tRunJob

This spawns a completely new JVM instance for the process, so it is not bound by the parent JVM's memory. That said, be careful not to spawn too many JVM instances: each has some start-up overhead, and you are obviously still limited by physical memory.
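In plain Java terms, "use an independent process" amounts to something like the sketch below: the child job runs in its own JVM with its own heap settings, independent of the parent's. The method and the example command are illustrative assumptions, not Talend's actual implementation (Talend wires this up for you when the option is ticked):

```java
import java.io.IOException;

// Illustrative sketch: launch a child job in its own JVM so its heap is
// independent of the parent's. Command contents are placeholders, e.g.
// run("java", "-Xms256m", "-Xmx1024m", "-jar", "child_job.jar").
public class IndependentProcessSketch {
    public static int run(String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.inheritIO();                 // forward the child's output to our console
        Process child = pb.start();     // new OS process, new JVM, new heap
        return child.waitFor();         // exit code 0 means success
    }
}
```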

It really belongs in a separate question, but you may also find some benefit in using parallelisation in your job to improve performance.

-1 votes

To avoid the job consuming too much memory (OutOfMemoryError), you can have tMap store large transformed data in a temporary directory on disk instead of holding it all in memory.

The attached screenshot showed how to enable this in the tMap settings.

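The idea can be sketched in plain Java (this is not Talend's actual code, just an illustration of spilling rows to a temp file so they leave the heap, then streaming them back when needed; the class and method names are invented):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative sketch of tMap's "store temp data on disk" idea:
// write transformed rows to a temp file instead of keeping them in memory.
public class DiskSpillSketch {
    public static Path spill(List<String> rows, Path tempDir) throws IOException {
        Path file = Files.createTempFile(tempDir, "tmap_", ".tmp");
        Files.write(file, rows);          // rows can leave the heap once written
        return file;
    }

    public static List<String> readBack(Path file) throws IOException {
        return Files.readAllLines(file);  // re-load rows only when needed
    }
}
```

The trade-off is speed for memory: disk I/O is slower, but the job's peak heap usage stays small.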