
I have a basic Pentaho transformation in my job that reads 5,000 records from a stored procedure in SQL Server via a 'Table Input' step. The data has 5 columns, one of which is an XML column. After the 'Table Input', a 'Text File Output' step runs; it takes the save path from one of the columns and writes the XML column as the only field provided in the Fields tab. This creates 5,000 XML files in the given location by streaming rows from the 'Table Input' to the 'Text File Output'.

When this job is executed it runs at 99-100% CPU utilization for the duration of the job and then drops back down to ~5-10% afterwards. Is there any way to control the CPU utilization, either through Pentaho or from the command prompt? This is running on a Windows Server 2012 R2 machine with 4 GB of RAM and an Intel Xeon E5-2680 v2 @ 2.8 GHz processor. I have seen that memory usage can be controlled through Spoon.bat, but I haven't found anything online about controlling CPU usage.
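
For reference, the memory control I'm referring to is the JVM options line in the PDI startup scripts. In the versions I've looked at it is driven by the PENTAHO_DI_JAVA_OPTIONS variable (Kitchen.bat and Pan.bat go through Spoon.bat, so they pick it up too); the exact defaults vary by version, so treat this as a sketch:

    rem Excerpt from Spoon.bat: if the variable isn't already set, defaults are used.
    rem Adjusting -Xmx here changes how much heap the transformation can use.
    if "%PENTAHO_DI_JAVA_OPTIONS%"=="" set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m" "-XX:MaxPermSize=256m"

Nothing in that script caps CPU, though, which is what I'm trying to do.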

Thanks for the link, but only half of the problem is on the SQL Server side of things (reading the data). The other half is happening in Pentaho, writing the XML files. – Jeff Fol

1 Answer


In my experience, neither of those steps is CPU intensive under normal circumstances. Two causes I can think of are:

The transformation is choking trying to format the XML. That would be fixed by checking the 'Lazy conversion' option in the 'Table Input' step and 'Fast data dump (no formatting)' in the 'Text File Output' step. Then it should just stream the string data through.

The other is that you have huge XML documents and the CPU usage is actually garbage collection because Pentaho is running out of memory. Test this by increasing the maximum heap space (the -Xmx1024m option in the startup script).
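
As a concrete test (assuming the job is launched with Kitchen.bat; the job path below is a placeholder), you can override the heap before starting the job and turn on GC logging to see whether garbage collection really is what's eating the CPU:

    rem Give the JVM more headroom than the default and log GC activity.
    rem PENTAHO_DI_JAVA_OPTIONS is read by Spoon.bat, which Kitchen.bat calls internally.
    set PENTAHO_DI_JAVA_OPTIONS="-Xms1024m" "-Xmx2048m" "-verbose:gc"

    rem C:\etl\export_xml.kjb is a placeholder for your job file.
    call Kitchen.bat /file:C:\etl\export_xml.kjb /level:Basic

If the console fills with GC lines while the CPU is pegged, more heap is the fix; if not, look at the formatting options above.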