1 vote

We have a ColdFusion 9 script that runs regularly reading a CSV file and inserting the records into an Oracle 11g database. The CSV file has around 50 columns, 8 of which are used by CF (the CSV format cannot be modified). The general flow of the CFM is:

  1. Read file into variable
  2. CFLOOP using that variable as the list attribute with CHR(13) as a delimiter
  3. Call an Oracle stored procedure, inserting various values from the file using ListGetAt (condensed sketch below)
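Condensed, the loop looks roughly like this (the variable, datasource, and procedure names here are illustrative, not the actual code):

```cfm
<!--- read the whole file, then loop over it as a CR-delimited list --->
<cffile action="read" file="#csvPath#" variable="csvData">

<cfloop list="#csvData#" index="line" delimiters="#chr(13)#">
    <cfstoredproc procedure="pkg_import.insert_record" datasource="#dsn#">
        <!--- 8 of the ~50 columns are passed in; the rest are ignored --->
        <cfprocparam type="in" cfsqltype="cf_sql_varchar" value="#listGetAt(line, 1)#">
        <cfprocparam type="in" cfsqltype="cf_sql_varchar" value="#listGetAt(line, 3)#">
        <!--- ...remaining cfprocparam tags follow the same pattern --->
    </cfstoredproc>
</cfloop>
```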

The stored procedure does the following:

  1. Inserts a record with 2 fields into table 1
  2. Inserts a record with 8 fields (including table 1's primary key) into table 2
  3. Returns nothing

This runs successfully in most cases, reading 400 KB files with hundreds of records in just a couple of seconds. However, occasionally we get a large volume and end up with a 13,000-record, 5 MB file. When we try to process a file that large, I watch the JVM memory usage climb from 90 MB to around 680 MB over a 10-15 second period, after which the CF Server Monitor stops responding (as does CF), forcing us to restart the service. The logs report a JVM out-of-memory error:

"Error","qtp4795249-38798","12/28/11","16:29:20",,"GC overhead limit exceeded" java.lang.OutOfMemoryError: GC overhead limit exceeded

Our JVM heap size is currently 768 MB. I haven't tried increasing it: even if that resolves this issue, it doesn't protect us in the future, and the rest of the server's normal load doesn't require nearly that much. I also hesitate to play too much with JVM settings that require a restart to take effect on a production box.

This is difficult to test as the import process runs fine with barely any noticeable memory load on my local dev machine and on our QA box, but both of those have a much slower connection to the database and take 10-15 minutes to complete.

I would appreciate any thoughts, especially about where the memory is going. I cannot figure out how 5 MB of data gets turned into 700 MB of data. We do have debugging information turned on, but the calling script's IP address is not in the debugging list, and I used a cfsetting tag to turn debugging off for this page. Previously there was a step 1.5 that turned the CSV data into a ColdFusion query, but I eliminated that in an attempt to improve efficiency. Both approaches result in an out-of-memory error.
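For reference, the per-page debugging override mentioned above is just the standard cfsetting attribute:

```cfm
<!--- disable debug output for this request only --->
<cfsetting showdebugoutput="no">
```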

Thanks for all the responses. Due to month-end and year-end processing occurring with the customer, and their response to taking down the CF server for 30 seconds yesterday, we will be waiting a week or so before trying to implement or test any of these suggestions. I will update this question then. – Nicholas

4 Answers

1 vote

Have you considered a direct import into the database? For MySQL this is LOAD DATA INFILE; for SQL Server it is BULK INSERT. If you need some additional processing, a possible approach would be loading the data into a temporary table and then handling it with CFML, which can easily be done in batches for heavy processing.
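For Oracle, SQL*Loader or an external table would play the same role. Once the data sits in a staging table, the batch handling could look roughly like this (table, column, and datasource names are made up for illustration):

```cfm
<!--- work through a staging table in fixed-size batches; all names are placeholders --->
<cfset batchSize = 1000>

<cfloop condition="true">
    <cfquery name="batch" datasource="#dsn#">
        SELECT id, col_a, col_b
        FROM   staging_csv
        WHERE  processed = 0
        AND    ROWNUM <= <cfqueryparam cfsqltype="cf_sql_integer" value="#batchSize#">
    </cfquery>

    <cfif batch.recordCount EQ 0>
        <cfbreak>
    </cfif>

    <cfloop query="batch">
        <!--- per-row work, e.g. the existing stored procedure call --->
    </cfloop>

    <cfquery datasource="#dsn#">
        UPDATE staging_csv
        SET    processed = 1
        WHERE  id IN (<cfqueryparam cfsqltype="cf_sql_integer" value="#valueList(batch.id)#" list="true">)
    </cfquery>
</cfloop>
```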

1 vote

Rather than read the whole file into memory before you start processing it, loop through the file contents by reading it one line at a time. Ben Nadel (of course) has a nice blog post discussing this technique:

Reading In File Data One Line At A Time Using ColdFusion's CFLoop Tag Or Java's LineNumberReader

Also see the livedocs for cfloop, specifically the file attribute.
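A minimal sketch of the file-attribute approach, with an illustrative path (the per-row logic stays the same as before):

```cfm
<!--- stream the CSV one line at a time instead of reading the whole file into memory --->
<cfset csvPath = "C:\imports\records.csv">

<cfloop file="#csvPath#" index="line">
    <cfif len(trim(line))>
        <!--- same per-row stored procedure call as before, e.g. listGetAt(line, 1), listGetAt(line, 3), ... --->
    </cfif>
</cfloop>
```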

0 votes

We have a CF app that imports real estate MLS listings and ran into similar issues. The core file we work with is pushing 100 MB, and reading it in and looping over it all at once created lots of problems. We ended up doing a few things:

  1. Split the file up into chunks. The import process uses the split.exe utility from Cygwin to split the file into 4,000-line chunks. We then use CFDIRECTORY to get a list of the chunks and process them one at a time.

  2. For each chunk, we read it in and then split the file contents into an array (using listToArray() with chr(13) as the delimiter).

  3. We loop from 1 to arrayLen(chunkArray) rather than looping over the file contents directly. This was done more for speed than anything else. Within that loop we also break each row into an array; we found that accessing the values as thisRow[i] (where i is the column number in the file) was much faster than repeatedly calling listGetAt(). The files we're importing have 90+ columns. (See the sketch after this list.)

  4. We increased the JVM memory allowance. Our servers are pretty busy as it is, and this was adding some overhead. We ended up pushing the JVM as high as it would go for a 32-bit server (about 2GB) so that memory was available when needed.
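A condensed sketch of steps 2 and 3, assuming a chunk has already been read into a fileContent variable (the column positions and names are invented for illustration):

```cfm
<!--- split the chunk into an array of rows, then index into each row as an array --->
<cfset rows = listToArray(fileContent, chr(13))>

<cfloop from="1" to="#arrayLen(rows)#" index="r">
    <!--- keep empty list elements so column positions stay stable --->
    <cfset thisRow = listToArray(rows[r], ",", true)>

    <!--- positional access is much cheaper than repeated listGetAt() calls --->
    <cfset listingId = thisRow[1]>
    <cfset price     = thisRow[12]>
    <!--- ...insert into the database here --->
</cfloop>
```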

0 votes

Also look at not running a query on every iteration of the loop. Much of the time spent executing queries is just making the DB connection.

Any time I have to do multiple inserts like this from a file, I build each SQL INSERT statement and append it to a variable, separated by semicolons. I then execute the accumulated statements all at once, every 100 statements.

I had to rewrite some procedures of another programmer's by doing this and was able to cut the processing time by 90%. This was in version 6, so connection sharing may have improved this since then.
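A rough sketch of that batching pattern, assuming the driver/datasource accepts multiple semicolon-separated statements in one request (table, column, and datasource names are placeholders):

```cfm
<!--- accumulate INSERT statements and flush them in batches of 100 --->
<cfset sqlBuffer = []>

<cfloop file="#csvPath#" index="line">
    <!--- note: string-built SQL skips cfqueryparam, so the input must be trusted or escaped --->
    <cfset arrayAppend(sqlBuffer,
        "INSERT INTO import_table (col_a, col_b) VALUES ('#listGetAt(line, 1)#', '#listGetAt(line, 3)#')")>

    <cfif arrayLen(sqlBuffer) GTE 100>
        <cfset batchSql = arrayToList(sqlBuffer, ";")>
        <cfquery datasource="#dsn#">#preserveSingleQuotes(batchSql)#</cfquery>
        <cfset sqlBuffer = []>
    </cfif>
</cfloop>

<!--- flush any remaining statements --->
<cfif arrayLen(sqlBuffer)>
    <cfset batchSql = arrayToList(sqlBuffer, ";")>
    <cfquery datasource="#dsn#">#preserveSingleQuotes(batchSql)#</cfquery>
</cfif>
```

On Oracle, a batch like this would typically need to be wrapped in a BEGIN ... END; anonymous block, since the JDBC driver won't execute multiple bare statements in one call.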