We have a ColdFusion 9 script that runs regularly reading a CSV file and inserting the records into an Oracle 11g database. The CSV file has around 50 columns, 8 of which are used by CF (the CSV format cannot be modified). The general flow of the CFM is:
- Read file into variable
- CFLOOP using that variable as the list attribute with CHR(13) as a delimiter
- Call an Oracle stored procedure inserting various values from the file using ListGetAt
The stored procedure does the following:
- Inserts a record with 2 fields into table 1
- Inserts a record with 8 fields (including table 1's primary key) into table 2
- Returns nothing
This runs successfully in most cases, reading 400 KB files with hundreds of records in just a couple seconds. However, occasionally we get a large volume and end up with a 13k record 5MB file. When we try to process a file this large I watch the JVM memory usage go from 90MB up to around 680MB over a 10-15 second period, after which the CF Server Monitor stops responding (as does CF) forcing us to restart the service. Logs report a JVM out-of-memory error:
"Error","qtp4795249-38798","12/28/11","16:29:20",,"GC overhead limit exceeded" java.lang.OutOfMemoryError: GC overhead limit exceeded
Our JVM heap size is currently 768MB. I haven't tried increasing it as even if that does resolve this issue it doesn't protect us in the future and the rest of the server's normal load doesn't require nearly that much. And I hesitate to play too much with JVM settings that require a restart to go into effect on a production box.
This is difficult to test as the import process runs fine with barely any noticeable memory load on my local dev machine and on our QA box, but both of those have a much slower connection to the database and take 10-15 minutes to complete.
I would appreciate any thoughts, especially about where the memory is going. I cannot figure out how 5MB of data gets turned into 700 MB of data. We do have debugging information turned on, but the calling script's IP address is not in the debugging list and I used a cfsetting tag to turn debugging off for this page. Previously there was a step 1.5 that turned the CSV data into a ColdFusion query, but I eliminated that in an attempt to improve efficiency. Both ways result in an oom error.