To summarize, I have an application that takes a set of input files, produces a tree from the data, and then writes it out to a text file as XML.
Currently, the entire tree is stored in memory before it is written out, because during parsing we need to reference arbitrary nodes in the tree to fetch or update their values.
The problem we are facing occurs when the tree becomes too big to hold entirely in memory. The tree itself is very flat, with only 4-6 levels of depth. It looks something like this:
Root
    Group
        Record
            Data
            Data
        Record
            Data
            Data
            Data
        ...
    ...
    Group
        Record
        ...
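For concreteness, the in-memory structure might look like the following minimal sketch. The class and field names are my own illustration, not the actual code from the application:

```python
class Node:
    """One node in the flat tree; each node holds only one type of child."""
    def __init__(self, name, value=None):
        self.name = name          # "Root", "Group", "Record", or "Data"
        self.value = value        # payload for Data nodes
        self.children = []

    def add(self, child):
        self.children.append(child)
        return child

# Build a tiny instance of the shape shown above.
root = Node("Root")
group = root.add(Node("Group"))
record = group.add(Node("Record"))
record.add(Node("Data", value=1))
record.add(Node("Data", value=2))
```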
There will always be exactly one Root node, and each node has only one type of child. However, nodes are not added in any particular order: depending on how the data is formatted, you might add records to different groups, and data to different records, rather than completing one record for one group before moving on to another.
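Because records for any group can arrive at any time, the parser needs to locate an arbitrary record quickly. One common way to support that is a dictionary index keyed by identifiers, sketched below with hypothetical group/record IDs (the real keys depend on the input format):

```python
# group_id -> { record_id -> list of data values }
groups = {}

def add_data(group_id, record_id, value):
    """Find-or-create the record, regardless of arrival order."""
    record = groups.setdefault(group_id, {}).setdefault(record_id, [])
    record.append(value)

# Interleaved arrival order, as described in the question.
add_data("g1", "r1", "a")
add_data("g2", "r1", "b")
add_data("g1", "r2", "c")
add_data("g1", "r1", "d")
```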
My first suggestion was to just throw more memory at the machine. We run the tool on a 64-bit Windows machine, so if we are running out of memory, we could simply add more. But that suggestion wasn't taken.
The next idea I had was to write nodes out whenever the tree took up too much memory. But because data can be added to a particular record at any time, it is difficult to determine when we are actually done with a record, especially if we later need to refer back to a record that has already been written out.
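The writing-out half of that idea is straightforward with a streaming XML writer; the hard part remains deciding when a record is complete. A sketch of the flush step using Python's `xml.sax.saxutils.XMLGenerator` (the element names mirror the tree above; the "record is done" check is exactly what this sketch cannot answer):

```python
import io
from xml.sax.saxutils import XMLGenerator

def flush_record(stream, record_id, values):
    """Stream one completed record as XML so it can be dropped from memory."""
    gen = XMLGenerator(stream, encoding="utf-8")
    gen.startElement("Record", {"id": record_id})
    for value in values:
        gen.startElement("Data", {})
        gen.characters(value)
        gen.endElement("Data")
    gen.endElement("Record")

out = io.StringIO()
flush_record(out, "r1", ["a", "b"])
xml_text = out.getvalue()
```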
There are several other options, such as optimizing the design of the tree itself (each node takes up a fairly large amount of memory), but for this question I would like to know techniques for building and exporting large trees.
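One technique in this space is to move the tree's data out of RAM into an on-disk keyed store, so arbitrary fetches and updates still work during parsing, and then export in tree order at the end. A sketch using SQLite (the schema is illustrative only, not a recommendation for the actual record layout):

```python
import sqlite3

# An on-disk store replaces the in-memory tree; ":memory:" is used here
# only so the sketch is self-contained -- a real run would use a file path.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE data (grp TEXT, rec TEXT, value TEXT)")

def add(grp, rec, value):
    """Insert in whatever order the input arrives; order is restored on export."""
    con.execute("INSERT INTO data VALUES (?, ?, ?)", (grp, rec, value))

add("g1", "r1", "a")
add("g2", "r1", "b")
add("g1", "r1", "c")

# Export grouped in tree order regardless of insertion order.
rows = con.execute(
    "SELECT grp, rec, value FROM data ORDER BY grp, rec").fetchall()
```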