We use SAX when:
- We are damn sure that a single pass over the file will suffice, which, by the way, it does most of the time. Code that does multiple passes or moves a pointer back and forward can usually be refactored to work in one pass.
- We are receiving the XML through some streaming channel, for example over the network, and we want a real-time readout, possibly even before the whole file has finished downloading. SAX can work with a partially downloaded file; DOM cannot.
- We are interested in a particular locality within the XML, not in the complete document. For example, an Atom feed works best with SAX, but to analyze a WSDL you will need a DOM. (A minimal SAX sketch follows this list.)
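To make the single-pass idea concrete, here is a minimal sketch using Java's built-in SAX API. The feed URL and the choice of printing `<title>` elements are illustrative assumptions, not something from your question; the point is that the handler reacts to tags as they arrive off the stream and never holds the whole document in memory.

```java
import java.io.InputStream;
import java.net.URI;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AtomTitlePrinter {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // Hypothetical feed URL; SAX starts firing callbacks as soon as the
        // first bytes arrive, so a partially downloaded feed is already usable.
        InputStream in = URI.create("https://example.org/feed.atom").toURL().openStream();

        parser.parse(in, new DefaultHandler() {
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    System.out.print(new String(ch, start, length));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    inTitle = false;
                    System.out.println();
                }
            }
        });
    }
}
```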
We use DOM when:
- Well, when a single pass will not do and we need to go up and down in the file.
- The XML is on disk and we don't need real-time readouts; we can take our time, load it, read it, analyze it, then come to a conclusion. (A minimal DOM sketch follows this list.)
- Your boss asks to have it done before lunch and you don't care about quality.
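For contrast, a minimal DOM sketch (the file name and element names are assumptions for illustration, loosely modeled on a WSDL): the whole document is parsed into a tree once, and after that we can make as many passes over it, in any direction, as we like.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WsdlInspector {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);

        // The entire document ends up in memory as a tree after this call.
        Document doc = factory.newDocumentBuilder().parse(new File("service.wsdl"));

        // First pass over the in-memory tree: count the operations.
        NodeList ops = doc.getElementsByTagNameNS("*", "operation");
        System.out.println("operations: " + ops.getLength());

        // Second, unrelated pass over the same tree: list the message names.
        NodeList messages = doc.getElementsByTagNameNS("*", "message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element msg = (Element) messages.item(i);
            System.out.println("message: " + msg.getAttribute("name"));
        }
    }
}
```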
Now, to answer your question, you provided:
- you have a huge file: ........ SAX +1
- you need to parse it multiple times: ........ DOM +1
Both get equal votes, so add your existing knowledge base to the mix (are you familiar with SAX?). And how huge is huge? You said both the XML and your memory are huge; even a 100 MB file is not a big deal, and DOM can handle it. You need to parse it multiple times each day: if one operation finishes within a couple of minutes, then retaining the data in memory for the next few hours doesn't seem wise, and in that case you lose the benefit of DOM. But if a single operation itself takes, say, an hour, then you are damn right to retain the pre-processed information.
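If you do go the retain-it route, one possible shape (just a sketch; `CachedXmlData`, `big.xml` and the XPath expression are made-up names, not your actual setup) is to pay the parse cost once and answer every later operation from the in-memory tree:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Parse once, keep the tree, answer many operations from memory.
public class CachedXmlData {
    private final Document doc;   // stays resident between operations
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public CachedXmlData(File xmlFile) throws Exception {
        // The expensive load/parse step happens a single time.
        this.doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile);
    }

    // Each later "operation" is just a query against the cached DOM.
    public int count(String xpathExpression) throws Exception {
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

Something like `new CachedXmlData(new File("big.xml")).count("//record")`, held in one long-lived instance, would then serve every run that day without re-parsing; whether that is worth the resident memory is exactly the trade-off above.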
As I noted, you didn't provide enough stats. Gather them: data size, memory size, time to load into the DOM, processing time, exactly how many times a day you need to run this, and what your machine does in the meantime (sit idle, or analyze other such files?).
Take these stats, then either post them here or analyze them yourself, and you will reach a conclusion.