We use SAX when:
- We are damn sure that a single pass over the file will suffice, which, by the way, it does most of the time. Code that does multiple passes or moves a pointer back and forward can usually be refactored to work in one pass.
- We are receiving the XML through some streaming channel, for example over the network, and we want a real-time readout, possibly even before the whole file has finished downloading. SAX can work with a partially downloaded file; DOM cannot.
- We are interested in a particular locality within the XML, not in the complete document. For example, an Atom feed works best with SAX, but to analyze a WSDL you will need a DOM. (A minimal SAX sketch follows this list.)
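To make the single-pass idea concrete, here is a minimal sketch using Java's built-in SAX API. The feed URL and the choice of printing `<title>` elements are illustrative assumptions, not something from your question; the point is that the handler reacts to tags as they arrive off the stream and never holds the whole document in memory.

```java
import java.io.InputStream;
import java.net.URI;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class AtomTitlePrinter {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        // Hypothetical feed URL; SAX starts firing callbacks as soon as the
        // first bytes arrive, so a partially downloaded feed is already usable.
        InputStream in = URI.create("https://example.org/feed.atom").toURL().openStream();

        parser.parse(in, new DefaultHandler() {
            private boolean inTitle = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("title".equals(qName)) {
                    inTitle = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inTitle) {
                    System.out.print(new String(ch, start, length));
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("title".equals(qName)) {
                    inTitle = false;
                    System.out.println();
                }
            }
        });
    }
}
```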
We use DOM when:
- Well, when a single pass will not do and we need to go up and down in the file.
- The XML is on disk and we don't need real-time readouts; we can take our time, load it, read it, analyze it, then come to a conclusion. (A minimal DOM sketch follows this list.)
- Your boss asks to have it done before lunch and you don't care about quality.
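For contrast, a minimal DOM sketch (the file name and element names are assumptions for illustration, loosely modeled on a WSDL): the whole document is parsed into a tree once, and after that we can make as many passes over it, in any direction, as we like.

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class WsdlInspector {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);

        // The entire document ends up in memory as a tree after this call.
        Document doc = factory.newDocumentBuilder().parse(new File("service.wsdl"));

        // First pass over the in-memory tree: count the operations.
        NodeList ops = doc.getElementsByTagNameNS("*", "operation");
        System.out.println("operations: " + ops.getLength());

        // Second, unrelated pass over the same tree: list the message names.
        NodeList messages = doc.getElementsByTagNameNS("*", "message");
        for (int i = 0; i < messages.getLength(); i++) {
            Element msg = (Element) messages.item(i);
            System.out.println("message: " + msg.getAttribute("name"));
        }
    }
}
```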
Now, to answer your question, you provided:
- you have a huge file: ........ SAX +1
- you need to parse it multiple times: ........ DOM +1
Both get equal votes, so add your existing knowledge base to the mix (are you familiar with SAX?). And how huge is huge? You said both the XML and your memory are huge; even a 100 MB file is not a big deal, and DOM can handle it. You need to parse it multiple times each day: if one operation finishes within a couple of minutes, then retaining the data in memory for the next few hours doesn't seem wise, and in that case you lose the benefit of DOM. But if a single operation itself takes, say, an hour, then you are damn right to retain the pre-processed information.
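If you do go the retain-it route, one possible shape (just a sketch; `CachedXmlData`, `big.xml` and the XPath expression are made-up names, not your actual setup) is to pay the parse cost once and answer every later operation from the in-memory tree:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Parse once, keep the tree, answer many operations from memory.
public class CachedXmlData {
    private final Document doc;   // stays resident between operations
    private final XPath xpath = XPathFactory.newInstance().newXPath();

    public CachedXmlData(File xmlFile) throws Exception {
        // The expensive load/parse step happens a single time.
        this.doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(xmlFile);
    }

    // Each later "operation" is just a query against the cached DOM.
    public int count(String xpathExpression) throws Exception {
        NodeList nodes = (NodeList) xpath.evaluate(xpathExpression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }
}
```

Something like `new CachedXmlData(new File("big.xml")).count("//record")`, held in one long-lived instance, would then serve every run that day without re-parsing; whether that is worth the resident memory is exactly the trade-off above.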
As I noted, you didn't provide enough stats. Gather them: data size, memory size, time to load into the DOM, processing time, exactly how many times a day you need to run this, and what your machine does in the meantime (sit idle, or analyze other such files?).
Take these stats, then either post them here or analyze them yourself, and you will reach a conclusion.