Fast clear MSXML Document or Re-create?

Question

Is there a fast way to clear the previous content of an MSXML2.DOMDocument object prior to reuse? I've been in the habit of discarding them and creating a fresh instance each time, but this strikes me as wasteful, and profiling a few test cases seems to confirm it.

I'm sticking with MSXML 3.0 in this case for portability, and I realize this older version has some quirks when it comes to using XPath to select large sets of nodes. Trying to select the whole document tree and then removing it doesn't feel clean and doesn't run as fast as I'd like. The "lazy selection" MSXML 3.0 uses doesn't inspire confidence either:

selectNodes Method

Previously, in MSXML 3.0 and earlier versions, the selection object created by calling the selectNodes method would gradually calculate the node-set. If the DOM tree was modified, while the selectNodes call was still actively iterating its contents, the behavior could potentially change the nodes that were selected or returned. In MSXML 4.0 and later, the node-set result is fully calculated at the time of selection. This ensures that the iteration is simple and predictable. In rare instances, this change might impact legacy code written to accommodate previous behavior.
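The hazard described in that passage can be sketched outside MSXML. The following is only an illustration in Python, using `xml.dom.minidom` and a hand-written generator as a stand-in for a gradually computed selection; the helper names `lazy_items` and `remove_each` are hypothetical, and nothing here reproduces MSXML 3.0's actual internals.

```python
from xml.dom import minidom

XML = "<root><item/><item/><item/></root>"


def lazy_items(root):
    """Yield <item> children one at a time, reading the live child list on each step."""
    index = 0
    while index < len(root.childNodes):
        node = root.childNodes[index]
        index += 1
        if node.nodeType == node.ELEMENT_NODE and node.tagName == "item":
            yield node


def remove_each(root, nodes):
    """Remove every node the selection hands back; return how many children are left."""
    for node in nodes:
        root.removeChild(node)
    return len(root.childNodes)


# Lazy selection: the tree changes underneath the iterator, so a node is skipped.
doc = minidom.parseString(XML)
lazy_remaining = remove_each(doc.documentElement, lazy_items(doc.documentElement))

# Snapshot selection: the full result is computed before any node is removed.
doc = minidom.parseString(XML)
snapshot = list(lazy_items(doc.documentElement))
snapshot_remaining = remove_each(doc.documentElement, snapshot)

print(lazy_remaining, snapshot_remaining)
```

In this sketch the lazy pass leaves one item behind while the snapshot pass leaves none, which is the kind of surprise that makes "select everything, then remove it" unappealing on a lazily evaluated selection.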

I also realize that reusing such an object requires being mindful of the current settings of different properties (SelectionLanguage, etc.) that might linger between uses. I'd think that shouldn't be a big deal though, especially if the reuse always follows the same pattern.

I suppose what I'm after then is some clean and fast way to clear the loaded DOM to reuse it, or more input as to why reuse might be worse than the alternative of recreation.

I'm no MSXML whiz, but have you tried calling Document->putref_documentElement (with a newly constructed, empty root element) or calling Document->load (with a pointer to a different XML source)? — reuben
Loading won't help because I'm constructing the Document in code, but the other idea is worth trying. Thanks! Almost seems obvious, but maybe I haven't tried it. — Bob77
Replacing the root element seems to do the trick. Too bad this wasn't suggested as an answer, I'd accept it. — Bob77
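The root-replacement idea from the comments can be sketched as follows. This is a portable illustration in Python with `xml.dom.minidom`, not MSXML code: in MSXML the equivalent step would be assigning a new element to the document's `documentElement` (the `putref_documentElement` call reuben mentions). The helper name `reset_document` and the sample element names are hypothetical.

```python
from xml.dom import minidom


def reset_document(doc, root_name):
    """Swap the current root element for a new, empty one and return the new root."""
    old_root = doc.documentElement
    if old_root is not None:
        doc.removeChild(old_root)
        # minidom-specific housekeeping: break reference cycles in the discarded subtree.
        old_root.unlink()
    new_root = doc.createElement(root_name)
    doc.appendChild(new_root)
    return new_root


# Build a document once, then clear and refill it instead of creating a new instance.
doc = minidom.parseString("<orders><order id='1'/><order id='2'/></orders>")
root = reset_document(doc, "orders")
order = doc.createElement("order")
order.setAttribute("id", "3")
root.appendChild(order)
```

The document object itself survives the reset, so any settings held on it stay in place; only the element tree is discarded.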

Samuel Zhang · Accepted Answer · 2011-08-05T15:55:53

You may consider migrating to MSXML6:

First of all, MSXML6 ships in the box with Windows XP SP3, Vista, Windows Server 2008, Windows 7 and Windows Server 2008 R2. The only OS still supported by Microsoft that doesn't include MSXML6 is Windows Server 2003, where customers will have to download and install the MSI. Overall, MSXML6 is almost as portable as MSXML3.
Unlike MSXML3, which supports both XSL Pattern and XPath, MSXML6 supports XPath only, and selectNodes and selectSingleNode always work on a snapshot of the matching nodes.
Unlike the live list returned by getElementsByTagName, snapshot semantics are defined by the W3C. MSXML6 also offers better performance and W3C compliance.

Also, you shouldn't worry too much about cleaning up the document after each use. MSXML does its own garbage collection internally, so you won't get the memory back when you replace the document element. My advice is to skip any dedicated cleanup step and just reuse the instance for the next load, or rebuild the tree with the DOM API. If memory usage is really a big concern, XmlLite can give you full control.
