How to detect the namespaces in a foreign XML document using MSXML2?

Question

When working with XML using MSXML2, it's pretty well documented that for XPath queries against namespaced nodes to work, you need to set the SelectionNamespaces property. However, this is only a straightforward fix if you always know what the namespaces are. I'm writing a module in VBA that I hope to be able to use to parse all the Office XML formats, and I'd like a function that can define the namespaces for arbitrary documents as I load them.

At present, I've found the following isn't a bad first stab:

Public Sub DoDefineNamespaces(strRootNodeName As String, strFilePath As String, ByRef oMyDoc As MSXML2.DOMDocument60)
    Dim oRootNode As MSXML2.IXMLDOMNode
    Dim oAttribute As MSXML2.IXMLDOMNode
    Dim strNamespaces As String

    Set oMyDoc = New MSXML2.DOMDocument60
    oMyDoc.async = False   ' load synchronously so the root node is available right away
    oMyDoc.Load strFilePath
    ' SelectSingleNode returns one node; SelectNodes would return a node list
    Set oRootNode = oMyDoc.SelectSingleNode("./*[name()='" & strRootNodeName & "']")
    For Each oAttribute In oRootNode.Attributes
        ' Namespace declarations are the attributes in the reserved xmlns namespace
        If oAttribute.namespaceURI = "http://www.w3.org/2000/xmlns/" Then
            ' Declarations in SelectionNamespaces must be separated by whitespace
            strNamespaces = strNamespaces & " " & oAttribute.XML
        End If
    Next oAttribute
    oMyDoc.SetProperty "SelectionNamespaces", Trim$(strNamespaces)
End Sub

This works with a couple of subtle changes for dealing with the default namespace. However, it won't work very well with XML like the following:

<?xml version="1.0"?>
<root xmlns:t="MyFirstNS">
   <t:object1>
       <r:object2 xmlns:r="MySecondNS" />
   </t:object1>
</root>

Aside from traversing the document, is there a better approach than mine for dealing with this kind of issue, i.e. any namespace not defined in the root node? Ideal would be an XPath 1.0 expression that selects all xmlns attribute nodes even when the namespace they exist in has yet to be added to SelectionNamespaces, or help building an XSLT transform that produces a node-set with the namespaces of a document.

What does that mean: "that will select all xmlns attribute nodes even when the namespace they exist in has yet"? — Mathias Müller
That's what I get for starting to write the question and then getting interrupted! Corrected to make sense: the full sentence was supposed to finish "... yet to be added to SelectionNamespaces". — tobriand
What do you plan to do with the namespaces once you have them? If these are truly unknown namespaces, then I'm not sure how you will use them in queries. — John Saunders
It's not so much that they're unknown as that they should be unknown as far as the module is concerned. The idea is that the same code could be used in principle with Excel, PowerPoint, Word, Visio, etc. files, each of which might use subtly different namespaces, and only one of which is relevant now. — tobriand

Jens Erat · Accepted Answer · 2014-03-21T15:46:10

To determine all namespaces used in the document, use this XPath 1.0 query:

//namespace::*

This will include duplicates, because every element carries a namespace node for each namespace that is in scope on it.
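As a rough illustration of how the XPath 1.0 query could feed SelectionNamespaces from VBA, here is a minimal sketch. It uses //namespace::* so that declarations on descendant elements are picked up as well. Several things in it are assumptions rather than documented behaviour: that MSXML exposes each namespace node with a nodeName of the form xmlns:prefix (or plain xmlns for a default namespace) and the URI as its nodeValue, that the implicit xml prefix should be skipped, and that the first binding wins when a prefix is declared more than once. The Sub name and the "def" prefix given to a default namespace are hypothetical.

```vba
' Requires a reference to "Microsoft XML, v6.0".
' Scripting.Dictionary is created late-bound, so no extra reference is needed.
Public Sub DefineNamespacesFromAxis(ByVal oDoc As MSXML2.DOMDocument60)
    Dim oSeen As Object
    Dim oNs As MSXML2.IXMLDOMNode
    Dim strName As String
    Dim strNamespaces As String

    Set oSeen = CreateObject("Scripting.Dictionary")

    ' The namespace axis yields one node per element per in-scope namespace,
    ' so the same declaration shows up many times.
    For Each oNs In oDoc.SelectNodes("//namespace::*")
        strName = oNs.nodeName                ' assumed to look like "xmlns:t" or "xmlns"

        ' XPath 1.0 cannot address a default namespace without a prefix,
        ' so give it a made-up one ("def" is a hypothetical choice).
        If strName = "xmlns" Then strName = "xmlns:def"

        ' Skip the implicit xml prefix and anything already recorded.
        If strName <> "xmlns:xml" And Not oSeen.Exists(strName) Then
            oSeen.Add strName, oNs.nodeValue
            strNamespaces = strNamespaces & " " & strName & "='" & oNs.nodeValue & "'"
        End If
    Next oNs

    oDoc.SetProperty "SelectionNamespaces", Trim$(strNamespaces)
End Sub
```

Calling DefineNamespacesFromAxis on a document that has just been loaded would then allow prefixed queries such as oDoc.SelectNodes("//r:object2"). A prefix that is bound to different URIs in different parts of the document would need extra handling that this sketch does not attempt.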

For XPath 2.0, you'd go for

distinct-values(//*/fn:namespace-uri())

instead, as the namespace axis is deprecated there. This query returns only the namespaces that elements actually use (it omits namespaces that are declared but never used) and already removes the duplicates.

Anyway: if you don't care about the namespaces, it might be more reasonable to just ignore them. In XPath 1.0 you have to use a wildcard name test and compare the local name in a predicate. To match all elements <foo/> in arbitrary namespaces, use //*[local-name() = 'foo'] in XPath 1.0, or //*:foo in XPath 2.0.
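In VBA, the namespace-agnostic XPath 1.0 form could be wrapped roughly as follows. This is only a sketch: the function name SelectByLocalName and the demo Sub are hypothetical, and the sample XML is the one from the question. No SelectionNamespaces value is needed because the expression contains no prefixes.

```vba
' Requires a reference to "Microsoft XML, v6.0".
' Returns every element whose local name matches, whatever its namespace.
Public Function SelectByLocalName(ByVal oDoc As MSXML2.DOMDocument60, _
                                  ByVal strLocalName As String) As MSXML2.IXMLDOMNodeList
    Set SelectByLocalName = oDoc.SelectNodes("//*[local-name()='" & strLocalName & "']")
End Function

Public Sub DemoSelectByLocalName()
    Dim oDoc As MSXML2.DOMDocument60

    Set oDoc = New MSXML2.DOMDocument60
    oDoc.async = False
    oDoc.LoadXML "<root xmlns:t='MyFirstNS'><t:object1>" & _
                 "<r:object2 xmlns:r='MySecondNS'/></t:object1></root>"

    ' Prints the number of elements named object2, regardless of prefix.
    Debug.Print SelectByLocalName(oDoc, "object2").Length
End Sub
```

The trade-off is that elements with the same local name in different namespaces can no longer be told apart, which may matter when several Office formats reuse the same element names.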
