
Given the following XML (fragment):

<node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90">
<node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100">
<node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80">

I want to retrieve the id of nodes that have an ext:score of 100.

The current code:

match = dom.xpath('//node[@ext:score="100"]/@id')[0]

Returns an exception:

lxml.etree.XPathEvalError: Undefined namespace prefix

I have read (both here and in the XPath docs) that ext first needs to be defined as a valid namespace prefix, because an attribute name containing a colon cannot be resolved otherwise. However, I have been unable to find a good example of how to do this. There is no definition of ext in the excerpts I am processing, and I'm not sure how to create a namespace prefix.

Any thoughts?

I've read that @kjhughes, and I understand how to create a namespace, but I don't see how I can then use that namespace prefix to test for a condition. Still looking... Thanks! - Jasper33
Does your XML have a namespace declaration for ext -- something like xmlns:ext="http://example.com/extention" on an element above the node elements? - kjhughes
@kjhughes - I don't (these come to me as-is), but I've been told that the original contains this: <metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#" xmlns:ext="http://musicbrainz.org/ns/ext#-2.0" created="2017-11-16T12:09:30.334Z"> which is what I used to try to synthesize the prefix. - Jasper33

1 Answer


The colon character in an XML attribute (or element) name such as ext:score separates the namespace prefix, ext, from the local name, score. Namespace prefixes themselves are significant only by virtue of their association with a namespace value.
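To illustrate that only the namespace value matters, here is a minimal sketch, assuming lxml is installed. The one-node document, the id value "a", and the prefix s are made up for the illustration; the namespace URI is the one quoted in the comments above.

```python
from lxml import etree

EXT_NS = 'http://musicbrainz.org/ns/ext#-2.0'

# Hypothetical one-node document that declares the ext prefix.
xml = '<metadata xmlns:ext="%s"><node id="a" ext:score="100"/></metadata>' % EXT_NS
root = etree.fromstring(xml)
node = root[0]

# lxml stores the attribute under its expanded name, {namespace}localname;
# the prefix itself is not part of the name.
score = node.get('{%s}score' % EXT_NS)

# The XPath may use any prefix, as long as it is bound to the same namespace value.
ids = root.xpath('//node[@s:score="100"]/@id', namespaces={'s': EXT_NS})
print(score, ids)
```

The prefix s in the XPath differs from the prefix ext in the document, yet the attribute still matches because both are bound to the same namespace value.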

For this XML,

<metadata xmlns:ext="http://musicbrainz.org/ns/ext#-2.0">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
</metadata>

This XPath,

//node[@ext:score="100"]/@id

will select the id attributes of all node elements with an ext:score attribute value of 100, provided you have a way to bind a namespace prefix (ext) to a namespace value (http://musicbrainz.org/ns/ext#-2.0) in the language or tool from which XPath is being called.

To bind a namespace prefix to a namespace value in Python (see How does XPath deal with XML namespaces? for Python and other language examples):

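from io import StringIO
from lxml import etree

f = StringIO('your XML here')  # placeholder: substitute the actual XML text
doc = etree.parse(f)
# Bind the ext prefix used in the XPath to its namespace URI
r = doc.xpath('//node[@ext:score="100"]/@id',
              namespaces={'ext': 'http://musicbrainz.org/ns/ext#-2.0'})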
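One caveat, sketched here under an assumption: if the full document also declares the default namespace quoted in the comments (xmlns="http://musicbrainz.org/ns/mmd-2.0#"), the node elements themselves are in that namespace, and an unprefixed node in XPath will not match them. The prefix mb below is an arbitrary choice for the illustration, and the two-node document is reconstructed from the question, not taken from a real response.

```python
from lxml import etree

# Assumed document shape: default namespace on the root, as quoted in the comments.
xml = '''<metadata xmlns="http://musicbrainz.org/ns/mmd-2.0#"
          xmlns:ext="http://musicbrainz.org/ns/ext#-2.0">
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
  <node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
</metadata>'''
root = etree.fromstring(xml)

ns = {
    'mb': 'http://musicbrainz.org/ns/mmd-2.0#',   # arbitrary prefix for the default namespace
    'ext': 'http://musicbrainz.org/ns/ext#-2.0',
}

# Unprefixed node means "node in no namespace" in XPath 1.0, so this finds nothing.
unprefixed = root.xpath('//node[@ext:score="100"]/@id', namespaces=ns)

# Prefixing the element name with a bound prefix matches the namespaced elements.
prefixed = root.xpath('//mb:node[@ext:score="100"]/@id', namespaces=ns)
print(unprefixed, prefixed)
```

XPath 1.0 has no notion of a default namespace for element names, which is why the element needs a prefix in the expression even though it has none in the document.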

Note that if your XML uses ext without declaring it, it is not namespace-well-formed.
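If the excerpts really do arrive without the declaration, one possible workaround is to wrap them in a synthetic root element that declares ext before parsing. This is only a sketch: the wrapper element name is hypothetical, the node elements are assumed to be self-closing, and the namespace URI is the one reported in the comments.

```python
from lxml import etree

EXT_NS = 'http://musicbrainz.org/ns/ext#-2.0'

# Fragment as received: it uses the ext prefix but never declares it.
fragment = '''
<node id="b071f9fa-14b0-4217-8e97-eb41da73f598" type="Group" ext:score="90"/>
<node id="b071f9fa-14b0-4217-8e97-eb41da73f599" type="Person" ext:score="100"/>
<node id="b071f9fa-14b0-4217-8e97-eb41da73f600" type="Business" ext:score="80"/>
'''

# Hypothetical wrapper element that supplies the missing declaration
# and gives the fragment a single root.
wrapped = '<wrapper xmlns:ext="%s">%s</wrapper>' % (EXT_NS, fragment)
root = etree.fromstring(wrapped)

ids = root.xpath('//node[@ext:score="100"]/@id', namespaces={'ext': EXT_NS})
print(ids)
```

The prefix still has to be bound a second time in the xpath() call, because the declarations inside the document are not visible to the XPath expression.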