2
votes

I need to extract data from an XML document using XPath. I'm quite new to it, though I'm liking it so far.

I'm retrieving some nodes based on their attributes like this:

/cesAlign/linkGrp[@targType='s']

Now I'd like to get the value of another attribute in the node:

/cesAlign/linkGrp[@targType='s']/@fromDoc

However, this returns only the first hit. I'd like to get that attribute from every node that has targType='s'.

I was thinking of looping over the nodelist and then reading the attribute... something like this:

expr = xpath.compile("/cesAlign/linkGrp[@targType='s']/@fromDoc");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

// The expression already selects the fromDoc attribute nodes,
// so each item's value can be read directly.
for (int i = 0; i < nl.getLength(); i++) {
    System.out.println(nl.item(i).getNodeValue());
}

But I'm not sure if there's a better and more elegant way to do this.

Here's a sample XML:

<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>

Thanks!


2 Answers

1
votes

I think you will have to iterate over the matches and fetch the attribute value from each element. Use "//cesAlign/linkGrp[@targType='s' and @fromDoc]" to select the elements. Here is an elegant Python solution using lxml:

#sample XML
xml = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029703.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029704.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "1"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" notFromDoc = "2"/>
</cesAlign>
"""
from lxml import etree

# Select linkGrp elements with targType='s' that also carry a fromDoc attribute
root = etree.fromstring(xml)
expr = root.xpath("//cesAlign/linkGrp[@targType='s' and @fromDoc]")
print("Matches:", len(expr))
for e in expr:
    print(e.attrib["fromDoc"])

The output will be:

Matches: 4
en/C2004310.01029701.xml.gz
en/C2004310.01029702.xml.gz
en/C2004310.01029703.xml.gz
en/C2004310.01029704.xml.gz
0
votes

Alternatively, you can get each wanted attribute with a separate XPath expression:

/cesAlign/linkGrp[@targType='s'][$x]/@fromDoc 

where $x must be substituted with an integer in the interval:

[1, count(/cesAlign/linkGrp[@targType='s'])]
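
The indexed approach above could be sketched roughly as follows in Python with lxml (a third-party library, also used in the other answer). The function name from_docs_by_index and the two-element sample document are assumptions made for illustration; they are not part of the question. Each pass through the loop substitutes an integer for $x and evaluates a separate expression.

```python
from lxml import etree

# Hypothetical sample mirroring the question's XML (self-closed so it parses)
SAMPLE = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
</cesAlign>
"""


def from_docs_by_index(root):
    """Fetch each fromDoc value with its own XPath evaluation, one per index."""
    # count() yields an XPath number, which lxml returns as a float
    total = int(root.xpath("count(/cesAlign/linkGrp[@targType='s'])"))
    values = []
    for x in range(1, total + 1):  # XPath positions are 1-based
        # Substitute the integer for $x in the expression from the answer
        path = f"/cesAlign/linkGrp[@targType='s'][{x}]/@fromDoc"
        # The result is empty if that element has no fromDoc attribute
        values.extend(str(hit) for hit in root.xpath(path))
    return values


root = etree.fromstring(SAMPLE)
print(from_docs_by_index(root))
```

This costs one evaluation per element plus one for the count, so it is mainly useful when the engine can only hand back a single value per expression.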

In case you have an XPath 2.0 engine available, the values of all wanted attributes can be obtained with a single XPath 2.0 expression:

/cesAlign/linkGrp[@targType='s']/@fromDoc/string(.)

When this XPath 2.0 expression is evaluated, the result is a sequence containing the string value of every wanted fromDoc attribute.
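
For comparison, here is a minimal sketch of getting the same list of strings where only XPath 1.0 is available, as with Python's lxml (third-party, used in the other answer). A function call such as string(.) is not allowed as a path step in XPath 1.0, so this sketch evaluates the plain attribute path as a node set and converts each result to a string on the Python side. The sample document and the variable names are assumptions for illustration.

```python
from lxml import etree

# Hypothetical sample: two matching elements and one with a different targType
SAMPLE = """
<cesAlign version="1.0">
 <linkGrp targType="s" toDoc="mt/C2004310.01029701.xml.gz" fromDoc="en/C2004310.01029701.xml.gz"/>
 <linkGrp targType="s" toDoc="mt/C2004310.01029702.xml.gz" fromDoc="en/C2004310.01029702.xml.gz"/>
 <linkGrp targType="p" toDoc="mt/other.xml.gz" fromDoc="en/other.xml.gz"/>
</cesAlign>
"""

root = etree.fromstring(SAMPLE)

# A node-set result comes back as a Python list, one entry per matching attribute
values = [str(v) for v in root.xpath("/cesAlign/linkGrp[@targType='s']/@fromDoc")]
print(values)
```

The element with targType="p" is filtered out by the predicate, so only the two wanted values remain.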