0
votes

I am new to XPath and would like to know how to extract values in an XML document.

I have an XML document:

<root>
<element1 attrib1="value1" attrib2="value2" />
<element2 attrib1="value1" attrib2="value2" />
<element3 attrib1="value1" attrib2="value2" />
</root>

What I want to do is extract all attrib=value pairs along with the element name. E.g.: element1 attrib1 value1 element2 attrib2 value2 ... element3 attrib2 value2

I have tried the '//@*' XPath query, which returns the attrib=value pairs but not the element name.

Any ideas?

Thanks!

2
You can use a tool such as the XPath Visualizer (huttar.net/dimitre/XPV/TopXML-XPV.html) to learn XPath quickly by experimenting with different XPath expressions on any XML document you like. The selected nodes are highlighted inline in the XML document. Evaluation results that aren't nodes are also presented. - Dimitre Novatchev
Krish: What you used lacks two thirds of the functionality of the XPath Visualizer and is rather ugly and inconvenient. It doesn't accept a large class of XPath expressions at all. Try count(//*). :) - Dimitre Novatchev

2 Answers

1
votes

You can use '*/*' to find all elements at the second level, i.e. the children of the root element, and then read each element's name and attributes.

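use strict;
use warnings;
use XML::XPath;

# parse the XML that follows the __DATA__ marker at the end of this script
my $xp = XML::XPath->new( ioref => \*DATA );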

# select the element nodes without having to specify their names
my @element_nodes = $xp->findnodes('*/*'); 

foreach my $element (@element_nodes) {
    # see https://metacpan.org/module/XML::XPath::Node::Element
    print $element->getName;
    foreach my $attribute ($element->getAttributes) {
        # see https://metacpan.org/module/XML::XPath::Node::Attribute
        print ' '.$attribute->getName.' '.$attribute->getData;
    }
    print "\n";
}

__DATA__
<root>
<element1 attrib1="value1" attrib2="value2" />
<element2 attrib1="value1" attrib2="value2" />
<element3 attrib1="value1" attrib2="value2" />
</root>
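
The '//@*' query from the question can also be kept: each attribute node it returns still knows which element owns it. The following is a minimal sketch of that alternative, assuming the same sample document as above and that XML::XPath's getParentNode is available on attribute nodes. The variable names ($xml, @rows, $owner) are only illustrative.

```perl
use strict;
use warnings;
use XML::XPath;

# Sample document copied from the answer above.
my $xml = <<'XML';
<root>
<element1 attrib1="value1" attrib2="value2" />
<element2 attrib1="value1" attrib2="value2" />
<element3 attrib1="value1" attrib2="value2" />
</root>
XML

my $xp = XML::XPath->new( xml => $xml );

# Select every attribute in the document, then walk up to the owning element.
my @rows;
foreach my $attribute ( $xp->findnodes('//@*') ) {
    my $owner = $attribute->getParentNode;
    push @rows, join ' ', $owner->getName, $attribute->getName, $attribute->getData;
}

# One "element attribute value" triple per line.
print "$_\n" for @rows;
```

This avoids the nested loop, at the cost of one parent lookup per attribute. Elements without attributes are skipped entirely, which may or may not be what you want.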
1
votes

To extract the values from the XML file, you need to do the following:

use strict;
use warnings;
use XML::XPath;

# specify the file name
my $xp = XML::XPath->new(filename => "file.xml");

# arrays that collect the extracted values
my (@attrib1, @attrib2);

# now you can traverse the nodes and get the attributes
my $node = $xp->find('/root/element1')->get_node(1);

# store the extracted values in the arrays
push @attrib1, $node->getAttribute('attrib1');
push @attrib2, $node->getAttribute('attrib2');

$node = $xp->find('/root/element2')->get_node(1);
push @attrib1, $node->getAttribute('attrib1');
push @attrib2, $node->getAttribute('attrib2');

__END__

Refer to the XML::XPath documentation for more details:

http://search.cpan.org/~msergeant/XML-XPath-1.13/XPath.pm
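
If the element and attribute names are not known in advance, the same idea can be written as a loop instead of one lookup per element. This is only a sketch under the assumption that the document looks like the sample in the question. The inline $xml string and the %values_by_attribute hash are illustrative names that do not appear in the original answer.

```perl
use strict;
use warnings;
use XML::XPath;

# Inline copy of the sample document, so the sketch runs without file.xml.
my $xml = <<'XML';
<root>
<element1 attrib1="value1" attrib2="value2" />
<element2 attrib1="value1" attrib2="value2" />
<element3 attrib1="value1" attrib2="value2" />
</root>
XML

my $xp = XML::XPath->new( xml => $xml );

# Collect the values of each attribute name across all children of <root>.
my %values_by_attribute;
foreach my $element ( $xp->findnodes('/root/*') ) {
    foreach my $attribute ( $element->getAttributes ) {
        push @{ $values_by_attribute{ $attribute->getName } }, $attribute->getData;
    }
}

# Print each attribute name followed by the values found for it.
foreach my $name ( sort keys %values_by_attribute ) {
    print "$name: @{ $values_by_attribute{$name} }\n";
}
```

A hash of arrays replaces the separate @attrib1 and @attrib2 arrays, so new attribute names need no code changes.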