The XPath //a[contains(@class, 'storylink')]/@* extracts all attributes of the matching anchor tags. The anchor tags in my XML don't have a title attribute, which would usually hold the text of the link. Is there a way to select both the href and the text content of an anchor with XPath 1.0?
2
votes
Add a full example of the source content (a full XML fragment) and the exact expected output.
– David Ennis
@DavidEnnis For example, http://news.ycombinator.com: I'm trying to extract each link's title along with its href.
– Shanthakumar
Sorry, I meant in your sample; nobody should have to fish around in the source of a web page. Please expand your question with example input and expected output.
– David Ennis
2 Answers
2
votes
If you want to select both @href and text() in a single XPath selection, you can use the union operator |.
With XPath 1.0, this is probably the best you can do:
//a[contains(@class, 'storylink')]/@href | //a[contains(@class, 'storylink')]/text()
With XPath 2.0 (or higher) you could avoid repeating the anchor selection criteria:
//a[contains(@class, 'storylink')]/(@href,text())
0
votes
Just select the a element itself using //a[contains(@class, 'storylink')] and then get the required attributes and/or text content using JavaScript methods on the returned element node.
You could indeed select both the attributes and the text using XPath, but if they're all jumbled up in a single query result then it's a hassle separating them again.
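A minimal sketch of the "select the element, then read what you need" approach. It is written in Python with the standard library's xml.etree.ElementTree rather than JavaScript, purely so that it runs outside a browser; that swap is my assumption, not part of the answer above. The sample markup and the extract_links helper are hypothetical. ElementTree implements only a subset of XPath and has no contains(), so the class test is done in Python instead of in the path expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, loosely modeled on the page in the question.
SAMPLE = """
<html><body>
  <a class="storylink" href="https://example.com/one">First story</a>
  <a class="morelink" href="/more">More</a>
  <a class="storylink visited" href="https://example.com/two">Second story</a>
</body></html>
"""


def extract_links(markup, css_class="storylink"):
    """Return (href, text) pairs for anchors whose class contains css_class."""
    root = ET.fromstring(markup.strip())
    pairs = []
    # ElementTree's XPath subset lacks contains(), so select every <a>
    # that has a class attribute and apply the substring test in Python.
    for a in root.iterfind(".//a[@class]"):
        if css_class in a.get("class", ""):
            # itertext() also picks up text inside nested elements.
            text = "".join(a.itertext()).strip()
            pairs.append((a.get("href"), text))
    return pairs


links = extract_links(SAMPLE)
for href, text in links:
    print(href, text)
```

Because each href stays paired with the text of the same element, nothing has to be untangled afterwards, which is the hassle the single-query result would cause.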