1
votes

I am trying to get the string that precedes '--' within a paragraph of an HTML page, using XPath, and send it to YQL.

For example, I want to get the date from the following article:

<div>
<p>Date --- the body of the article</p>
</div>

I tried this query in yql:

select * from html where url="article url" and xpath="//div/p/text()/[substring-before(.,'--')]"

but it does not work.

How can I get the date of the article, i.e. the text before the '--'?

2
Good question, +1. See my answer for a complete, short and easy solution. - Dimitre Novatchev

2 Answers

0
votes

You can simply use:

  substring-before(//div/p,'--')
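
One detail worth knowing about this form: when an element is passed to an XPath 1.0 string function, its string-value is used, i.e. all descendant text joined in document order, whereas /div/p/text() looks only at the first text node child. The sketch below emulates both with Python's standard library (ElementTree has no substring-before, so the helper is a stand-in, and the markup with a <b> child is a hypothetical example, not the asker's page):

```python
import xml.etree.ElementTree as ET


def substring_before(text, sep):
    """Stand-in for XPath substring-before(): '' if sep is empty or absent."""
    if not sep:
        return ""
    head, found, _ = text.partition(sep)
    return head if found else ""


# Hypothetical markup: the date sits inside a <b> child of the paragraph.
doc = ET.fromstring(
    "<div><p><b>2011-05-01</b> -- the body of the article</p></div>"
)
p = doc.find("p")

# String-value of <p>: every descendant text node, joined in document order.
string_value = "".join(p.itertext())
date_from_element = substring_before(string_value, "--").strip()

# Text node children of <p> only: its leading text plus each child's tail.
text_nodes = [t for t in [p.text] + [child.tail for child in p] if t]
first_text = text_nodes[0] if text_nodes else ""
date_from_first_text = substring_before(first_text, "--").strip()

print(repr(date_from_element))
print(repr(date_from_first_text))
```

Here the element-based form finds the date, while the first text node starts at the separator and yields an empty string. For a paragraph containing plain text only, both forms give the same result.
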
0
votes

Use:

substring-before(/div/p/text(), '--')

This XPath expression evaluates to the substring that precedes the first '--' in the first text node child of the p element that is a child of the top-level div element.
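
As a rough illustration of what the expression returns for the question's sample markup, here is a minimal sketch using only Python's standard library. ElementTree supports just a subset of XPath without string functions, so substring_before below is a hypothetical helper that mimics the XPath function; it is not a library call.

```python
import xml.etree.ElementTree as ET


def substring_before(text, sep):
    """Stand-in for XPath substring-before(): '' if sep is empty or absent."""
    if not sep:
        return ""
    head, found, _ = text.partition(sep)
    return head if found else ""


# The sample markup from the question; <div> is the top element.
doc = ET.fromstring("<div><p>Date --- the body of the article</p></div>")

# Equivalent of /div/p/text(): the text of the first <p> child of the root.
first_p = doc.find("p")
raw = substring_before(first_p.text or "", "--")

# The match keeps the space that preceded the separator, so trim it.
date = raw.strip()

print(repr(raw))
print(repr(date))
```

The untrimmed result still carries the trailing space from "Date ---". In pure XPath the same trimming can be done by wrapping the expression in normalize-space().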

In case you want to get this value for every such text node, you have to use an expression like:

substring-before((//div/p/text())[$k], '--')

and evaluate this expression $N times, for $k = 1,2, ..., $N

where $N is count(//div/p/text())
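
The same per-node evaluation can be sketched in a host language by looping over the selected paragraphs. This is only an approximation under stated assumptions: the document with an articles root is hypothetical, substring_before is a stand-in for the XPath function, and p.text covers only paragraphs whose text is not split by child elements.

```python
import xml.etree.ElementTree as ET


def substring_before(text, sep):
    """Stand-in for XPath substring-before(): '' if sep is empty or absent."""
    if not sep:
        return ""
    head, found, _ = text.partition(sep)
    return head if found else ""


# Hypothetical document holding several articles.
doc = ET.fromstring(
    "<articles>"
    "<div><p>2011-05-01 -- first article</p></div>"
    "<div><p>2011-05-02 -- second article</p></div>"
    "<div><p>no separator here</p></div>"
    "</articles>"
)

# One evaluation per paragraph, playing the role of $k = 1 .. $N.
# The explicit path ./div/p is used because the structure is known.
dates = [
    substring_before(p.text or "", "--").strip()
    for p in doc.findall("./div/p")
]

print(dates)
```

A paragraph without the separator produces an empty string, which matches how substring-before behaves in XPath.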

Note: avoid the // XPath pseudo-operator whenever the structure of the XML document is statically known. Using // usually results in significant inefficiency (O(N^2)), which is especially painful on large XML documents.