
At this address I am trying to scrape a tag (the large price, shown in bold red).

I am using LIBXML 2.2.

When I try to extract the tag with this XPath

//*[@class='priceLarge']

it works!

But to make writing queries easier, I would like to use Firebug in Firefox.

Firebug gives me this XPath:

/html/body/div[2]/form/table[3]/tbody/tr/td/div/table/tbody/tr[2]/td[2]/span/b

With this XPath it does not work; it seems the generated path does not match the document. How can I modify this XPath to scrape the item?


1 Answer


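Firefox and other browsers insert tbody elements into the DOM when they parse a table, even if the HTML source does not contain them. Firebug builds its XPath from that DOM, not from the raw source.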

In fact, the tbody is probably not in the HTML your application receives, so you can remove it from your XPath: /html/body/div[2]/form/table[3]/tr/td/div/table/tr[2]/td[2]/span/b. You can check this by saving the HTML from your application and viewing it in a text editor.
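To illustrate the tbody point, here is a minimal Python sketch using the standard library's xml.etree.ElementTree rather than LIBXML. The markup in RAW_SOURCE is a hypothetical, heavily simplified stand-in for the real page, and strip_tbody and find_from_root are illustrative helper names, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the raw page source.
# Note that the table has no tbody element.
RAW_SOURCE = """
<html>
  <body>
    <table>
      <tr><td>Title</td></tr>
      <tr><td>Price</td><td><span><b class="priceLarge">$9.99</b></span></td></tr>
    </table>
  </body>
</html>
"""


def strip_tbody(xpath):
    """Remove every tbody step (with an optional [n] predicate) from a path."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


def find_from_root(root, absolute_xpath):
    """Evaluate an absolute /html/... path using ElementTree's relative syntax."""
    relative = "." + absolute_xpath[len("/" + root.tag):]
    return root.findall(relative)


root = ET.fromstring(RAW_SOURCE)
firebug_path = "/html/body/table/tbody/tr[2]/td[2]/span/b"
fixed_path = strip_tbody(firebug_path)

# The browser-style path finds nothing because the source has no tbody.
print(find_from_root(root, firebug_path))
# The path without tbody reaches the price element.
print([b.text for b in find_from_root(root, fixed_path)])
```

Stripping tbody blindly is only safe when the source really lacks it; if the page author wrote tbody explicitly, the step has to stay.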

However, since the intent seems to be to pull information from a web page, your application will probably be more resistant to changes in the page if you use an XPath that depends less on the tree structure (e.g. //b[@class='priceLarge']).
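As a rough illustration of why the class-based query is more robust, the Python sketch below runs the same expression against two different layouts. Both layouts are invented for the example, and large_price is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Two hypothetical layouts that wrap the same price element differently.
LAYOUT_A = (
    '<html><body><table><tr><td><span>'
    '<b class="priceLarge">$9.99</b>'
    '</span></td></tr></table></body></html>'
)
LAYOUT_B = (
    '<html><body><div><form><p>'
    '<b class="priceLarge">$9.99</b>'
    '</p></form></div></body></html>'
)


def large_price(source):
    """Return the text of the first b element whose class is priceLarge."""
    root = ET.fromstring(source)
    match = root.find(".//b[@class='priceLarge']")
    return match.text if match is not None else None


# The same query works on both layouts without any knowledge of the nesting.
for source in (LAYOUT_A, LAYOUT_B):
    print(large_price(source))
```

Note that the predicate compares the whole class attribute, so it would miss an element carrying several classes (for example class="priceLarge bold").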

EDIT: In addition to the tbody problem, it seems that Firefox is rendering the div element (ID: divsinglecolumnminwidth) as containing the form element (ID: handleBuy).

Looking at the HTML with an XML editor shows that the form element is actually a sibling of that div element, so the expression should start with /html/body/form/table[3].
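One way to locate this kind of mismatch is to evaluate the path one step at a time and report the first prefix that matches nothing. The Python sketch below assumes a simplified, hypothetical version of the page in which the form is a sibling of the div; first_failing_step is an illustrative name, and the helper only handles simple paths whose predicates contain no slashes.

```python
import xml.etree.ElementTree as ET

# Hypothetical simplified source: the form is a sibling of the div, not a child.
RAW_SOURCE = """
<html>
  <body>
    <div id="divsinglecolumnminwidth"/>
    <form id="handleBuy">
      <table><tr><td><b class="priceLarge">$9.99</b></td></tr></table>
    </form>
  </body>
</html>
"""


def first_failing_step(root, absolute_xpath):
    """Return the shortest prefix of the path that matches nothing, or None."""
    steps = absolute_xpath.strip("/").split("/")
    if steps[0] != root.tag:
        return "/" + steps[0]
    for end in range(2, len(steps) + 1):
        # ElementTree paths are relative to the root element, so skip "html".
        relative = "./" + "/".join(steps[1:end])
        if not root.findall(relative):
            return "/" + "/".join(steps[:end])
    return None


root = ET.fromstring(RAW_SOURCE)
# Browser-style path that nests the form inside the div.
print(first_failing_step(root, "/html/body/div/form/table/tr/td/b"))
# Path that treats the form as a direct child of body.
print(first_failing_step(root, "/html/body/form/table/tr/td/b"))
```

The first call reports the step where the browser's view and the raw source diverge; the second returns None because every step matches.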

One tool, among many others, to test your XPath expressions is HAP Testbed.