0
votes

I am a total rookie and am trying to query data from a website and import it to Google docs spreadsheet. I have used firebug/firepath to find the xpath, when i paste the xpath into a cell =importxml(Url, query) it errors. Here it is the url: http://www.sportfishingreport.com/pages/boatdetail.php?boat_id=781

Boat        Trip Type     Anglers                    Catch
03-22-2015  Full Day    21    48 Ocean Whitefish, 210 Rockfish, 21 Lingcod

Can someone help me write the xpath because the xpath that firepath tells me to write errors in google docs.

Thanks in advance, Jess

1
Show the code of what you tried so far. - user4490638
Tell us the exact error message, show the formula you entered in that cell, the exact XPath expression - and show very clearly what you want the result to be. - Mathias Müller
url is: channelislandssportfishing.com/fish-counts. Each cell has a different xpath to query it. So lets say I want to import the boat name first which is the Erna B. Firepath says the xpath is: html/body/table[2]/tbody/tr/td/table/tbody/tr[1]/td/table/tbody/tr/td/table/tbody/tr/td/form/table/tbody/tr/td/div/table[2]/tbody/tr[1]/td/div/table[1]/tbody/tr[2]/td[1]. - jess j
I type =importxml(url, "//*html/body/table[2]/tbody/tr/td/table/tbody/tr[1]/td/table/tbody/tr/td/table/tbody/tr/td/form/table/tbody/tr/td/div/table[2]/tbody/tr[1]/td/div/table[1]/tbody/tr[2]/td[1]") and it comes back error imported xml content cannot be parsed. I would like to query and import the boat name, trip type, number of anglers, and catch, each with a different xpath and am unable to do this. Xpath validates it to a matching node. - jess j

1 Answers

1
votes

There are no tables in the source HTML of the second page you have indicated (that is, http://www.channelislandssportfishing.com/fish-counts). If anything, those tables are generated by Javascript, but then this content cannot be found by IMPORTXML, because it operates on the raw source HTML.

But what you get from Firepath is endlessly complicated anyway, because the tool tends to return path expressions that rely on positions of nodes, rather than actual values, or IDs, or names. If you look at the source HTML, the portion of HTML that contains "Erna B" looks like

<a href="/erna-b-sportfishing" ><span style="color: #000000;">Erna B</a>

And there is in fact a trivial XPath expression that selects this content, because the href attribute value is unique. To have "Erna B" appear in a cell in Google Sheets, use

=IMPORTXML("http://www.channelislandssportfishing.com/fish-counts","//a[@href='/erna-b-sportfishing']")

For all other cells, look for similar properties that uniquely identify nodes, and turn those into path expressions.