0
votes

I am very new to web-scraping and was introduced to it just today after trying to figure out a formula on a spreadsheet.

I would like to retrieve the Sector information onto Yahoo Finance into Google Sheets. I would also like to the data to update when there is a change to cell B7. Link: https://finance.yahoo.com/quote/MIDD/profile?p=MIDD

I came up with the following, but get a #N/A error: =importxml("https://finance.yahoo.com/quote/",B7,"/profile?p=",B7, "//*[@class='Fw(600) [@data-reactid='21']")

Please let me know what I might be doing wrong. Thank you in advance.

1
What does it say when you mouse over?Calculuswhiz
It says: Error Wrong number of arguments to IMPORTXML. Expected 2 arguments, but got 5 arguments.Monica Wall
I saw another thread about using div class or '@id' instead of '@class', however I'm not familiar with how to apply it to this objective.Monica Wall
Well, first thing, you join strings together with &, not ,.Calculuswhiz
Thanks. The function info in Google Sheet used a ",". I updated to & and received the following: Error Wrong number of arguments to IMPORTXML. Expected 2 arguments, but got 1 arguments.Monica Wall

1 Answers

0
votes

Solution

This is the right syntax to use IMPORTXML formula:

=IMPORTXML("URL", "XPATH_QUERY")

In your case this will translate to:

=importxml("https://finance.yahoo.com/quote/"&B7&"/profile?p="&B7,"//*[@class='Fw(600)'] [@data-reactid='21']")

Which will return an empty result.

Considerations

Keep in mind that many sites go to great lengths to actively prevent scraping. Allowing you to scrape their data entirely, undermines their business model. Since they might make profit from adds for example.

Check in the page you want to scrape if the tags you are watching for correspond to the data you wanted to get in the first place. I believe in this case it's just a matter of changing the tags values to the proper ones.