1
votes

EDIT: It looks like my issue is that Google is querying from the US when I want Australian data. I'm now looking for a way to scrape sites using IMPORTXML in a way that the site recognises as coming from Australia.


I am using the Google Sheets IMPORTXML function to scrape a collectibles website for price data on their top Digimon cards. I can see clearly on the page that the value of the first card is $129.99 AUD, but when I query the page, I get different values. Here's the import function I'm using:

IMPORTXML("https://www.cherrycollectables.com.au/collections/digimon-singles","(//div[@class='productitem--info'])")

At the moment, this first card is showing a value of $99 on my spreadsheet, but on other days I've seen $100 and $124 with no change being shown on the web page - it has stayed at $129.99.

In Chrome's Network logger I can see that the URL is correct; a GET request is being made to this URL to generate the page. The response XML includes the values as they appear on the page, so I don't think the DOM is being edited (which would explain different values).

Additionally, changing to any other currency (I've had issues with Google being based in the US vs me in AU) doesn't line up with the values being returned either.

Image of HTML as it appears on the page

Image of HTML as it appears on the page

Image of correct dollar value in the GET response

Image of correct dollar value in the GET response

Image of incorrect dollar value appearing on my spreadsheet

Image of incorrect dollar value appearing on my spreadsheet

Thank you for your time!

2

2 Answers

0
votes

try:

=IMPORTXML("https://au.4everproxy.com/direct/aHR0cHM6Ly93d3cuY2hlcnJ5Y29sbGVjdGFibGVzLmNvbS5hdS9jb2xsZWN0aW9ucy9kaWdpbW9uLXNpbmdsZXM-",
 "(//div[@class='productitem--info'])")

1

enter image description here

0
votes

When I view the source code, I can see that the price in the HTML is hardcoded as $99.00 for the first card. I also see that the code includes scripts that convert that amount on screen depending on what is chosen in the currency dropdown for the page. Google Sheets can only read the HTML; it has no way to account for active scripts.

Of note, when I choose 'AUD' from the currency dropdown (I am located in the USA), it shows the hardcoded price of $99.00 followed by 'USD' and not 'AUD'. This also indicates that what you see on the site is based on scripts that may be detecting location and, in some respects, mixing things up.