0
votes

Today when experimenting with using importXML in Google Sheets, I ran into a problem. I was attempting to import the title header of a USTA Tournament page into the Google Sheet, however, this did not work as it just resulted in the HTML title of the webpage being displayed ('TournamentHome'). Below is the Google Sheet, and the website that is used:

Google Sheet and Function:

=importXML(F2, "//html//body[@id='thebody']//div[@id='content']//div[@id='pagetitle']")

enter image description here


Website and Section of Source Code Being Used

enter image description here

The title that I am trying to extract from the website is TOWPATH 24th ANNUAL THANKSGIVING JR SINGLES.

The link to the website is https://m.tennislink.usta.com/tournamenthome?T=225779

2

2 Answers

1
votes

How about this sample formula? In this formula, the title value is directly retrieved from the script before the value is put to #pagetitle. Please think of this as just one of several answers.

Sample formula:

=REGEXEXTRACT(IMPORTXML(A1,"//div[@class='tournament_search']/script"),"escape\(""([\w\s\S]+)""")

Result:

When https://m.tennislink.usta.com/TournamentHome/tournament.aspx?T=38079 and https://m.tennislink.usta.com/tournamenthome?T=225779 are put in "A1" and "A2", the results are as follows.

enter image description here

Reference:

1
votes

update:

=REGEXEXTRACT(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
 "https://m.tennislink.usta.com/tournamenthome?T=225779"), 555, 1), 
 "where Col1 contains 'escape'"), "\(""(.*)""\)")

enter image description here


unfortunately, that won't be possible the way you trying because the field you attempt to scrape is controlled by JavaScript and Google Sheets can't understand/import JS. you can test this simply by disabling JS for a given link and you will see what exactly can be imported into Google Sheets:

0