I'm having trouble scraping the tables from this website. All I get is a single line when I'm after a table of data. The website is https://mc.championdata.com/anz_premiership/index.html?competitionid=11035&matchid=110350101 and my code is below.

library(xml2)
library(rvest)
library(XML)

datalist = list()

web<- render_html(url = 'https://mc.championdata.com/anz_premiership/index.html?competitionid=10574&matchid=105740101')


#xpath =  '//*[@id="cd6364_SHELL_grids"]/div[1]/table'
#print(xpath)
  
#tables<- html_nodes(web, 'table')
track<- web %>%
  html_nodes(xpath = '//*[@id="cd6364_SHELL_grids"]/div[1]/table') %>%
  html_table()
1
I think the data is generated dynamically. Try using RSelenium to scrape that. - Ronak Shah
See a similar post. - Waldi

1 Answer

As with most modern data-rich web pages, the data you are looking for is not in the HTML document returned by an HTTP request to that URL. Instead, your browser receives the HTML, which contains JavaScript code. Your browser runs this JavaScript, which prompts it to send further HTTP requests for the actual serialized data (usually in JSON format) that populates the page. When you scrape with rvest or other static web-scraping tools, the original HTML is received as plain text, and there is no JavaScript engine to run it and generate the requests for the JSON.

Therefore, the reason you cannot get the data from this page is that the data is not on the page you have downloaded.

To get around this, open the developer tools in your web browser (F12) and, in the Network tab, watch for the XHR requests your browser makes to find the URL at which the JSON is located (or look for direct links embedded in the HTML text itself). In your case, the JSON address is https://mc.championdata.com/data/11035/fixture.json?_=1593081934709.
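If you prefer to search the page source for embedded links rather than watching the network tab, something like the following rough sketch may help. The helper name `find_json_links`, the regular expression, and the sample page source are all made up for illustration; on the real page you would pass in the downloaded HTML text instead, and the pattern may need adjusting.

```r
# Hypothetical helper: pull anything that looks like a .json URL or path
# out of raw HTML/JavaScript text. The pattern is a rough guess, not a
# complete URL matcher.
find_json_links <- function(html_text) {
  pattern <- "[A-Za-z0-9_:/.?=&%-]+\\.json"
  unique(unlist(regmatches(html_text, gregexpr(pattern, html_text))))
}

# Made-up page source, standing in for the text of the downloaded page
sample_html <- paste(
  "<html><body><div id='grids'></div>",
  "<script>$.getJSON('/data/11035/fixture.json', render);</script>",
  "</body></html>",
  sep = "\n"
)

json_links <- find_json_links(sample_html)
print(json_links)
```

Any relative paths found this way still need to be combined with the site's base address before they can be requested.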

You can parse the JSON directly and shape it into a data frame like this:

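# JSON endpoint found in the browser's network tab
url <- "https://mc.championdata.com/data/11035/fixture.json?_=1593081934709"
fixture <- jsonlite::read_json(url)$fixture$match

# Keep only the fields that also appear in record 20 (used here as the
# reference set of names), then row-bind the records into one data frame
df <- do.call(rbind, lapply(fixture, function(x) 
  as.data.frame(x[names(x) %in% names(fixture[[20]])])))

dplyr::as_tibble(df)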
#> # A tibble: 45 x 22
#>    awaySquadName matchType homeSquadId homeSquadShortC~ homeSquadNickna~
#>    <fct>         <fct>           <int> <fct>            <fct>           
#>  1 Central Pulse H                 802 TAC              Tactix          
#>  2 Northern Mys~ H                8120 NS               Stars           
#>  3 WBOP Magic    H                 808 STE              Steel           
#>  4 Northern Mys~ H                 809 WBM              Magic           
#>  5 Mainland Tac~ H                 808 STE              Steel           
#>  6 Central Pulse H                8120 NS               Stars           
#>  7 Mainland Tac~ H                8120 NS               Stars           
#>  8 WBOP Magic    H                 802 TAC              Tactix          
#>  9 Southern Ste~ H                 805 MYS              Mystics         
#> 10 Southern Ste~ H                8120 NS               Stars           
#> # ... with 35 more rows, and 17 more variables: matchStatus <fct>,
#> #   roundNumber <int>, homeSquadName <fct>, awaySquadNickname <fct>,
#> #   venueId <int>, awaySquadId <int>, venueCode <fct>, localStartTime <fct>,
#> #   matchId <int>, finalCode <fct>, finalShortCode <fct>, venueName <fct>,
#> #   utcStartTime <fct>, awaySquadCode <fct>, homeSquadCode <fct>,
#> #   awaySquadShortCode <fct>, matchNumber <int>

Created on 2020-06-25 by the reprex package (v0.3.0)
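The row-binding step above relies on the hard-coded index 20 to decide which fields to keep. A possible variation, sketched here under assumptions, keeps only the fields shared by every record instead. Note that this is a different selection rule from the one in the code above, the helper name `records_to_df` is hypothetical, and the sample records are made up so the example runs without network access.

```r
# Made-up records standing in for the parsed `fixture` list; the field
# names come from the output above, the values are invented
records <- list(
  list(matchId = 1L, homeSquadNickname = "Tactix", roundNumber = 1L),
  list(matchId = 2L, homeSquadNickname = "Stars", roundNumber = 1L,
       note = "field missing from the other records"),
  list(matchId = 3L, homeSquadNickname = "Steel", roundNumber = 2L)
)

# Hypothetical helper: keep the fields present in every record,
# then row-bind the one-row data frames
records_to_df <- function(records) {
  common <- Reduce(intersect, lapply(records, names))
  rows <- lapply(records, function(x) {
    as.data.frame(x[common], stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

df_sketch <- records_to_df(records)
print(df_sketch)
```

Setting `stringsAsFactors = FALSE` keeps text columns as character vectors rather than factors, regardless of the R version in use.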