I'm having an issue trying to get Rankings from the Freeride World Tour website.

I first tried to get a CSS selector for rvest using SelectorGadget in Chrome, but I can only get the riders and their overall scores. What I'm interested in is the points a rider scored in each heat. I'm new to web scraping and CSS/HTML, so please bear with me.

library(rvest)

# Get the website url
url <- read_html("https://www.freerideworldtour.com/rankings-detailed?season=165&competition=2&discipline=38")

Download everything from the page,

(all_text <- url %>% 
                html_nodes("div") %>% 
                html_text())

then look for Kristofer Turdell's first score of 2500 pts. with grep("2500 pts.", all_text), but I find nothing.
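As a sanity check on a search like this, here is a minimal sketch using made-up markup rather than the real page. The raw_html vector is a hypothetical stand-in for the downloaded source, and fixed = TRUE makes grep() treat the "." literally instead of as a regex wildcard. An empty result means the text is simply not in the markup that was downloaded.

```r
# Hypothetical stand-in for the downloaded page source (not the real page)
raw_html <- c(
  '<div class="field__item even">Kristofer Turdell</div>',
  '<div class="field__item even">7870 pts.</div>'
)

# fixed = TRUE searches for the literal string, so "." is not a wildcard
hits_2500 <- grep("2500 pts.", raw_html, fixed = TRUE)
hits_7870 <- grep("7870 pts.", raw_html, fixed = TRUE)

# No match gives a zero-length result; a match gives the element index
length(hits_2500)
hits_7870
```

If the same check against the real source also comes back empty, the heat scores are probably not part of the initial HTML at all.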

When I right-click the 2500 pts. on the website and select "Inspect" I can see that the html code for this section is:

<div class="field__item even">2500 pts.</div>

So I tried to use the div class:

url %>%
     html_nodes(".field__item.even") %>%
     html_text()

This only returns the overall score for the participants (e.g. Kristofer Turdell 7870 pts.).

Next, I tried using the right-click option to save Xpath from "Inspect".

url %>%
      html_nodes(xpath = '//*[@id="page-content"]/div/div/div[2]/div/div/div/div[1]/div[2]/div/div/div[1]/div/div[4]/div/div/div') %>%
      html_text()

I'm not having any luck on this so I'd really appreciate your help.

It looks like the data is being loaded with JavaScript when you click to expand each entry. It may still be available even though it's hidden, but otherwise you'd need to use something that can programmatically click the button for you, like splashr or RSelenium. – alistaire
It might be easier to take the URLs of individual riders and scrape from there. For example, url <- read_html("https://www.freerideworldtour.com/rider/kristofer-turdell") and then url %>% html_node("div") %>% html_text() %>% gsub("\\s*\\n+\\s*",";",.) %>% gsub("pts.", "\n", .) %>% read.table(text = ., fill = T, sep = ";", row.names = NULL, col.names = c("Drop", "Ranking", "FWT", "Events", "Points")) %>% subset(select=2:5) %>% dplyr::filter(!is.na(as.numeric(as.character(Ranking)))), following @Onyambu's style. – Kim
Thanks @Kim. If I create a vector of riders, riders <- c("kristofer-turdell", "markus-eder", "mickael-bimboes"), and a URL prefix, url_base <- "https://www.freerideworldtour.com/rider/", I should be able to write a for loop with bind_rows(). However, I can only print to the console: for (i in riders) { read_html(paste0(url_base, i)) %>% html_node("div") %>% ... mutate(name = i) %>% print() }, where ... stands for the parts of your code I omitted. – Matthew J. Oldach

2 Answers

1
votes
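# `url` is the parsed page returned by read_html() in the question
url %>%
  html_node("div.panel-second") %>%
  html_text() %>%
  gsub("\\s*\\n+\\s*", ";", .) %>%   # collapse newlines and padding into ";" separators
  gsub("pts\\.", "\n", .) %>%          # end each record at its "pts." suffix
  read.table(text = ., fill = TRUE, sep = ";", row.names = NULL) %>%
  subset(select = 3:4) %>%             # keep the name and points columns
  na.omit()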
                     V3   V4
1     Kristofer Turdell 7870
2           Markus Eder 7320
3       Mickael Bimboes 6930
4   Loic Collomb-Patton 6660
5           Yann Rausis 6290
6    Berkeley Patterson 5860
7           Leo Slemett 5835
8         Ivan Malakhov 5800
9          Craig Murray 5705
10         Logan Pehota 5655
11       Reine Barkered 5470
12        Grifen Moller 4765
13              Sam Lee 4580
14            Ryan Faye 3210
15         Conor Pelton 3185
16        George Rodney 3115
17     Taisuke Kusunoki 3060
18          Trace Cooke 2905
19        Aymar Navarro 2855
20        Felix Wiemers 2655
21         Fabio Studer 2305
22         Stefan Hausl 2240
23           Drew Tabke 1880
24 Carl Regnér Eriksson 1310
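
To see what each step of the pipeline does, here is a small sketch in base R on made-up text. The txt string is only an assumption about roughly what html_text() returns (rank, name and score on separate padded lines); the real page text may differ. The intermediate names step1, step2 and res are introduced purely for illustration.

```r
# Made-up text imitating html_text() output for two ranking rows (assumption)
txt <- "\n   1\n   Kristofer Turdell\n   7870 pts.\n   2\n   Markus Eder\n   7320 pts.\n"

# Step 1: every run of newlines plus surrounding whitespace becomes one ";"
step1 <- gsub("\\s*\\n+\\s*", ";", txt)

# Step 2: each "pts." becomes a newline, so every record ends up on its own line
step2 <- gsub("pts\\.", "\n", step1)

# Step 3: parse the ";"-separated lines; fill = TRUE pads short lines
tab <- read.table(text = step2, fill = TRUE, sep = ";", row.names = NULL)

# Step 4: keep name and points, then drop the padded leftover row
res <- na.omit(subset(tab, select = 3:4))
print(res)
```

On this toy input, step1 is one long ";"-separated string, and the leftover ";" after the last score produces an incomplete row that na.omit() removes.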
0
votes

Writing that much code in the comments was awful, so here goes. You can store the scraped data into a dataframe and not be limited to printing it to the console:

library(tidyverse)
library(magrittr)
library(rvest)

url_base <- "https://www.freerideworldtour.com/rider/"
riders <- c("kristofer-turdell", "markus-eder", "mickael-bimboes")

output <- tibble()  # empty tibble that accumulates each rider's rows
for (i in riders) { 
  temp <- read_html(paste0(url_base, i)) %>% 
    html_node("div") %>% 
    html_text() %>% 
    gsub("\\s*\\n+\\s*", ";", .) %>% 
    gsub("pts\\.", "\n", .) %>% 
    read.table(text = ., fill = TRUE, sep = ";", row.names = NULL, 
               col.names = c("Drop", "Ranking", "FWT", "Events", "Points")) %>%
    subset(select = 2:5) %>% 
    dplyr::filter(
      !is.na(as.numeric(as.character(Ranking))) & 
        as.character(Points) != ""
    ) %>%
    dplyr::mutate(name = i)
  output <- bind_rows(output, temp)
}

I added conditions such as as.character(Points) != "" to drop the rows that hold a rider's total points (such as Mickael Bimboes's 6930 pts.) and keep only the individual scores.
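
The two filter conditions can be tried out on a toy data frame, sketched here in base R. The rows below are illustrative guesses at what the parsed table might contain, not real scraped data: one ordinary result, one stray header-like row, and one row whose Points field is empty (standing in for a total that landed in another column).

```r
# Toy rows imitating the parsed rider table (values are illustrative only)
rows <- data.frame(
  Ranking = c("1", "Ranking", "3"),
  Points  = c("2500", "Points", ""),
  stringsAsFactors = FALSE
)

# Condition 1: Ranking must parse as a number (non-numeric text becomes NA)
is_ranked <- !is.na(suppressWarnings(as.numeric(as.character(rows$Ranking))))

# Condition 2: Points must not be an empty string
has_points <- as.character(rows$Points) != ""

keep <- is_ranked & has_points
kept <- rows[keep, ]
print(kept)
```

Only the first row passes both checks: the second fails the numeric test on Ranking and the third fails the non-empty test on Points.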

Again, much credit goes to @Onyambu; many lines are borrowed from his answer.