I'm trying to scrape some data (names) from a website. I know how to get the first name on the list, but there are a few thousand names that I need to save in the same way.
Here's my code:
library(rvest)
library(tidyverse)
url <- ("https://www.advokatsamfundet.se/Advokatsamfundet-engelska/Find-a-lawyer/Search-result/?firstname=&lastname=&companyname=&postalcity=&country=4401&spokenlanguage=&sortingcity=&positions=102001")
names <- url %>%
  read_html() %>%
  html_elements(xpath = '/html/body/div[3]/div/div/main/div[2]/div[2]/div[1]/a') %>%
  html_text()
This gives me the first name on the list, exactly as it appears in the table.
The XPaths for the names follow this simple structure:
'/html/body/div[3]/div/div/main/div[2]/div[2]/div[1]/a'
'/html/body/div[3]/div/div/main/div[2]/div[3]/div[1]/a'
'/html/body/div[3]/div/div/main/div[2]/div[4]/div[1]/a'
Notice that the index of the second-to-last div increases by 1 for each name. It ends at 6212.
I started working on a function, but I'm not getting anywhere. Here it is anyway, but it doesn't work and I think it may be a dead end.
scrape_fun <- function(.x){
  names %>%
    html_elements(xpath = '/html/body/div[3]/div/div/main/div[2]/div[.x]/div[1]/a') %>%
    html_text()
}
Any advice on how to get it to work for all 6212 names?
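
For reference, here is a minimal sketch of two ways the indexed pattern could be handled. It is not a tested solution for the real site. Two things stand out in the function attempt: `names` is already a character vector (the output of `html_text()`), not a parsed page, and `.x` inside a quoted string is taken literally rather than substituted. The sketch uses a small inline page built with `minimal_html()` so it runs offline; that markup, the shortened XPath starting at `//main`, and the extra `doc` argument are all assumptions made for illustration. It also assumes every row is present in the HTML returned by a single request, which may not hold for the real page.

```r
library(rvest)

# Hypothetical stand-in for the real page so the sketch runs offline;
# the real markup may differ.
page <- minimal_html('
  <main>
    <div>header</div>
    <div>
      <div>column titles</div>
      <div><div><a>Name One</a></div><div>City A</div></div>
      <div><div><a>Name Two</a></div><div>City B</div></div>
      <div><div><a>Name Three</a></div><div>City C</div></div>
    </div>
  </main>
')

# Approach 1: drop the row index, so a single XPath matches the link
# in the first cell of every row at once.
all_names <- page %>%
  html_elements(xpath = "//main/div[2]/div/div[1]/a") %>%
  html_text()

# Approach 2: keep the index, but build the XPath with sprintf() and
# query the parsed document (doc) instead of the extracted text.
scrape_fun <- function(doc, i) {
  doc %>%
    html_element(xpath = sprintf("//main/div[2]/div[%d]/div[1]/a", i)) %>%
    html_text()
}

# Rows 2 to 4 hold the names in this toy page (2 to 6212 in the question).
names_by_index <- vapply(2:4, function(i) scrape_fun(page, i), character(1))
```

With the real site, `page` would instead be `read_html(url)`, parsed once and reused. Approach 1 is usually preferable because it evaluates one XPath instead of thousands of separate queries.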