I am working on web scraping with rvest in R. I am trying to fetch data from a search that returns 12 pages of results, and I wrote code that iterates over the pages to collect data from each one. However, my code collects only the first page, repeatedly. Here is a sample of my code:
```r
library(rvest)
library(purrr)

# New method for Pagination
url_base <- "https://www.nhs.uk/service-search/Hospital/LocationSearch/7/ConsultantResults?SortBy=1&Distance=400&ResultsPerPage=10&Name=e.g.%20Singh%20or%20John%20Smith&Specialty=230&Location.Id=0&Location.Name=e.g.%20postcode%20or%20town&Location.Longitude=0&Location.Latitude=0&CurrentPage=1&OnlyViewConsultantsWithOutcomeData=False"
map_df(1:12, function(i) {
  cat(".")  # progress indicator, one dot per page
  pg <- read_html(sprintf(url_base, i))
  data.frame(consultant_name = html_text(html_nodes(pg, ".consultants-list h2 a")))
}) -> names
dplyr::glimpse(names)
```
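In the first URL, `CurrentPage=1` is written literally and there is no `%d` placeholder, so `sprintf(url_base, i)` has nowhere to put `i` and the page number never changes between iterations. A minimal sketch of building the page URLs, assuming a shortened query string (not the full one from the question): only the page number is a format directive. Any literal `%` in the URL, such as a percent-encoded space `%20`, would have to be written as `%%`, because `sprintf()` treats `%` as the start of a directive.

```r
# Shortened, hypothetical base URL: "%d" is the only sprintf() directive.
# A literal percent sign (e.g. "%20") would need to be escaped as "%%20".
url_base <- "https://www.nhs.uk/service-search/Hospital/LocationSearch/7/ConsultantResults?ResultsPerPage=10&Specialty=230&CurrentPage=%d"

# sprintf() is vectorised, so this builds one URL per page number
page_urls <- sprintf(url_base, 1:12)
head(page_urls, 2)
```

The edited URL in the question follows this pattern: it uses `+` instead of `%20` for spaces and ends in `CurrentPage=%d`.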
Edited version of the code:
```r
# New method for Pagination
url_base <- "https://www.nhs.uk/service-search/Hospital/LocationSearch/7/ConsultantResults?ResultsPerPage=100&defaultConsultantName=e.g.+Singh+or+John+Smith&DefaultLocationText=e.g.+postcode+or+town&DefaultSearchDistance=25&Name=e.g.+Singh+or+John+Smith&Specialty=230&Location.Name=e.g.+postcode+or+town&Location.Id=0&CurrentPage=%d"
map_df(1:12, function(i) {
  cat(".")
  pg <- read_html(sprintf(url_base, i))
  data.frame(consultant_name = html_text(html_nodes(pg, ".consultants-list h2 a")),
             gmc_no = gsub("GMC membership number: ", "", html_text(html_nodes(pg, ".consultants-list .name-number p"))),
             Speciality = html_text(html_nodes(pg, ".consultants-list .specialties ul li")),
             location = html_text(html_nodes(pg, ".consultants-list .consultant-services ul li")),
             stringsAsFactors = FALSE)
}) -> names
dplyr::glimpse(names)
```
The code above runs through 8 iterations and fetches 800 rows (100 per page), but then it throws an error:
```
.........Error in data.frame(consultant_name = html_text(html_nodes(pg, ".consultants-list h2 a")),  :
  arguments imply differing number of rows: 100, 101
Called from: data.frame(consultant_name = html_text(html_nodes(pg, ".consultants-list h2 a")), gmc_no = gsub("GMC membership number: ", "", html_text(html_nodes(pg, ".consultants-list .name-number p"))), Speciality = html_text(html_nodes(pg, ".consultants-list .specialties ul li")), location = html_text(html_nodes(pg, ".consultants-list .consultant-services ul li")), stringsAsFactors = FALSE)
Browse[1]>
```
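The message means the four `html_nodes()` calls returned vectors of different lengths on the failing page. Each selector is evaluated against the whole page independently, so a plausible cause (not confirmed, since the page markup is not shown) is a consultant with two locations or specialties, which adds an extra element to that column only. A small diagnostic sketch, assuming rvest 1.0 or later (`html_elements()` is the newer name for `html_nodes()`), counts the matches per selector; the helper `selector_counts()` and the toy markup are hypothetical.

```r
library(rvest)

# Hypothetical helper: number of nodes each CSS selector matches on one page
selector_counts <- function(pg, selectors) {
  vapply(selectors, function(css) length(html_elements(pg, css)), integer(1))
}

selectors <- c(
  consultant_name = ".consultants-list h2 a",
  gmc_no          = ".consultants-list .name-number p",
  speciality      = ".consultants-list .specialties ul li",
  location        = ".consultants-list .consultant-services ul li"
)

# Toy page (hypothetical markup): the second consultant lists two locations
toy <- read_html('<div class="consultants-list">
  <div><h2><a>Dr A</a></h2>
    <div class="name-number"><p>GMC membership number: 111</p></div>
    <div class="specialties"><ul><li>Cardiology</li></ul></div>
    <div class="consultant-services"><ul><li>Hospital X</li></ul></div></div>
  <div><h2><a>Dr B</a></h2>
    <div class="name-number"><p>GMC membership number: 222</p></div>
    <div class="specialties"><ul><li>Cardiology</li></ul></div>
    <div class="consultant-services"><ul><li>Hospital Y</li><li>Hospital Z</li></ul></div></div>
</div>')

counts <- selector_counts(toy, selectors)
counts
```

Inside the loop, calling `selector_counts(pg, selectors)` before `data.frame()` and printing `i` whenever the counts are not all equal would show which page and which column break.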
I tried changing the number of loop iterations, without success.
Please help me solve this.
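A common way to avoid "differing number of rows" is to select one node per consultant first and then extract every field relative to that node, so each consultant yields exactly one row. The sketch below assumes each consultant is wrapped in its own element; the selector `.consultants-list .consultant`, the helper `scrape_consultants()`, and the toy markup are hypothetical stand-ins, since the real page structure is not shown. `html_element()` (rvest 1.0 or later) returns a missing node, and therefore `NA`, when a field is absent, and multi-valued fields are joined into one string.

```r
library(rvest)

# Hypothetical helper: one row per consultant card on a parsed results page
scrape_consultants <- function(pg, card_css = ".consultants-list .consultant") {
  cards <- html_elements(pg, card_css)

  # Join all matches inside one card, so several locations still give one row
  collapse_text <- function(card, css) {
    txt <- html_text(html_elements(card, css), trim = TRUE)
    if (length(txt) == 0) NA_character_ else paste(txt, collapse = "; ")
  }

  data.frame(
    # html_element() returns exactly one (possibly missing) node per card
    consultant_name = html_text(html_element(cards, "h2 a"), trim = TRUE),
    gmc_no = sub("GMC membership number: ", "",
                 html_text(html_element(cards, ".name-number p"), trim = TRUE),
                 fixed = TRUE),
    speciality = vapply(cards, collapse_text, character(1), css = ".specialties ul li"),
    location = vapply(cards, collapse_text, character(1), css = ".consultant-services ul li"),
    stringsAsFactors = FALSE
  )
}

# Toy page: Dr B has no GMC number and two locations
toy <- read_html('<div class="consultants-list">
  <div class="consultant"><h2><a>Dr A</a></h2>
    <div class="name-number"><p>GMC membership number: 111</p></div>
    <div class="specialties"><ul><li>Cardiology</li></ul></div>
    <div class="consultant-services"><ul><li>Hospital X</li></ul></div></div>
  <div class="consultant"><h2><a>Dr B</a></h2>
    <div class="specialties"><ul><li>Cardiology</li></ul></div>
    <div class="consultant-services"><ul><li>Hospital Y</li><li>Hospital Z</li></ul></div></div>
</div>')

out <- scrape_consultants(toy)
out
```

In the question's loop, the `data.frame(...)` call could then be replaced by `scrape_consultants(pg)`, with `map_df()` binding the per-page results as before; `card_css` must be adjusted to whatever element wraps a single consultant on the live page.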