I would like to determine the number of pages from pagination on the page: https://aplikacje.nfz.gov.pl/umowy/Provider/Index?ROK=2017&OW=07&ServiceType=03&Code=&Name=&City=&Nip=&Regon=&Product=&OrthopedicSupply=false
============
Table
============
Pagination: Link1, Link2, Link3, Link4, LinkNext, LinkLast
Using SelectorGadget, I identified that the pagination is in ".pagination-container, a"
I would like to:
- dump all the links in the pagination to a vector or data.frame
- get the last number in each URL string
- determine the maximum of those numbers, i.e. how many pages the pagination has, to use it later in a scraping loop
Following http://francojc.github.io/web-scraping-with-rvest/ I started with:
library(tidyverse)
library(rvest)
url <- "https://aplikacje.nfz.gov.pl/umowy/Provider/Index?ROK=2017&OW=07&ServiceType=03&Code=&Name=&City=&Nip=&Regon=&Product=&OrthopedicSupply=false"
urls <- url %>% # feed `main.page` to the next step
  html_nodes(".pagination-container, a") %>% # get the CSS nodes
  html_text("href")
The html_nodes() call throws an error:
Error in UseMethod("xml_find_all") :
no applicable method for 'xml_find_all' applied to an object of class "character"
What am I doing wrong?
read_html()? And, you likely want html_attr("href") vs html_text("href"). - hrbrmstr
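
A minimal sketch of what the comment suggests: parse the document with read_html() before selecting nodes, then read the href attribute rather than the link text. The markup in sample_html is hypothetical and only stands in for the live page; in particular, the "page=N" ending of each href is an assumption and the real links may look different. Note also that the comma in ".pagination-container, a" makes it a selector list (the container plus every link on the page), so the sketch uses the descendant form ".pagination-container a" instead.

```r
library(rvest)

# Hypothetical markup standing in for the live page; the real hrefs may differ.
sample_html <- '
<div class="pagination-container">
  <ul class="pagination">
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=1">1</a></li>
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=2">2</a></li>
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=3">3</a></li>
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=2">Next</a></li>
    <li><a href="/umowy/Provider/Index?ROK=2017&amp;page=17">Last</a></li>
  </ul>
</div>'

# Parse first: html_nodes() needs a parsed document, not a character string.
# For the live site, pass the URL string to read_html() instead.
page <- read_html(sample_html)

hrefs <- page %>%
  html_nodes(".pagination-container a") %>%  # links inside the container only
  html_attr("href")                          # attribute value, not link text

hrefs <- hrefs[!is.na(hrefs)]                # drop links without an href

# Take the trailing run of digits from each href (assumed to be the page number).
page_numbers <- as.integer(regmatches(hrefs, regexpr("[0-9]+$", hrefs)))
max_page <- max(page_numbers)
```

With the sample markup above, hrefs holds the five link targets and max_page is the largest trailing number among them, which can then serve as the upper bound of a scraping loop such as seq_len(max_page).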