
I'm trying to get a specific number from pages on https://ideas.repec.org/. More specifically, I'm looking for the number of search results that IDEAS' search results page reports.

However, when I apply the following code, I get an empty string:

library(rvest)
library(httr)  # GET() comes from httr, not rvest

x <- GET("https://ideas.repec.org/cgi-bin/htsearch?form=extended&wm=wrd&dt=range&ul=&q=labor&cmd=Search%21&wf=4BFF&s=R&db=01%2F01%2F1950&de=31%2F12%2F1950")
webpage <- read_html(x)
hits_html <- html_nodes(webpage, xpath = '//*[@id="content-block"]/p')
hits <- html_text(hits_html)
hits

[1] ""

Answer


You could extract it with a regular expression from the appropriate node. This assumes the text before and after the number ("Found" and "results") stays the same, including its case. You could also make the match case-insensitive with (?i)found\\s+(\\d+)\\s+results.

library(rvest)
library(stringr)
# read_html() can fetch the URL itself
page <- read_html("https://ideas.repec.org/cgi-bin/htsearch?form=extended&wm=wrd&dt=range&ul=&q=labor&cmd=Search%21&wf=4BFF&s=R&db=01%2F01%2F1950&de=31%2F12%2F1950")
# Take the text of the whole #content-block node rather than one <p>
r <- page %>% html_node("#content-block") %>% html_text()
# Capture the digits between "Found" and "results"
x <- str_match_all(r, 'Found\\s+(\\d+)\\s+results')
print(x[[1]][, 2])
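If you need the count as a number, and want the case-insensitive variant mentioned above, the same steps can be wrapped in a small helper. This is only a sketch: `count_results()` is a hypothetical name, it assumes the count appears inside `#content-block` worded as "Found N results", and the HTML fragment in the demo is made up so the function can be checked without hitting the site.

```r
library(rvest)
library(stringr)

# Hypothetical helper: return the "Found N results" count as an integer.
# Assumes the count lives in #content-block with that wording (any case).
count_results <- function(page) {
  txt <- page %>% html_node("#content-block") %>% html_text()
  m <- str_match(txt, regex("found\\s+(\\d+)\\s+results", ignore_case = TRUE))
  as.integer(m[, 2])  # NA if the pattern is not found
}

# Offline check against a made-up fragment, not the real IDEAS markup
demo <- read_html('<div id="content-block"><p></p><p>Found 42 results</p></div>')
count_results(demo)
```

On the live site you would pass the parsed page instead, e.g. `count_results(read_html(url))`. Returning `NA` rather than an empty vector makes it easy to spot pages where the wording differs.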