2
votes

I want to extract the Holdings table from https://whalewisdom.com/stock/spy. I have the following code:

 library(rvest)

 turl = 'https://whalewisdom.com/stock/spy'
 test_html = read_html(turl) 
 df<-html_table(test_html)

However, on running it I get the following error:

 Error in matrix(NA_character_, nrow = n, ncol = maxp) :
   invalid 'ncol' value (too large or NA)
 In addition: Warning messages:
 1: In max(p) : no non-missing arguments to max; returning -Inf
 2: In matrix(NA_character_, nrow = n, ncol = maxp) :
   NAs introduced by coercion to integer range

I am not sure if this is the entire problem, but that table seems to be filled asynchronously by an AJAX call after the page loads. I do not see the data for the table in "View Source" in the browser, nor in the data returned by read_html. I am not sure there is an R solution for scraping asynchronously loaded web pages, but perhaps someone else knows of one. A non-R solution may be headless browsers - Eric
Thanks. This was my concern too. I could not find a link to the table page embedded in the source code so was wondering where the data is being called from. - Talha Naushad
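
One way to check this from R is to count the cells each table has in the HTML that read_html actually receives. The sketch below uses a made-up inline page (the ids "holdings" and "other" are hypothetical, not taken from the real site) with one empty table standing in for a table that JavaScript would fill later. The "no non-missing arguments to max" warning suggests html_table hit a table with no cells, so skipping those should at least avoid the error, although it will not recover data that is loaded asynchronously.

```r
library(rvest)

# Hypothetical static HTML: an empty table (filled later by JavaScript on a
# real site) and a second table that already contains data.
page <- read_html('<html><body>
  <table id="holdings"></table>
  <table id="other">
    <tr><th>Name</th><th>Value</th></tr>
    <tr><td>x</td><td>1</td></tr>
  </table>
</body></html>')

tables <- html_nodes(page, "table")

# Count the header and data cells present in each table as delivered
n_cells <- sapply(tables, function(tb) length(html_nodes(tb, "td, th")))

# Parse only the tables that actually contain cells
parsed <- lapply(tables[n_cells > 0], html_table, fill = TRUE)
```

If the table you want shows up with zero cells here, its data is not in the static HTML and rvest alone will not see it.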

2 Answers

0
votes

Hi, I ran into the same situation. After searching around, I found a solution to this at RStudio Community.

I hope this will be helpful to you too.

0
votes

Index into the list of tables for the one of interest, or simply grab the appropriate table node and use fill = TRUE, then do a little tidying of the contents.

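library(rvest)
library(magrittr)

# Select the table that directly follows the form element and pad short rows with NA.
# The result is named holdings rather than t to avoid masking base::t().
holdings <- read_html('https://whalewisdom.com/stock/spy') %>%
  html_node('form + .table') %>%
  html_table(fill = TRUE)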
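
The other option mentioned above, indexing into the full list of tables, might look like the sketch below. The inline HTML, the column names, and the index 2 are all made up for illustration. On the real page you would need to inspect the list to find which position the Holdings table is in.

```r
library(rvest)

# Hypothetical page standing in for the real site: two tables, the second
# one following a form and carrying the class "table".
page <- read_html('<html><body>
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <form></form>
  <table class="table">
    <tr><th>Holder</th><th>Shares</th></tr>
    <tr><td>Fund A</td><td>100</td></tr>
    <tr><td>Fund B</td></tr>
  </table>
</body></html>')

# Collect every table node, then pick one by position (2 is an assumption)
tables <- html_nodes(page, "table")
by_index <- html_table(tables[[2]], fill = TRUE)

# The same table reached through the CSS selector instead of an index
by_selector <- html_table(html_node(page, "form + .table"), fill = TRUE)
```

Selecting by CSS is usually less fragile than a positional index, since the index changes whenever the site adds or removes a table.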