Creating a data frame from poorly stored list data (Removing the first row which is junk)

Question

Our professor keeps giving us assignments to work on in R, but instead of handing us clean data, we usually have to pull it from the web.

This block of code does that:

library(rvest)
url <- "https://www.supremecourt.gov/opinions/slipopinion/18"
page <- read_html(url)
table <- html_table(page, fill = FALSE, trim = TRUE)

However, this junk also gets included in the table data:

table
[[1]]
  X1
1 SEARCH TIPS\r\n Search term too short \r\n Invalid text in search term. Try again
  X2
1 ADVANCED SEARCHDOCKET SEARCH

So I am having a hard time understanding how to format this data into a data frame, because calling as.data.frame(table) gives me this error:

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
  arguments imply differing number of rows: 1, 11, 8, 7, 2
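
The row counts in that message (1, 11, 8, 7, 2) suggest that each element of the list is a separate table with its own number of rows, and that as.data.frame is trying to put them side by side as columns. Here is a minimal sketch that reproduces the same kind of error. The three toy data frames are made up for illustration and are not the actual page content:

```r
# Hypothetical stand-in for the scraped list: one junk table plus two
# data tables, each with a different number of rows.
tables <- list(
  data.frame(X1 = "SEARCH TIPS"),
  data.frame(Docket = c("a", "b", "c")),
  data.frame(Docket = c("d", "e"))
)

# Inspect how many rows each list element has.
row_counts <- sapply(tables, nrow)

# as.data.frame() tries to use the elements as columns of one data frame,
# which fails when their row counts cannot be recycled to a common length.
msg <- tryCatch(
  {
    as.data.frame(tables)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(row_counts)
print(msg)
```

Checking sapply(table, nrow) on the real list is a quick way to see which elements are data tables and which are junk.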

Your professor is doing you a favour: real-world data is messy :) Assuming that you want the table for each month, it may be better to select the desired tables with html_nodes("table") and a suitable selector before calling html_table. - neilfws

neilfws · Accepted Answer · 2019-03-31T22:52:23

You can use a selector to distinguish the tables with the data from other tables on the page, such as the search box. In this case, the data tables are of class table-bordered:

page %>% 
  html_nodes("table.table-bordered") %>% 
  html_table()
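
The result is still a list with one data frame per matched table. If a single data frame is wanted, the pieces can be stacked. The sketch below runs the same selector on a small inline HTML string instead of the live site. The markup, column names, and case names are invented for illustration, and it assumes that all data tables share the same columns:

```r
library(rvest)

# Hypothetical stand-in for the real page: one junk search table and two
# data tables of class "table-bordered".
html <- '
<html><body>
<table><tr><td>SEARCH TIPS</td><td>ADVANCED SEARCH</td></tr></table>
<table class="table-bordered">
  <tr><th>Docket</th><th>Date</th><th>Name</th></tr>
  <tr><td>17-1011</td><td>6/3/19</td><td>Case A</td></tr>
  <tr><td>17-1012</td><td>6/1/19</td><td>Case B</td></tr>
</table>
<table class="table-bordered">
  <tr><th>Docket</th><th>Date</th><th>Name</th></tr>
  <tr><td>17-1013</td><td>5/28/19</td><td>Case C</td></tr>
</table>
</body></html>'

page <- read_html(html)

# Keep only the data tables; the unclassed search table is skipped.
tables <- page %>%
  html_nodes("table.table-bordered") %>%
  html_table()

# Stack the per-table data frames row-wise into one data frame.
opinions <- do.call(rbind, tables)
print(opinions)
```

If the tables on the real page turn out to have differing columns, do.call(rbind, ...) will fail, and the tables would need to be aligned first or kept as a list.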
