I'd like to scrape Amazon customer reviews. My code works fine as long as no information is "missing", but when parts of the data are missing, converting the scraped data to a data frame fails with the error "arguments imply differing number of rows".
Here is an example:
library(rvest)

# Note: despite its name, `url` holds the parsed HTML document, not the URL string
url <- read_html("https://www.amazon.de/product-reviews/3980710688/ref=cm_cr_dp_d_show_all_btm?ie=UTF8&reviewerType=all_reviews&pageNumber=42&sortBy=recent")

get_reviews <- function(url) {
  # One entry per review title found on the page
  title <- url %>%
    html_nodes("#cm_cr-review_list .a-color-base") %>%
    html_text()
  # One entry per author node; reviews without an author contribute nothing here
  author <- url %>%
    html_nodes(".author") %>%
    html_text()
  df <- data.frame(title, author, stringsAsFactors = FALSE)
  return(df)
}

results <- get_reviews(url)
In this case, "missing" means that several customer reviews provide no author information ("Ein Kunde" simply means "A customer" in German), so fewer authors than titles are scraped.
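To make the problem reproducible without hitting Amazon, here is a minimal sketch using made-up HTML. The markup is an assumption for illustration only and is much simpler than the real page; it just mimics three reviews, one of which has no `.author` element.

```r
library(rvest)

# Made-up HTML standing in for a review page (not Amazon's real markup):
# the second review has no element with class "author".
page <- read_html('<div id="cm_cr-review_list">
  <div><a class="a-color-base">Great book</a><a class="author">Anna</a></div>
  <div><a class="a-color-base">Okay</a><span>Ein Kunde</span></div>
  <div><a class="a-color-base">Not for me</a><a class="author">Ben</a></div>
</div>')

title <- page %>%
  html_nodes("#cm_cr-review_list .a-color-base") %>%
  html_text()
author <- page %>%
  html_nodes(".author") %>%
  html_text()

# The vectors no longer line up: more titles than authors
length(title)
length(author)

# data.frame() refuses to combine columns of these lengths;
# tryCatch() captures the error message instead of stopping the script
err <- tryCatch(
  data.frame(title, author, stringsAsFactors = FALSE),
  error = function(e) conditionMessage(e)
)
err
```

Because each selector is run over the whole page, nothing ties a given author to a given title, so a review without an author simply shortens the `author` vector.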
Does anyone have an idea on how to fix this? Any help is appreciated. Thanks in advance!