
I am scraping Amazon customer reviews with R and have run into a problem that I hope someone can shed some light on.

The script fails to retrieve the specified node (found using SelectorGadget) from every review. Each run returns a different number of reviews, but never all of them: if a product has 200 reviews, I might get 150 on one run and 75 on the next, but never the full 200. The problem seems to appear after I have scraped repeatedly. This is frustrating because the goal is to compile the reviews into CSV files that I can later manipulate in R.

I have also gotten a few timeout errors, specifically "Error in open.connection(x, "rb") : Timeout was reached".

How can I get around this and keep scraping? I am a beginner, so any help or insight is greatly appreciated!

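library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

url <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_show_all?ie=UTF8&reviewerType=all_reviews&pageNumber="

N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
  pant <- read_html(paste0(url, j))  # download and parse review page j
  B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())  # one row per review
  A <- rbind(A, B)  # append this page's reviews to the running matrix
}
tail(A)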


print(j) 
1

1 Answer


Is this not working for you?

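I set the URL as follows and ran the same loop:

library(rvest)  # provides read_html(), html_nodes(), html_text() and %>%

url <- "https://www.amazon.com/Match-Mens-Wild-Cargo-Pants/product-reviews/B009HLOZ9U/ref=cm_cr_arp_d_paging_btm_2?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="

N_pages <- 204
A <- NULL
for (j in 1:N_pages) {
  pant <- read_html(paste0(url, j))  # download and parse review page j
  B <- cbind(pant %>% html_nodes(".review-text") %>% html_text())  # one row per review
  A <- rbind(A, B)  # append this page's reviews to the running matrix
}
tail(A)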
        [,1]
[1938,] "This is really a good item to get. Trendy, probably you can choose a different color, it fits good but I wouldn't say perfect."
[1939,] "I don't write reviews for most products, but I felt the need to do so for these pants for a couple reasons.  First, they are great pants!  Solid material, well-made, and they fit great.  Second, I want to echo those who say you need to go up in size when you order.  I wear anywhere from 32-34, depending on the brand.  I ordered these in a 36 and they fit like a 33 or 34.  I really love the look and feel of these, and will be ordering more!"
[1940,] "I bought the green one before, it is good quality and looks nice, than I purchased the similar one, but the  khaki color, but received absolutely different product, different material. really disappointed."
[1941,] "These pants are great!  I have been looking to update my wardrobe with a more edgy style; these cargo pants deliver on that.  Paired with some casual sneakers or a decent nubuck leather boot completes the look from the waist down.  The lazy-casual look is great when traveling, as are the many pockets.  I wore these pants on a recent day trip to NYC and traveled comfortably with essential items contained in the 8 pockets.  I placed a second order shortly after my first pair arrived because I like them so much.  Shipping and delivery is also fairly fast, considering these pants ship from China!"
[1942,] "Pants are awesome, just like the picture. The size runs small, so if you order them I would order them bigger than normal. I usually wear a 34inch waist because i dont like my pants snug, these pants fit more like a 32 inch waist.Other than that i love them!"
[1943,] "the good:Pants are made from the durable cotton that has a nice feel; have a lot of useful features and roomy well placed pockets; durable stitching.the bad:Pants will shrink and drier/hot water is not recommended. Would have been better if the cotton was pretreated to prevent shrinking. I would gladly gave up the belt if I wouldn't have to wary about how to wash these pants.the ugly:faux pocket with a zipper. useless feature. on my pair came with a bright gold zipper, unlike a silver in a picture."
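
If the "Timeout was reached" error or the short result sets come back, one thing that may help is retrying failed requests and pausing between pages. Below is a minimal sketch, assuming the failures are transient; I have not verified that this is the cause in your case. The names fetch_with_retry and scrape_reviews, and their parameters (max_tries, base_wait, pause), are hypothetical helpers I made up for illustration and are not part of rvest. The only library calls used are xml2::read_html, rvest::html_nodes and rvest::html_text. Note that retrying cannot help if a page loads successfully but simply contains no ".review-text" nodes.

```r
# Hypothetical helper: call fetch(page_url) and retry on error,
# waiting base_wait, 2 * base_wait, 4 * base_wait, ... seconds between tries.
fetch_with_retry <- function(page_url, fetch = xml2::read_html,
                             max_tries = 5, base_wait = 1) {
  for (attempt in seq_len(max_tries)) {
    # Turn an error (e.g. a timeout) into a value so the loop can continue
    result <- tryCatch(fetch(page_url), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < max_tries) {
      Sys.sleep(base_wait * 2^(attempt - 1))
    }
  }
  stop("Giving up on ", page_url, " after ", max_tries, " tries: ",
       conditionMessage(result))
}

# Hypothetical helper: scrape n_pages review pages, pausing between requests.
# Returns a character vector with one element per review found.
scrape_reviews <- function(base_url, n_pages, pause = 2, ...) {
  pages <- vector("list", n_pages)
  for (j in seq_len(n_pages)) {
    page <- fetch_with_retry(paste0(base_url, j), ...)
    pages[[j]] <- rvest::html_text(rvest::html_nodes(page, ".review-text"))
    Sys.sleep(pause)  # wait before requesting the next page
  }
  unlist(pages)
}

# Usage (needs network access, so it is left commented out):
# reviews <- scrape_reviews(url, 204)
# write.csv(data.frame(review = reviews), "reviews.csv", row.names = FALSE)
```

Collecting each page into a list and calling unlist() once at the end also avoids growing A with rbind() on every iteration. Checking length(pages[[j]]) inside the loop would show which pages came back without any reviews.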