0
votes

I am new to R and have been doing some web scraping. I have written the following code, which places the ID, name, colour and price of a specific item from https://uk.burberry.com/ into a data frame.

# load package
library(rvest)

# Example URL
url <- 'https://uk.burberry.com/fringed-wool-cashmere-patchwork-cardigan-coat-p40612561'

# Read HTML code from the website
webpage <- read_html(url)

# Using CSS selectors to scrape the ID section
id_data_html <- html_nodes(webpage, '.section')
# Converting the ID to text
id_data <- html_text(id_data_html)
# Remove irrelevant text
id_data <- gsub("Item", "", id_data)

# Using CSS selectors to scrape the names section
names_data_html <- html_nodes(webpage, '.type-h6')
# Converting the names to text
names_data <- html_text(names_data_html)
# Stripping irrelevant text
names_data <- gsub("\n\t\t\t\t\t\t\t", "", names_data)

# Using CSS selectors to scrape the price section
price_data_html <- html_nodes(webpage, '.l2')
# Converting the price to text
price_data <- html_text(price_data_html)
# Remove irrelevant text
price_data <- gsub("\t", "", price_data)
price_data <- gsub("\n", "", price_data)

# Using CSS selectors to scrape the colour section
colour_data_html <- html_nodes(webpage, '#colour-picker-value')
# Converting the colour to text
colour_data <- html_text(colour_data_html)

# Creating the data frame
burberry_df <- data.frame(ID = id_data, Name = names_data, Price = price_data, Colour = colour_data)

Is there a way to create a loop so that I can run this code for every item on the website and put the results into a single data frame? Thanks

1
"1.4 … Subject to the terms herein, we grant you a revocable, and non-exclusive licence to access and make personal use of the Platforms limited such that it does not include the right to: … (c) use software robots, spiders, crawlers, or similar data gathering and extraction tools…". Please at least indicate when you're asking others to violate ToS and incur possible legal or civil actions against them. - hrbrmstr

1 Answer

0
votes

You could wrap your code in a function that accepts a URL and returns a data frame with the information scraped from that page:

get_page_data <- function(url) {
    # Read the HTML from the page
    webpage <- read_html(url)

    # Use CSS selectors to scrape the ID section, then drop the "Item" label
    id_data <- html_text(html_nodes(webpage, '.section'))
    id_data <- gsub("Item", "", id_data)

    # Scrape the name section and strip surrounding whitespace
    # (trimws is more robust than matching an exact run of tabs)
    names_data <- trimws(html_text(html_nodes(webpage, '.type-h6')))

    # Scrape the price section and remove tabs and newlines in one pass
    price_data <- html_text(html_nodes(webpage, '.l2'))
    price_data <- gsub("[\t\n]", "", price_data)

    # Scrape the colour section
    colour_data <- html_text(html_nodes(webpage, '#colour-picker-value'))

    # Combine everything into a data frame
    data.frame(ID = id_data, Name = names_data, Price = price_data,
               Colour = colour_data, stringsAsFactors = FALSE)
}

Then, to use the function, simply call it with the URL of interest:

url <- 'https://uk.burberry.com/fringed-wool-cashmere-patchwork-cardigan-coat-p40612561'
result <- get_page_data(url)
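To answer the looping part of the question: once you have a vector of product URLs, you can apply the function to each one and bind the results together. A minimal sketch, assuming you have already collected the URLs (the vector below is a placeholder with a single example page), and keeping in mind the terms-of-service concern raised in the comment above:

```r
# Assumed: a vector of product-page URLs gathered beforehand (placeholder here)
urls <- c(
    'https://uk.burberry.com/fringed-wool-cashmere-patchwork-cardigan-coat-p40612561'
    # add further product URLs here
)

# Scrape each page, pausing between requests to avoid hammering the server
results <- lapply(urls, function(u) {
    Sys.sleep(1)
    get_page_data(u)
})

# Stack the per-page data frames into one data frame
burberry_all <- do.call(rbind, results)
```

`lapply` returns a list of data frames (one per URL), and `do.call(rbind, ...)` stacks them row-wise; this only works cleanly if every page yields the same columns, which is the case here since `get_page_data` always builds the same four-column data frame.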