
I have scraped eBay product information (name, price, description, etc.) from a given URL. However, if there are multiple URLs, say a set of 10, how do I make sure that all of them are scraped? Here is my code for scraping eBay products (webscraper.py):

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager


def scrape_products():
    website_address = ['https://www.ebay.co.uk/itm/The-Discworld-series-Carpe-jugulum-by-Terry-Pratchett-Paperback-Amazing-Value/293566021594?hash=item4459e5ffda:g:yssAAOSw3NBfQ7I0',
                       'https://www.ebay.co.uk/itm/Edexcel-AS-A-level-history-Germany-and-West-Germany-1918-89-by-Barbara/293497601580?hash=item4455d1fe2c:g:6lYAAOSwbRFeXGqL']
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    browser = webdriver.Chrome(ChromeDriverManager().install(), options=options)
    for web in website_address:
        browser.get(web)
        time.sleep(2)
        product_price_raw_list = browser.find_elements_by_xpath('//*[@id="vi-mskumap-none"]')
        product_name_raw_lst = browser.find_elements_by_xpath('//*[@id="itemTitle"]')
        #rest of the code


if __name__ == "__main__":
    scrape_products()

This code scrapes the first website address but not the second. Why is the second entry in website_address not being scraped? I have also tried appending the results to an empty list, but that did not work either. I cannot figure it out. Please help! Thanks.

What if you move the creation of the browser inside the loop? - Tim Roberts
@TimRoberts It still displays the first website, not the second. - technophile_3
This code works fine when I run it. Maybe something in the rest of your code is causing the issue. - Rhys Flook
@RhysFlook Yes, the code runs fine, but it is not scraping the second address (or any further addresses if I enter more). - technophile_3
What do you mean by "it is not scraping"? If I get the text from the elements, I successfully get the correct text from each webpage. - Rhys Flook

1 Answer

import time
from selenium import webdriver
from bs4 import BeautifulSoup
from webdriver_manager.chrome import ChromeDriverManager


def scrape_products():
    website_address = [
        'https://www.ebay.co.uk/itm/The-Discworld-series-Carpe-jugulum-by-Terry-Pratchett-Paperback-Amazing-Value/293566021594?hash=item4459e5ffda:g:yssAAOSw3NBfQ7I0',
        'https://www.ebay.co.uk/itm/Edexcel-AS-A-level-history-Germany-and-West-Germany-1918-89-by-Barbara/293497601580?hash=item4455d1fe2c:g:6lYAAOSwbRFeXGqL']
    options = webdriver.ChromeOptions()
    options.add_argument('start-maximized')
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option("useAutomationExtension", False)

    browser = webdriver.Chrome(executable_path='chromedriver.exe', options=options)
    for web in website_address:
        browser.get(web)
        time.sleep(2)
        product_price_raw_list = browser.find_element_by_xpath('//*[@id="vi-mskumap-none"]').text
        product_name_raw_lst = browser.find_element_by_xpath('//*[@id="itemTitle"]').text
        print(product_name_raw_lst)
        print(product_price_raw_list)


if __name__ == "__main__":
    scrape_products()

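It works for me after changing find_elements_by_xpath to find_element_by_xpath for product_name_raw_lst and product_price_raw_list and reading .text from each result. find_elements_by_xpath returns a list of elements, and a list has no .text attribute of its own, whereas find_element_by_xpath returns a single element whose .text can be read directly.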
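If you want to keep the results for every URL instead of only printing them, something like the following might help. It is only a sketch: the helper names first_text and scrape_all are made up for illustration, the XPaths are the ones from the question, and it assumes the generic find_elements(by, value) call with the locator strategy "xpath" rather than the find_elements_by_xpath shortcut. It also leaves out the time.sleep(2) from the code above, so you may need to add a wait back in.

```python
# XPaths copied from the question; they may not match every eBay page layout.
PRICE_XPATH = '//*[@id="vi-mskumap-none"]'
TITLE_XPATH = '//*[@id="itemTitle"]'


def first_text(browser, xpath):
    """Return the text of the first element matching xpath, or None."""
    # find_elements returns a list, which is empty when nothing matches,
    # so a missing element does not raise and stop the loop.
    matches = browser.find_elements("xpath", xpath)
    return matches[0].text if matches else None


def scrape_all(browser, urls):
    """Visit each URL in turn and collect one result dict per page."""
    results = []
    for url in urls:
        browser.get(url)
        results.append({
            "url": url,
            "name": first_text(browser, TITLE_XPATH),
            "price": first_text(browser, PRICE_XPATH),
        })
    return results
```

Inside scrape_products() you would call scrape_all(browser, website_address) with the browser created above and get back one dictionary per URL. A page where an XPath does not match ends up with None for that field, which makes it easier to see which URL is the one failing.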
