0
votes

I have successfully scraped some data from a site with Selenium, and it downloaded without problems. The site lists many products whose HTML elements all share the same class names.

At the moment only a single product and its details (name, description, price, seller, etc.) are scraped, but I would like to scrape ALL the products on the page, which, I repeat, share the same class names. Here is the code:

# Selenium code for scraping (find_element returns only the first match)
Product_Name = driver.find_element_by_class_name("tablet-desktop-only").text
Product_Description = driver.find_element_by_class_name("h-text-left").text
Vendor = driver.find_element_by_class_name("in-vendor").text
Price = driver.find_element_by_class_name("h-text-center").text

print(Product_Name)
print(Product_Description)
print(Vendor)
print(Price)

How can I scrape the other products too if they have exactly the same class names? I would like to create a list of all products, not just one. Thank you.

3
Could you edit the question to include the URL of the website you are trying to scrape? - Aryan Garg

3 Answers

2
votes

You are going to need to find all elements for each of the kinds of things you are looking for. So, start with:

Product_Names = driver.find_elements_by_class_name("tablet-desktop-only")
Product_Descriptions = driver.find_elements_by_class_name("h-text-left")
Vendors = driver.find_elements_by_class_name("in-vendor")
Prices = driver.find_elements_by_class_name("h-text-center")

You should now have 4 lists of elements (not strings). Ideally they all have the same length and list things in the same page order, but to be safe we will work with the length of the shortest list.

Num_Groups = min(len(Product_Names),len(Product_Descriptions),len(Vendors), len(Prices))

Then we loop over all 4 lists at the same time:

for i in range(Num_Groups):
    print(Product_Names[i].text)
    print(Product_Descriptions[i].text)
    print(Vendors[i].text)
    print(Prices[i].text)
    #you might want to add printing a blank line here

Note we need .text here so we get the text of the element, not a description of the element itself. Also note the [i] to get that element in the list.
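
Since the question asks for a list of all products, the same loop can be sketched with zip(), which pairs the four lists up and stops at the shortest one. This is only a sketch under assumptions: collect_products and fake are hypothetical helper names, the sample data is made up, and SimpleNamespace objects stand in for Selenium elements so the example runs without a browser.

```python
from types import SimpleNamespace


def collect_products(names, descriptions, vendors, prices):
    """Build one dict per product from four parallel lists of elements.

    Each item only needs a .text attribute, as Selenium elements have.
    zip() stops at the shortest list, like the min(len(...)) guard above.
    """
    return [
        {"name": n.text, "description": d.text, "vendor": v.text, "price": p.text}
        for n, d, v, p in zip(names, descriptions, vendors, prices)
    ]


def fake(*texts):
    """Create stand-in elements (made-up data) that expose only .text."""
    return [SimpleNamespace(text=t) for t in texts]


products = collect_products(
    fake("Lamp", "Chair"),
    fake("Desk lamp", "Oak chair"),
    fake("Acme", "Woodco"),
    fake("19.99", "89.00", "unmatched"),  # the extra item is ignored by zip()
)
for product in products:
    print(product)
```

With a real driver, the four arguments would be the lists returned by the find_elements calls shown earlier.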

Within this loop is where you would do your database inserts (though probably connect outside the loop), making sure to merge the .text into the SQL string, not the element's string representation.
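
As an alternative to merging the text into the SQL string by hand, a parameterized insert is usually safer, because scraped text can contain quotes. The sketch below assumes SQLite via the standard sqlite3 module; the products table, its columns, and the sample rows are hypothetical and stand in for the .text values read in the loop.

```python
import sqlite3

# Hypothetical rows, standing in for the .text values read inside the loop.
rows = [
    ("Lamp", "Desk lamp", "Acme", "19.99"),
    # The apostrophe here would break a hand-built SQL string.
    ("Chair", "Oak chair", "O'Brien & Sons", "89.00"),
]

conn = sqlite3.connect(":memory:")  # connect once, outside the loop
conn.execute(
    "CREATE TABLE products (name TEXT, description TEXT, vendor TEXT, price TEXT)"
)

# The "?" placeholders let the database module handle quoting of the values.
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT name, vendor FROM products ORDER BY rowid").fetchall()
print(stored)
conn.close()
```

Other database modules use the same idea, although the placeholder syntax differs between them.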

1
votes

To find multiple elements with a specific class, we can use find_elements_by_class_name (the difference from the function you wrote is that this one says elements instead of element). This function returns a list, from which we can select the desired element by index. Note that you cannot use .text on the list itself; you must use it on the individual elements. Example:

elements = driver.find_elements_by_class_name('tablet-desktop-only')
print(elements[0].text)
# Or using a for loop:
for element in elements:
    print(element.text)
1
votes

As an example, if tablet-desktop-only matches multiple product names, you should use find_elements, not find_element:

name = driver.find_elements_by_class_name("tablet-desktop-only")
for nme in name:
    print(nme.text)

You can easily replicate this for the others, such as Description, Vendor and Price.

Update 1:

Above, name is a Python list of elements. Similarly, you can build lists for Description, Vendor and Price with find_elements.

Now that we have 4 lists, we can print the items one by one like this:

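# Concatenating prints all names first, then descriptions, vendors and prices
for seq in name + Description + Vendor + Price:
    print(seq.text)  # .text gives the element's text, not its representation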