I'm new to Python and BeautifulSoup.

I'm having trouble applying my function to all the list items (li elements) on this eBay sold items page.

Each sold listing is rendered as an li element. Below is the URL.

https://www.ebay.com/sch/i.html?_from=R40&_nkw=2017+patrick+mahomes+psa+10+auto&_sacat=0&LH_TitleDesc=0&LH_Complete=1&LH_Sold=1&_ipg=100

Here is my code...

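import requests
from bs4 import BeautifulSoup as bs
# Image and display are assumed to come from IPython (e.g. a Jupyter notebook)
from IPython.display import Image, display

def rookie_card_info(url):
    r = requests.get(url)
    
    soup = bs(r.content, "html.parser")
    
    contents = soup.prettify()
    
    rookie_card_list = soup.find_all(class_="srp-results srp-list clearfix")
    
    # How do I apply the code below to every list item?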
    
    display_image = soup.find(class_="s-item__image-img")
    img_src = display_image.get('src')
    test = Image(img_src)
    display(test)
    print(img_src)

    display_card = soup.find(class_="s-item__link")
    card_title = display_card.find("h3")
    get_card_title = card_title.text
    print(get_card_title)

    display_sold_price = soup.find(class_="s-item__detail s-item__detail--primary")
    card_sold_price = display_sold_price.find("span")
    sold_price_text = card_sold_price.find(class_="POSITIVE")
    print(sold_price_text.string)

    display_sold_date = soup.find(class_="s-item__ended-date s-item__endedDate")
    card_sold_date = display_sold_date.string
    print(card_sold_date)
    
rookie_card_info("https://www.ebay.com/sch/i.html?_from=R40&_nkw=2017+patrick+mahomes+psa+10+auto&_sacat=0&LH_TitleDesc=0&LH_Complete=1&LH_Sold=1&_ipg=100")

I wrote the line below to identify the element that contains all of the li elements on the sold results page (there are about 65 results).

rookie_card_list = soup.find_all(class_="srp-results srp-list clearfix")

When I print the HTML returned by this line, it contains all of the HTML I need to parse for my data.

The data I am parsing for is the image URL, title, sold date, and sold price.

I get the correct data as shown below...

https://i.ebayimg.com/thumbs/images/g/2BwAAOSwETVfzAFT/s-l225.jpg
2017 Armed & Dangerous Patrick Mahomes Auto MINT Condition PSA 10? Rookie Signed
$2,500.00
Dec-27 12:46

However, I only get the data from the very first li listing on the page, not the data for all 65 results.

Question:

What can I do to get the data for all 65 results? I need a "plug and play" solution that I can apply to different URLs.

1 Answer

I am not going to write the full answer for you. Instead, here are some pointers:

  1. To get a list of cards:

    rookie_card_list = soup.select('.s-item')

then loop over each item within that:

    for current_card in rookie_card_list:
        # work with current_card here

  2. Rewrite your function to work off the current_card whilst looping. That is, have the soup generated by one function, then have another function that takes the current card and returns a list of the items you want from that card (perhaps to add to an overall list that you later turn into a dataframe?). Add error handling too: for example, decide what to return when an element is not found.

E.g.

    def items_from_current_card(current_card):
        card_title = current_card.find("h3").text
        # etc
        return [card_title, ......]

  3. There are rarely plug-and-play answers with web scraping, I'm afraid. It is up to you to work out a solution generic enough to work over as many pages as possible.
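
To make the pointers above a bit more concrete, here is a minimal sketch of how the pieces might fit together. It assumes the class names from the question (`s-item`, `s-item__image-img`, `s-item__link`, `POSITIVE`, `s-item__ended-date`) still match eBay's markup, which may well have changed. The helpers `text_or_none` and `parse_cards` are hypothetical names, and the sample HTML (including the `s-item__price` class) is made up purely so the example runs without a network request.

```python
from bs4 import BeautifulSoup

# Made-up markup that mimics the class names used in the question.
SAMPLE_HTML = """
<ul class="srp-results srp-list clearfix">
  <li class="s-item">
    <img class="s-item__image-img" src="https://example.com/a.jpg">
    <a class="s-item__link"><h3>Card A</h3></a>
    <div class="s-item__detail s-item__detail--primary">
      <span class="s-item__price"><span class="POSITIVE">$10.00</span></span>
    </div>
    <span class="s-item__ended-date s-item__endedDate">Dec-27 12:46</span>
  </li>
  <li class="s-item">
    <a class="s-item__link"><h3>Card B</h3></a>
  </li>
</ul>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match under parent, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node else None


def items_from_current_card(current_card):
    """Extract the four fields from one listing; missing fields become None."""
    image = current_card.select_one(".s-item__image-img")
    return {
        "image_url": image.get("src") if image else None,
        "title": text_or_none(current_card, ".s-item__link h3"),
        "sold_price": text_or_none(
            current_card, ".s-item__detail--primary .POSITIVE"
        ),
        "sold_date": text_or_none(current_card, ".s-item__ended-date"),
    }


def parse_cards(html):
    """Parse a results page and return one dict per listing."""
    soup = BeautifulSoup(html, "html.parser")
    # Search inside each card, not the whole soup, so every listing
    # yields its own values instead of the first match on the page.
    return [items_from_current_card(card) for card in soup.select(".s-item")]


if __name__ == "__main__":
    for card in parse_cards(SAMPLE_HTML):
        print(card)
```

For a live page, the HTML would come from something like `requests.get(url).text`. The key change from the question's code is that each lookup is made on `current_card` rather than on `soup`, and a missing element produces `None` instead of raising an `AttributeError`. If a dataframe is wanted, the returned list of dicts can be passed to `pandas.DataFrame`.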