
Update

I am now using this code

from bs4 import BeautifulSoup
import requests
res=requests.get("https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1")
soup=BeautifulSoup(res.text,'html.parser')
for item,price in zip(soup.select('.lvtitle>a'),soup.select('.lvprice.prc >span')):
    print(item.text + " : " + price.text.strip())

It prints the product titles and prices in a nice, easy-to-read format, but in a different order from how they are displayed on eBay.

The first four lines the script prints are:

(1) SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES : £28.75
(2) Playstation 1 With Games Including Crash : £20.00
(3) Original Sony Playstation 1 Bundle : £29.99
(4) Sony Playstation 1 PS1 Console Bundle Joblot AV TV Lead : £26.99

But the first four items shown on eBay are:

(1) SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES £28.75
(2) Sony Playstation 1 PS1 Console Bundle Joblot AV TV Lead £26.99
(3) Sony Playstation 1 PS1 PSONE Console Bundle & TV AV Lead TESTED WORKING £29.99
(4) NEW LISTING Sony Playstation PS1 Console Boxed, 2 Controllers, 2 Memory Cards, Original Demo £44.99
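One thing that can make the output drift from the page is that zip() pairs the title list and the price list purely by position: if any listing lacks a matching price element (or has an extra one), every later pair shifts. A minimal sketch of the alternative, assuming each result sits in an element with class `lvresult` (the class used in the second answer's code) and using a small hypothetical HTML sample instead of the live page, looks up each title and price inside the same listing:

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily trimmed markup; the class names come from this thread,
# but the real results page is far larger and may have changed since.
SAMPLE_HTML = """
<ul>
  <li class="lvresult">
    <h3 class="lvtitle"><a href="#">Console A</a></h3>
    <div class="lvprice prc"><span>£28.75</span></div>
  </li>
  <li class="lvresult">
    <h3 class="lvtitle"><a href="#">Console B</a></h3>
  </li>
  <li class="lvresult">
    <h3 class="lvtitle"><a href="#">Console C</a></h3>
    <div class="lvprice prc"><span>£26.99</span></div>
  </li>
</ul>
"""


def text_or_none(tag):
    """Stripped text of a tag, or None when the selector found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


def scoped_pairs(soup):
    """Look up title and price inside each .lvresult so they cannot drift apart."""
    return [
        (text_or_none(row.select_one(".lvtitle > a")),
         text_or_none(row.select_one(".lvprice.prc > span")))
        for row in soup.select(".lvresult")
    ]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Positional pairing: "Console B" is wrongly given Console C's price.
naive = [
    (t.get_text(strip=True), p.get_text(strip=True))
    for t, p in zip(soup.select(".lvtitle > a"), soup.select(".lvprice.prc > span"))
]
print(naive)               # [('Console A', '£28.75'), ('Console B', '£26.99')]
print(scoped_pairs(soup))  # [('Console A', '£28.75'), ('Console B', None), ('Console C', '£26.99')]
```

Scoping each lookup to its listing also makes a missing price show up as None rather than silently borrowing the next listing's price.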

Original Question

I want a web scraper that finds the product names and prices of all 50 products on this page: https://ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1

I ran this code:

for post in soup.select("h3"):
    print(post)

and got the following output (truncated; there was more that I have not included):

<h3 class="header">Please enable JavaScript </h3>
<h3>Format</h3>
<h3 class="lvtitle"><a class="vip" href="https://www.ebay.co.uk/itm/SONY-PLAYSTATION-1-PS1-CONSOLE-Tested-Working-Controller-3-FREE-GAMES/303195399469?_trkparms=ispr%3D1&amp;hash=item4697daa52d:g:K4YAAOSwJmVZ3Ly2&amp;enc=AQAEAAACMBPxNw%2BVj6nta7CKEs3N0qWwG%2FRu4GnzgljVwFYrAPzHjWoiQBIVRFaiPx%2BTZTxK4PBmFSLjHJych5RmooPO%2Fk9I2FqbhK%2BiSCw84S6G5mJqoWRKrmMjE24xQXLI5Tq6prSXt%2Fl5%2BXX5BIj4WcnTSRw8zPLA8umy3NNPbVTyoK8Ir4SgF685KWrEZByct3cX%2FNqc5BQAFj8A46XUhzSY5c6E7GenyGTc%2FEQDW5amzX8BGDa7T0srwIlbSRcuyfaQ%2B0SLD7yDUsYuTxD215mWHQ3jGZserqtWLuVuoXoidgYghdc%2F0t1zF8W%2BTfcz9BxPYvkonPcOijxgbVEK9QVdgsAWHkf0Xgbg%2Fy2bfe2AEykNv3gKXGeFt4HUHjWXFmokHvVMEi8x8W0NNos1x%2FEs%2FCWDq5oOKte%2F5eQ0UNX9mSQ%2BFdS5KVwemULfk807XdSPQ8Rt7fWuLyo1r7L8GGKuYDzb7F4UyzwI5Cl5x72C8%2FJuRTurvboTtjX8kZWYSf5WWRZlwXi1EL%2B6K2hE%2FzAKMcMZ8MGjisTFsR%2BWOimlOQeDKp4HFR3sJXEestKuiLVqeXmxoqaa9SWAzyZLvH0r5JUN6rnNSm9UExRp8PyErBnwBfHEVo2G%2F9PfiXtWn2R4GkAm%2FPHmoNI5dhtupubDkXxI9br7BwNkH9pWSquGHJuDAVoASmL0moQcpUugV4esefKd18ts8akZJ%2FF9GeAONB4ddDGNMu%2F210tqZBtccy44&amp;checksum=30319539946988b1b8ad12ae4011b4e5140cdaa5677a" title="Click this link to access SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES">SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES</a>
</h3>
<h3 class="lvtitle"><a class="vip" href="https://www.ebay.co.uk/itm/Playstation-1-With-Games-Including-Crash/303320348335?hash=item469f4d36af:g:6y0AAOSwE91do1~a" title="Click this link to access Playstation 1 With Games Including Crash">Playstation 1 With Games Including Crash</a>
</h3>
<h3 class="lvtitle"><a class="vip" href="https://www.ebay.co.uk/itm/SONY-PLAYSTATION-1-PS1-CONSOLE-Tested-Working-Controller-3-FREE-GAMES/303195399469?hash=item4697daa52d:g:K4YAAOSwJmVZ3Ly2" title="Click this link to access SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES">SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES</a>
</h3>

This fragment:

title="Click this link to access SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES">SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working &amp; Controller / 3 FREE GAMES</a>
    </h3>

appears twice.

However, the href value is different in the two occurrences. On eBay this item appears at the top of the list, so I need to change the code so that it keeps the first instance and discards the second. I don't know where to begin with this problem or what experiments I could try.
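In the output above, both hrefs contain the same number, 303195399469, just before the query string; only the query strings differ. A minimal sketch of keeping the first occurrence of each listing, assuming that trailing number is the eBay item id and that links are collected as (href, title) pairs in page order (the helper names `item_id` and `dedupe_listings` are hypothetical):

```python
from urllib.parse import urlsplit


def item_id(href):
    """Return the trailing numeric path segment of an /itm/... URL, or None.

    Assumes the /itm/<slug>/<number> layout seen in the question's output.
    """
    last = urlsplit(href).path.rstrip("/").rsplit("/", 1)[-1]
    return last if last.isdigit() else None


def dedupe_listings(pairs):
    """Keep the first (href, title) pair per item id, preserving page order."""
    seen = set()
    kept = []
    for href, title in pairs:
        key = item_id(href) or href  # fall back to the full href if no id is found
        if key not in seen:
            seen.add(key)
            kept.append((href, title))
    return kept


# The three hrefs from the question's output, with their query strings shortened.
pairs = [
    ("https://www.ebay.co.uk/itm/SONY-PLAYSTATION-1-PS1-CONSOLE-Tested-Working-Controller-3-FREE-GAMES/303195399469?_trkparms=ispr%3D1",
     "SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES"),
    ("https://www.ebay.co.uk/itm/Playstation-1-With-Games-Including-Crash/303320348335?hash=item469f4d36af",
     "Playstation 1 With Games Including Crash"),
    ("https://www.ebay.co.uk/itm/SONY-PLAYSTATION-1-PS1-CONSOLE-Tested-Working-Controller-3-FREE-GAMES/303195399469?hash=item4697daa52d",
     "SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES"),
]

for href, title in dedupe_listings(pairs):
    print(item_id(href), title)
```

With BeautifulSoup, the pairs could be built as `[(a['href'], a.get_text(strip=True)) for a in soup.select('.lvtitle > a')]`. Whether dropping the later duplicate matches what eBay displays is not guaranteed; it only removes repeats of the same item id.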

Why is this tagged ebay-api? If you were using the API, you wouldn't need to scrape. - Barmar
If the href is different, then it's two different items that just happen to have similar text. - Barmar
I checked eBay to make sure two items did not have the same item name, so I had already ruled that out. - Ross Symonds
I couldn't even find class="lvtitle" when I viewed the source of that URL. I wonder if eBay is returning something different to BS than it does to a browser. - Barmar
Yes, I've experienced similar things. Websites sometimes give different answers depending on the user agent, so browsers get different results from scrapers. - Barmar

2 Answers

0
votes

Try the code below. It uses zip() to iterate over the results of the two select() calls together.

from bs4 import BeautifulSoup
import requests
res=requests.get("https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1")
soup=BeautifulSoup(res.text,'html.parser')
for item,price in zip(soup.select('.lvtitle>a'),soup.select('.lvprice.prc >span')):
    print(item.text + " : " + price.text.strip())

A variant that also removes the "New listing" label from titles:

from bs4 import BeautifulSoup
import requests
res=requests.get("https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1")
soup=BeautifulSoup(res.text,'html.parser')
for item, price in zip(soup.select('.lvtitle>a'), soup.select('.lvprice.prc >span')):
    # Some titles contain a "New listing" label; remove it from the title text
    print(item.text.replace('New listing', '').strip() + " : " + price.text.strip())

Output:

SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES : £28.75
Playstation 1 With Games Including Crash : £20.00
Original Sony Playstation 1 Bundle : £29.99
Sony Playstation 1 PS1 Console Bundle Joblot AV TV Lead : £26.99
Sony Playstation 1 PS1 PSONE Console Bundle  & TV AV Lead TESTED WORKING : £29.99
SONY PLAYSTATION 1 PS1 CONSOLE / Tested Working & Controller / 3 FREE GAMES : £28.75
Sony Playstation 1 PS1 Console Bundle Joblot AV TV Lead : £26.99
Slim PAL PlayStation 1 PSOne Console + Official Controller Bundle ~ Fully Tested : £22.95
Sony Playstation 1 PS1 PSONE Console Bundle  & TV AV Lead TESTED WORKING : £29.99
Sony Playstation PS1 Console Boxed, 2 Controllers, 2 Memory Cards, Original Demo : £44.99
Sony Playstation 1 PS1 Console WITH LEADS TESTED and WORKING PAL : £19.95
Sony playstation 1 original console boxed with instructions : £28.00
SONY Playstation 1 PS1 PSONE Games Console Bundle With 14 Games DRIVER - W71 : £20.62
Sony Playstation 1 Grey Console / Model SCPH-1002 / Tested & Working : £7.50
Sony Playstation PS1 Console Boxed, 2 Controllers, 2 Memory Cards, Original Demo : £44.99
Sony Playstation PS One PS1 White Console + 2 Controllers & Memory Card - TESTED : £24.75
PlayStation 1 - Original Grey Console : £29.99
SONY PS1 PlayStation One Console Controller Retro Gaming Classic Boxed VGC : £39.99
SONY PlayStation 1 PS1 Home Games Console Bundle With 9 Games TEKKEN 3 - D37 : £28.00
PLAYSTATION 1 CONSOLE 2 CONTROLLERS 27 GAMES GOOD CONDITION PAL SCPH-55522 : £5.00
PlayStation 1 Console With Leads And Offical Controller Ps1 : £20.00
SONY Playstation 1 with 13 games, 2 controllers, official SONY RFU & RCA cables : £29.99
Sony PlayStation 1 Console Bundle with PS1 Games (Boxed) PAL : £19.99
Sony PlayStation 1 PS1 Grey Console Inc Official Controllers & 10 Game Bundle : £39.95
Playstation One PS1 Boxed in EXCELLENT Condition Bundle with 10 loose Games : £49.99
Sony Playstation 1 Console With Controller And 8 Games : £32.00
Sony PlayStation 1 - PS1 Console - Controller & 2 Memory Cards : £32.99
Sony Playstation 1 SCPH-5502 Grey Console. Orginal playstation with controller : £7.00
Sony Playstation 1 + Leads, Memory Cartridge, Controller & Games : £20.99
Sony Playstation 1 PS1 PS One Original Games Console : £18.00
Sony Playstation 1 One Console, Fully Boxed Complete With Polystyrene And Covers : £34.99
Playstation 1 Console With 11 Games, 2 Controllers, PSU #147 : £26.00
Sony Playstation 1 SCPH-5502 Grey Console + 8 games : £19.30
SONY PLAYSTATION 1 BUNDLE CONSOLE 2 CONTROLLERS MEMORY CARD AND 10 GAMES : £30.00
SONY PLAYSTATION 1 BUNDLE CONSOLE 2 CONTROLLERS MEMORY CARD AND 10 GAMES : £23.00
Sony Playstation PSone Slim 1 2 Controllers With Box : £10.50
20 game PlayStation 1 bundle : £29.95
SONY Playstation 1 PS1 Console Bundle With 9 Games, 2 Controllers, BOXED - C63 : £47.00
128G PS1 MINI True Blue Mini Crackhead Pack For Playstation Built-in 7000 Games : £23.00
Sony Playstation 1 PSone Slim With Crash Bandicoot 3 & DualShock Controller VGC : £20.78
Playstation 1 dual shock boxed with cab no controller. No games great condition : £34.99
Sony PlayStation 1 Dual Shock Bundle Grey Console, 2 controllers, 2 memory cards : £14.40
REGION FREE PLAYSTATION 1 SLIM PSOne 8 WIRE STEALTH CHIPPED / MODDED PS1 CONSOLE : £42.00
Sony Playstation 1 Grey Console+all leads and 2 controller's : £15.00
SONY PLAYSTATION 1 CHIPPED SCPH-5552 - FULLY WORKING PS1 MODDED CONSOLE ONLY : £22.00
Sony PlayStation - Working PS1 Console - Parts Only : £35.00
Sony PlayStation 1 Dual Shock Bundle Grey Console scph 5552 : £18.00
Sony Playstation 1 PS1 PSONE Console & Controller with leads and game : £34.00
Sony PlayStation 1 PS1 Console Bundle - 23 Games -  Controller - All Cables : £2.20
Sony PlayStation 1 PS1 Grey Console Inc Official Controllers & 10 Game Bundle : £35.00
SONY Playstation 1 PS1 PSone Home Games Console Bundle With Controller - B98 : £24.99
Sony PlayStation 1 PS1 Original Console + 3 Games Rayman Crash Banicoot Oddworld : £49.95
SONY PLAYSTATION 1 SCPH-1002 AUDIOPHILE CONSOLE + GAMES : £39.95
Sony Playstation 1 SCPH-102 Console - Grey : £9.99
Sony PS1 PLAYSTATION 1 SCPH-1002 Console : £32.00
Original Sony Playstation 1 PS1 Games Console Leads Controller Bundle Boxed # : £28.99
Playstation 1 Console + 3 Games : £23.50
Sony Playstation 1 Slim Console Bundle- SCPH-102 -Inc Cables & Controller - 6 : £19.99
Playstation 1 Ps1 Psone And 22 Games : £44.95
Sony Playstation 1 Console With Sony Controller, AV Cable and Power Cable PAL : £9.99
Ps1 Console + Memory Card / Choose Slim / Phat / Audiophile - Complete Setup : £24.99
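One caveat with zip(): it stops at the shorter of its inputs and pairs items purely by position, so if the page yields a different number of titles and prices, the mismatch is silently hidden. A small guard (the `paired` helper is hypothetical; on Python 3.10+, `zip(..., strict=True)` gives a similar check by raising ValueError) makes that failure visible:

```python
def paired(titles, prices):
    """Pair titles with prices by position, but refuse if the counts differ."""
    if len(titles) != len(prices):
        raise ValueError(
            f"{len(titles)} titles but {len(prices)} prices; "
            "pairing by position would misalign them"
        )
    return list(zip(titles, prices))


titles = ["Console A", "Console B", "Console C"]
prices = ["£28.75", "£26.99"]

# Plain zip() silently drops "Console C" and may pair the wrong price with "Console B".
print(list(zip(titles, prices)))

try:
    paired(titles, prices)
except ValueError as exc:
    print("mismatch:", exc)
```

In the scraper above this would be `paired(soup.select('.lvtitle>a'), soup.select('.lvprice.prc >span'))`, since select() returns a list.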
0
votes

I'm not sure you can, at least not reliably over time: I think there are anti-scraping measures in place. For a short period you can, for example, add User-Agent and Referer headers and, provided you then run code to remove the sponsored links, get the information as shown on the page:

import requests
from bs4 import BeautifulSoup as bs

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36',
    'Referer': 'https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1',
}

r = requests.get('https://www.ebay.co.uk/sch/i.html?_from=R40&_nkw=Playstation+1&_sacat=0&_pgn=1', headers=headers)
soup = bs(r.text, 'lxml')  # the 'lxml' parser requires the lxml package

# Remove sponsored (promoted) listings before pairing titles with prices
for item in soup.select('.lvresult:has(.promoted-lv)'):
    item.decompose()

for title, price in zip(soup.select('.lvtitle'), soup.select('.lvprice .bold')):
    print(title.text.strip(), price.text.strip())

However, this is quickly detected and stops working unless you pause long enough before requesting again (I haven't timed it, but the pause could be on the order of minutes).
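If you do retry after being blocked, one way to space the requests out is exponential backoff. A minimal sketch, in which `fetch_with_backoff` is a hypothetical helper, `fetch` and `looks_ok` are caller-supplied callables, and the 60-second starting delay is only a guess (the answer says the required pause may be on the order of minutes):

```python
import time


def fetch_with_backoff(fetch, looks_ok, retries=4, base_delay=60.0, sleep=time.sleep):
    """Call fetch() until looks_ok(result) is true, doubling the pause after each failure.

    fetch and looks_ok are supplied by the caller. base_delay (seconds) is a guess:
    the answer only says the required pause may be on the order of minutes.
    """
    delay = base_delay
    for attempt in range(retries):
        result = fetch()
        if looks_ok(result):
            return result
        if attempt < retries - 1:  # no point sleeping after the final attempt
            sleep(delay)
            delay *= 2
    raise RuntimeError(f"no usable response after {retries} attempts")
```

With requests this could be called as `fetch_with_backoff(lambda: requests.get(url, headers=headers), lambda r: r.ok and 'lvresult' in r.text)`, with `url` standing for the search URL above. Treating the presence of `lvresult` as a sign of a usable page is an assumption, and backing off only spaces requests out; it does not get around the detection described here.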

Note: the measures also detect whether you are using an automated browser; for example, the page strips out the additional randomized attribute you can see on the SPONSORED links (which are also missing the :before pseudo-element).
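As an alternative to the decompose() loop in the code above, sponsored listings can be skipped without modifying the parsed tree. A minimal sketch against a small hypothetical HTML sample, reusing the class names from that code (`lvresult`, `promoted-lv`, `lvtitle`, `lvprice .bold`); whether the live page still uses them is an assumption, and `organic_listings` is a hypothetical helper:

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed markup using the class names from the answer's code.
SAMPLE_HTML = """
<ul>
  <li class="lvresult">
    <span class="promoted-lv">SPONSORED</span>
    <h3 class="lvtitle"><a href="#">Sponsored console</a></h3>
    <div class="lvprice"><span class="bold">£20.00</span></div>
  </li>
  <li class="lvresult">
    <h3 class="lvtitle"><a href="#">Organic console</a></h3>
    <div class="lvprice"><span class="bold">£28.75</span></div>
  </li>
</ul>
"""


def organic_listings(soup):
    """Return (title, price) for listings that contain no .promoted-lv element."""
    rows = []
    for row in soup.select(".lvresult"):
        if row.select_one(".promoted-lv") is not None:
            continue  # skip sponsored entries instead of decompose()-ing them
        title = row.select_one(".lvtitle")
        price = row.select_one(".lvprice .bold")
        if title is not None and price is not None:
            rows.append((title.get_text(strip=True), price.get_text(strip=True)))
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(organic_listings(soup))  # [('Organic console', '£28.75')]
```

Because title and price are looked up inside the same listing, this also avoids relying on zip() to keep the two lists aligned.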
