3
votes

I've recently learned about web scraping and want to write a program that scrapes daily product prices. I'm using requests and bs4 in Python to scrape target.com. This is my code so far:

from random import choice
from time import sleep

import requests
from bs4 import BeautifulSoup

# Candidate delays (in seconds) to wait between steps
TIMES = [2, 3, 4, 5, 6, 7]

url = 'https://www.target.com/p/dyson-ball-animal-2-upright-vacuum-iron-purple/-/A-52190951'
sleep(choice(TIMES))
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

sleep(choice(TIMES))
name = soup.find('h1').get_text().strip().replace(',', ';')
print('Product name: ', name)

sleep(choice(TIMES))
# This lookup is the one that comes back as None
current_price = soup.find('span', {'data-test': 'product-savings'})
print('Current price: ', current_price)

When I run my code, the product name is correct, but the current price is always "None". Is there a different way I should be searching for the product price?

Thanks in advance!


2 Answers

4
votes

As long as you have the item/product ID, you can create a session to get the local store ID and an API key, then request the price from the API:

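import pandas as pd
import requests

# Visit the home page once so the session picks up the cookies used below
s = requests.session()
s.get('https://www.target.com')

# The visitorId cookie serves as the API key; GuestLocation starts with the location
key = s.cookies['visitorId']
location = s.cookies['GuestLocation'].split('|')[0]

# Look up the nearest store and keep its location_id
store_id = requests.get('https://redsky.target.com/v3/stores/nearby/%s?key=%s&limit=1&within=100&unit=mile' % (location, key)).json()
store_id = store_id[0]['locations'][0]['location_id']

# Request the product's pricing for that store
product_id = '52190951'
url = 'https://redsky.target.com/web/pdp_location/v1/tcin/%s' % product_id
payload = {
    'pricing_store_id': store_id,
    'key': key}

jsonData = requests.get(url, params=payload).json()
# index=[0] is needed because the 'price' dict holds scalar values
df = pd.DataFrame(jsonData['price'], index=[0])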

Output:

print(df.to_string())
       tcin  location_id  reg_retail  current_retail current_retail_start_timestamp current_retail_end_timestamp  default_price formatted_current_price formatted_current_price_type  is_current_price_range
0  52190951         3991      499.99          499.99           2019-10-19T07:00:00Z         9999-12-31T00:00:00Z          False                 $499.99                          reg                   False
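
If you only need the number rather than a DataFrame, the price can be read straight from the JSON. This is a minimal sketch that assumes the response still has the shape shown in the output above; the sample dict is copied from that output, and extract_price is a hypothetical helper name, not part of any library:

```python
from typing import Optional

# Sample shaped like the 'price' object printed above (values copied from that
# output). The live API response may differ; this shape is an assumption.
SAMPLE_RESPONSE = {
    "price": {
        "tcin": "52190951",
        "location_id": 3991,
        "reg_retail": 499.99,
        "current_retail": 499.99,
        "formatted_current_price": "$499.99",
        "formatted_current_price_type": "reg",
    }
}


def extract_price(json_data: dict) -> Optional[float]:
    """Return current_retail as a float, or None if the field is missing."""
    price = json_data.get("price")
    if not isinstance(price, dict):
        return None
    value = price.get("current_retail")
    return float(value) if value is not None else None


print(extract_price(SAMPLE_RESPONSE))
print(extract_price({}))
```

Returning None instead of raising a KeyError makes it easier to log a missing price and move on when running this daily.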
0
votes

You don't want to scrape the HTML; you want to scrape either the embedded microdata or the embedded 'ld+json' data. One of them contains the product ID. Once you have that value, plug it into the 'redsky.target.com' API. See the product ID value in the URL below?

https://redsky.target.com/v2/pdp/tcin/52190951?excludes=taxonomy,promotion,bulk_ship,rating_and_review_reviews,rating_and_review_statistics,question_answer_statistics

… then parse the returned JSON to get the price.

This might help.
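
Here is a rough sketch of the 'ld+json' step, using only the standard library so it runs without extra installs. The sample HTML is made up for illustration, and the assumption that the product ID sits in a "sku" field of a "Product" object is mine; check the actual page source to see where it really is. LdJsonExtractor and find_product_id are hypothetical names:

```python
import json
from html.parser import HTMLParser

# Made-up page fragment for illustration; the real page layout may differ.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example vacuum", "sku": "52190951"}
</script>
</head><body><h1>Example vacuum</h1></body></html>
"""


class LdJsonExtractor(HTMLParser):
    """Collects the parsed contents of every ld+json script tag."""

    def __init__(self):
        super().__init__()
        self._in_ld_json = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_ld_json = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_ld_json:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_ld_json:
            self._in_ld_json = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip blocks that are not valid JSON


def find_product_id(html):
    """Return the 'sku' of the first Product object found, or None."""
    parser = LdJsonExtractor()
    parser.feed(html)
    for doc in parser.documents:
        items = doc if isinstance(doc, list) else [doc]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Product" and "sku" in item:
                return str(item["sku"])
    return None


product_id = find_product_id(SAMPLE_HTML)
# URL pattern taken from the answer above
print("https://redsky.target.com/v2/pdp/tcin/%s" % product_id)
```

If you are already using bs4, soup.find_all('script', type='application/ld+json') followed by json.loads on each tag's text should get you the same data.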