I'm trying to scrape product information from a webpage using Scrapy. The to-be-scraped webpage looks like this:
- it starts with a product_list page with 10 products
- a click on the "next" button loads the next 10 products (the URL doesn't change between the two pages)
- I use a LinkExtractor to follow each product link into the product page and get all the information I need
I tried to replicate the next-button AJAX call but can't get it working, so I'm giving Selenium a try. I can run Selenium's webdriver in a separate script, but I don't know how to integrate it with Scrapy. Where should I put the Selenium part in my Scrapy spider?
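To make it concrete, this is roughly the kind of integration I have in mind, sketched as a standalone function: Selenium clicks "next" until the button disappears, and a helper pulls the product links out of each rendered page. The regex helper and the "next" link locator are simplifications for illustration, not my real code:

```python
import re

def product_links(page_source):
    # Pure helper: pull product hrefs out of the rendered HTML.
    # The div id comes from my page; the <a href> shape is an assumption.
    block = re.search(r'<div id="productList".*?</div>', page_source, re.S)
    if not block:
        return []
    return re.findall(r'href="([^"]+)"', block.group(0))

def crawl_with_selenium(start_url):
    # Deferred import so the module loads even where selenium isn't installed.
    from selenium import webdriver
    driver = webdriver.Firefox()
    links = []
    try:
        driver.get(start_url)
        while True:
            links.extend(product_links(driver.page_source))
            # Locating "next" by link text is a guess -- adjust to the real button.
            nxt = driver.find_elements_by_link_text('next')
            if not nxt:
                break
            nxt[0].click()
    finally:
        driver.quit()
    return links
```

But I don't know whether this belongs in the spider itself (and if so, in which method), or somewhere else entirely.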
My spider is pretty standard, like the following:
    from scrapy import log
    from scrapy.contrib.spiders import CrawlSpider, Rule
    from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
    from scrapy.selector import HtmlXPathSelector

    class ProductSpider(CrawlSpider):
        name = "product_spider"
        allowed_domains = ['example.com']
        start_urls = ['http://example.com/shanghai']
        rules = [
            Rule(SgmlLinkExtractor(restrict_xpaths='//div[@id="productList"]//dl[@class="t2"]//dt'), callback='parse_product'),
        ]

        def parse_product(self, response):
            self.log("parsing product %s" % response.url, level=log.INFO)
            hxs = HtmlXPathSelector(response)
            # actual data follows
Any ideas are appreciated. Thank you!