2 votes

In the Scrapy tutorial there is this method of BaseSpider:

make_requests_from_url(url)

A method that receives a URL and returns a Request object (or a list of Request objects) to scrape.

This method is used to construct the initial requests in the start_requests() method, and is typically used to convert URLs to requests.

Unless overridden, this method returns Requests with the parse() method as their callback function, and with dont_filter parameter enabled (see Request class for more info).
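
As I understand it, the default implementation boils down to something like this (a sketch based only on the quoted documentation, not the actual Scrapy source):

from scrapy.http import Request

def make_requests_from_url(self, url):
    # Per the docs: the callback is self.parse and the duplicate
    # filter is bypassed via dont_filter=True.
    return Request(url, callback=self.parse, dont_filter=True)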

Do you understand what this method does? And can I use make_requests_from_url and BaseSpider instead of SgmlLinkExtractor and CrawlSpider, which are not working for me?

I am trying to crawl more than the given initial URL, and Scrapy is not doing that.

Thanks


1 Answer

5 votes

That's right: CrawlSpider is useful and convenient in many cases, but it only covers a subset of all possible spiders. If you need something more complex, you typically subclass BaseSpider and implement the start_requests() method yourself.
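
For example, here is a minimal sketch along those lines (assuming an older Scrapy release where BaseSpider and HtmlXPathSelector are available; the spider name and example.com URLs are placeholders):

from urlparse import urljoin  # Python 2, as used by older Scrapy releases

from scrapy.spider import BaseSpider
from scrapy.http import Request
from scrapy.selector import HtmlXPathSelector

class MySpider(BaseSpider):
    name = 'example'
    allowed_domains = ['example.com']

    def start_requests(self):
        # Build the initial requests yourself instead of listing start_urls.
        for url in ['http://www.example.com/page1',
                    'http://www.example.com/page2']:
            yield self.make_requests_from_url(url)

    def parse(self, response):
        # Follow links found on the page by yielding further Requests,
        # so the crawl goes beyond the initial URLs. You can also yield
        # scraped items from here.
        hxs = HtmlXPathSelector(response)
        for href in hxs.select('//a/@href').extract():
            yield Request(urljoin(response.url, href), callback=self.parse)

Because parse() keeps yielding new Requests, the spider keeps crawling past the seed URLs, which is exactly what CrawlSpider's rules would otherwise do for you.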