When Scrapy shuts down, it forgets all the URLs it has visited. I want to give Scrapy a set of already-crawled URLs when it starts. How can I add a rule to a CrawlSpider so that it knows which URLs have already been visited?
The link extractor I am currently using:
SgmlLinkExtractor(allow=(), deny=(), allow_domains=(), deny_domains=(), restrict_xpaths=(), tags=('a', 'area'), attrs=('href',), canonicalize=True, unique=True, process_value=None)
Right now I just use parse to tell the spider which URLs to crawl. How can I tell Scrapy which URLs it should not visit?
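For reference, this is roughly what I am trying now (just a sketch; seen_urls.txt, filter_seen, and parse_item are my own names, and I am not sure process_links is the right place to do this filtering):

```python
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor


class MySpider(CrawlSpider):
    name = 'myspider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com/']

    rules = (
        Rule(SgmlLinkExtractor(allow=()),
             callback='parse_item',
             process_links='filter_seen',  # drop links crawled in earlier runs
             follow=True),
    )

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)
        # seen_urls.txt is a file I maintain myself, one already-crawled URL per line.
        with open('seen_urls.txt') as f:
            self.seen_urls = set(line.strip() for line in f if line.strip())

    def filter_seen(self, links):
        # Keep only links whose URL was not crawled in a previous run.
        return [link for link in links if link.url not in self.seen_urls]

    def parse_item(self, response):
        # Remember this URL so it can be written back to seen_urls.txt on shutdown.
        self.seen_urls.add(response.url)
        # ... extract items here ...
```

Is this the right approach, or is there a built-in way to seed Scrapy with the set of URLs it should treat as already visited?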