How can I crawl a website using scrapy?

Question

I'm going to write a GUI application based on Scrapy, where the user enters a website URL and clicks the "crawl" button. The whole website is then crawled and stored in the built-in scrapy-db (SQLite).

How can I use Scrapy to help me crawl the website?

zenCoder zenCoder · Accepted Answer · 2013-12-06T06:02:26

Well, your question is not well-framed. How you can use Scrapy is up to you.

Here's basically what Scrapy does:

1) Websites have a tree structure: a->b, a->c, a->d, b->e, c->f, etc.

2) Scrapy helps you crawl through the tree recursively

3) While crawling, Scrapy lets you 'mine' for information. For that, you need to learn XPath so you can locate and extract values from the page's DOM.

http://www.w3schools.com/xpath/

4) Parse the values and store them in your database.

Let us know exactly what you are crawling for. If you're just crawling and saving the web pages, you might as well use software like HTTrack (http://www.httrack.com).
