1
votes

I am able to crawl some pages, but others take a while to load, and because their DOM is not fully rendered when I fetch them, I cannot crawl them. Does anyone have a solution for this?

Thanks in advance


3 Answers

3
votes

I recommend scrapy-splash, which integrates Scrapy with Splash, a JavaScript rendering service. (It is supported by Scrapinghub, the company behind Scrapy.)
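To illustrate the idea, here is a minimal sketch that asks Splash to render a page and wait before returning the HTML. It talks to Splash's `render.html` HTTP endpoint using only the standard library. The address `http://localhost:8050` and the helper names are assumptions for illustration; adjust them to your own setup.

```python
import urllib.parse
import urllib.request

# Assumption: a Splash instance is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(page_url, wait=2.0, timeout=30.0):
    """Build a render.html URL that tells Splash to wait `wait` seconds
    after the page loads, so scripts have time to fill in the DOM."""
    query = urllib.parse.urlencode(
        {"url": page_url, "wait": wait, "timeout": timeout}
    )
    return f"{SPLASH_URL}/render.html?{query}"


def fetch_rendered_html(page_url, wait=2.0):
    """Return the HTML of `page_url` as rendered by Splash."""
    with urllib.request.urlopen(splash_render_url(page_url, wait)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder URL; replace with the slow page you are crawling.
    print(fetch_rendered_html("http://example.com/")[:200])
```

Inside a Scrapy spider, scrapy-splash should let you pass the same option with `SplashRequest(url, callback, args={"wait": 2.0})` instead of building the URL yourself.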

1
votes

You can use a web driver like Selenium with a headless browser like PhantomJS or Firefox. Alternatively, use PhantomJS on its own, or one of the many other alternatives available: CasperJS, SlimerJS, etc.

1
votes

As an alternative to Selenium, you can use the Firebug plugin for Firefox or Chrome Developer Tools to watch the requests the AngularJS app makes in the background, and then emulate those requests directly.

While this requires more work, the scraper is much faster as it doesn't have to wait for the page to render.
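A minimal sketch of that approach, using only the standard library. The endpoint `https://example.com/api/items`, the `page` parameter, and the `items`/`title` fields are all hypothetical; substitute whatever URL and JSON shape you actually see in the Network tab.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: copy the real one from the browser's Network tab.
API_URL = "https://example.com/api/items"


def build_request(page):
    """Recreate the background request the app sends for one page of data."""
    query = urllib.parse.urlencode({"page": page})
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Accept": "application/json"},
    )


def parse_items(body):
    """Pull the titles out of the JSON body (assumed shape:
    {"items": [{"title": ...}, ...]})."""
    payload = json.loads(body)
    return [item["title"] for item in payload.get("items", [])]


def fetch_titles(page):
    """Fetch one page from the API and return the item titles."""
    with urllib.request.urlopen(build_request(page)) as response:
        return parse_items(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(fetch_titles(1))
```

Some sites also expect cookies or extra headers on these requests; if the response comes back empty or with an error, compare your request against the one the browser sent.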