I recently made a web scraper with Python and Selenium, and I found it pretty simple to do. The page used AJAX calls to load the data, and initially I waited a fixed timeout for the page to load. That worked for a while. Then I found that Selenium has a built-in class, WebDriverWait, which can wait for a specific element to load via wait.until(). This made my web scraper run faster.
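For context on what that explicit wait buys over a fixed sleep: wait.until() is essentially a poll-until-condition-or-timeout loop. Here is a minimal stdlib sketch of the same idea (wait_until is my own illustrative helper, not Selenium's API):

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` expires.

    Returns the truthy result as soon as it appears; raises TimeoutError
    otherwise. Unlike a fixed sleep, it stops the moment the condition holds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll_interval)
```

With Selenium, the condition would be something like lambda d: d.find_element(...), so a page that loads in 0.2s costs 0.2s instead of the full fixed timeout.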
The problem is, I still was not satisfied with the results: it took an average of 1.35 seconds per page to download the content.
I tried to parallelize this, but the times did not improve, because creating the driver instance (with Chrome or PhantomJS) took most of the scraping time.
So I turned to Scrapy. After doing the tutorials, and with my parser already written, my three questions are:
1) Does Scrapy automatically run multiple URL requests in parallel?
2) How can I set a dynamic timeout with Scrapy, like Selenium's WebDriverWait wait.until()?
3) If there is no dynamic timeout available in Scrapy, and the solution is to use Scrapy + Selenium so that Selenium waits until the content is loaded, is there really any advantage to using Scrapy? I could simply retrieve the data with Selenium selectors, as I was doing before Scrapy.
Thank you for your help.