So there are a lot of questions that have been asked around dynamic content scraping on stackoverflow, and I went through all of these, but all the solutions suggested did not work for the following problem:
Context:
- Using Selenium webdriver with python
- I mostly used this resource: http://selenium-python.readthedocs.org/page-objects.html regarding the Python.org example.
- Page to scrape: http://propertymap.sfplanning.org/
Issue:
I have not been able to access any of the DOM elements on this page. Note if I could get some hints on how to access the search bar, and the search button, that would be a great start. See page to scrape What I want in the end, is to go through a list of addresses, launch the search, and copy the information displayed on the right hand side of the screen.
I have tried the following:
- Changed the browser for webdriver (from Chrome to Firefox)
Added waiting time for the page to load
try: WebDriverWait(self.driver, 10).until(EC.presence_of_element_located((By.ID, "addressInput"))) except: print "address input not found"
- Tried to access the item by ID, XPATH, NAME, TAG NAME, etc., nothing worked.
Questions
- What else could I try that I have not so far (using Selenium webdriver)?
- Are some websites really impossible to scrape? (I don't think that the city used an algorithm to generate any random DOM everytime I re-load the page).
find_by_*
) to locate an element. (the accepted answer chosefind_element_by_id
). Also note, hitting enter to bypass an element lookup and simulated click tends to be faster and more reliable in practice. – Corey Goldberg