1
votes

I am trying to reliably download historical stock data from a link that appears on the page when you hover the cursor over it. At present I have the following code, which neither finds the css_selector nor downloads the .csv file.

#!/usr/bin/env python3.6

## Import Libraries
import os, sys
import time

from selenium import webdriver
import selenium.webdriver.firefox.options
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC 

## Declare Variables
ticker = 'CAT'
period1 = '1262332800'
period2 = '1537945200'
download_path = os.getcwd()
css_selector = r"a.Fl\(end\):nth-child(1)"  # raw string so the backslashes reach the CSS engine unchanged

## Configure Firefox Options
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)  # 0 = desktop, 1 = default "Downloads" directory, 2 = the directory set in browser.download.dir
profile.set_preference("browser.download.dir", download_path)
profile.set_preference("browser.download.manager.showWhenStarting", False)
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv,application/x-gzip")  # comma-separated MIME types

## Firefox driver loads historical data page
driver = webdriver.Firefox(firefox_profile=profile)
driver.get("https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval=1d&filter=history&frequency=1d"
           .format(ticker, period1, period2))

## Click on 'Download Data' Link
try:
    driver.find_element_by_css_selector(css_selector).click()
    print('Success!')

except Exception as exc:
    # Print the exception so the actual cause of the failure is visible
    print('Failed!!!!!', exc)

finally:
    driver.quit()
    print('Kill Driver!')

The example site is: https://finance.yahoo.com/quote/CAT/history?period1=1262332800&period2=1538118000&interval=1d&filter=history&frequency=1d
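
The period1 and period2 values look like Unix timestamps in seconds (1262332800 is 2010-01-01 08:00 UTC and 1537945200 is 2018-09-26 07:00 UTC), although the page itself does not confirm that reading. Assuming it holds, here is a minimal sketch of building the same URL from calendar dates. The helper names to_epoch and history_url are made up for illustration, and midnight UTC is an arbitrary choice.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode


def to_epoch(year, month, day):
    """Return the Unix timestamp (in seconds) for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


def history_url(ticker, start, end):
    """Build the history-page URL with the same query parameters used in the question."""
    query = urlencode({
        "period1": start,
        "period2": end,
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    })
    return "https://finance.yahoo.com/quote/{}/history?{}".format(ticker, query)


# Example: the date range from the question, expressed as dates at midnight UTC
url = history_url("CAT", to_epoch(2010, 1, 1), to_epoch(2018, 9, 26))
print(url)
```

The resulting string can be passed to driver.get() in place of the hand-formatted one.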

css_selector, "a.Fl(end):nth-child(1)", is found in this section of HTML:

<svg class="Va(m)! Mend(5px) Stk($c-fuji-blue-1-b)! Fill($c-fuji-blue-1-b)! Cur(p)" width="15" height="15" viewBox="0 0 48 48" data-icon="download" style="fill: rgb(0, 129, 242); stroke: rgb(0, 129, 242); stroke-width: 0; vertical-align: bottom;"><path d="M43.002 43.002h-38c-1.106 0-2.002-.896-2.002-2v-11c0-1.105.896-2 2.002-2 1.103 0 1.998.895 1.998 2v9h34.002v-9c0-1.105.896-2 2-2s2 .895 2 2v11c0 1.103-.896 2-2 2m-19-8L11.57 23.307c-.75-.748-.75-1.965 0-2.715.75-.75 1.965-.75 2.715 0l7.717 7.716V2h4v26.308l7.717-7.716c.75-.75 1.964-.75 2.714 0s.75 1.967 0 2.715L24.002 35.002z"></path></svg><span>Download Data</span>

My questions are:

  • Is there an easier way to click on the link? xpath? partial_link?
  • Am I attempting to click on the correct css_selector?
  • Do I need to hover over the text in order to click on the download data link?
  • How do I find the element while the site is still loading? The site never finishes loading because there are continuous calls to ad servers.

Using the method .find_element_by_link_text() results in TimeoutException:

TimeoutException Traceback (most recent call last) in ()
21 ## Go to Homepage for historical data
22 driver.get("https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval=1d&filter=history&frequency=1d"
---> 23 .format(ticker, period1, period2) )
24
25 print('.get() Complete!')

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in get(self, url)
331 Loads a web page in the current browser session.
332 """
--> 333 self.execute(Command.GET, {'url': url})
334
335 @property

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py in execute(self, driver_command, params)
319 response = self.command_executor.execute(driver_command, params)
320 if response:
--> 321 self.error_handler.check_response(response)
322 response['value'] = self._unwrap_value(
323 response.get('value', None))

~/virtualenvs/demo/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
240 alert_text = value['alert'].get('text')
241 raise exception_class(message, screen, stacktrace, alert_text)
--> 242 raise exception_class(message, screen, stacktrace)
243
244 def _value_or_default(self, obj, key, default):

TimeoutException: Message: Timeout loading page after 300000ms

My interpretation of this is that the site does not finish loading, so the try/except/finally logic never executes.

2
Actually css_selector = "a.Fl\(end\):nth-child(1)" selector is correct. Can you share exception or current output/behavior description? - Andersson
Thank you for the responses, I am attempting to run both. Presently, @dbachhav 's solution link_text has worked once, but doesn't appear to work a second or third time. The site continues to load ads, I'm wondering if this impedes the progress of the driver. - MyopicVisage
Do you mean that same page behaves differently when you're trying to access it? Or you need to handle many pages and search by link text is not applicable to all of them? - Andersson
Note that you're getting a TimeoutException not because of find_element_by_link_text(), but because of driver.set_page_load_timeout(3). That setting means a TimeoutException is raised if the page has not loaded within 3 seconds. Do you really need that? Comment out that line and check again. - Andersson
I have no idea why you removed the driver.set_page_load_timeout(3) line from the exception log, but "TimeoutException: Message: Timeout loading page after 300000ms" says it all: WebDriver failed to load the page; it did not fail to find the element! - Andersson

2 Answers

3
votes
  • Is there an easier way to click on the link?

    Selecting by link text should work fine:

    driver.find_element_by_link_text('Download Data').click()
    
  • Am I attempting to click on the correct css_selector?

    Yes, the selector seems to be correct.

  • Do I need to hover over the text in order to click on the download data link?

    No, you don't need to hover over the link.
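
Since the page keeps loading, the link may not be in the DOM the first time you look for it. Here is a rough sketch of retrying the link-text lookup until a deadline. The function name click_link_when_present is made up for illustration. It assumes only that driver has Selenium's find_element(by, value) method, and "link text" is the string that By.LINK_TEXT holds. Selenium's own WebDriverWait does the same job; this version just makes the retry logic visible.

```python
import time


def click_link_when_present(driver, link_text, timeout=30.0, poll=0.5):
    """Poll for a link by its visible text and click it as soon as that works.

    `driver` is assumed to expose Selenium's find_element(by, value) method.
    Raises TimeoutError if the link cannot be found and clicked in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "link text" is the locator strategy string behind By.LINK_TEXT
            driver.find_element("link text", link_text).click()
            return True
        except Exception as exc:
            # Typically NoSuchElementException while the page is still rendering
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "link {!r} not clickable after {}s".format(link_text, timeout)
            ) from last_error
        time.sleep(poll)


# Hypothetical usage with a real driver:
#     click_link_when_present(driver, 'Download Data', timeout=20)
```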

Update

If you need to stop the page from loading, try the solution below:

from selenium.common.exceptions import TimeoutException

driver.set_page_load_timeout(10)
try:
    driver.get("https://finance.yahoo.com/quote/{}/history?period1={}&period2={}&interval=1d&filter=history&frequency=1d"
           .format(ticker, period1, period2))
except TimeoutException:
    driver.execute_script("window.stop();")
driver.find_element_by_link_text('Download Data').click()

Page loading is forcibly stopped if the page has not finished loading within 10 seconds.
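
The click returns before the file has been written, so it can help to confirm that the .csv actually arrived before quitting the driver. Below is a minimal sketch that polls the download directory for a new file. The name wait_for_new_file is made up for illustration, and the approach assumes the browser only gives the file its final .csv name once the download is complete (Firefox uses a temporary .part file while downloading).

```python
import os
import time


def wait_for_new_file(directory, already_there, suffix=".csv", timeout=30.0, poll=0.5):
    """Return the path of a file ending in `suffix` that was not in `already_there`.

    `already_there` is a collection of file names captured before the click.
    Raises TimeoutError if no such file appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        new_files = sorted(
            name for name in os.listdir(directory)
            if name.endswith(suffix) and name not in already_there
        )
        if new_files:
            return os.path.join(directory, new_files[0])
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "no new {} file in {} after {}s".format(suffix, directory, timeout)
            )
        time.sleep(poll)


# Hypothetical usage around the click:
#     before = set(os.listdir(download_path))
#     driver.find_element_by_link_text('Download Data').click()
#     print(wait_for_new_file(download_path, before))
```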

0
votes

Please try the options below:

 1. download = driver.find_element_by_xpath(".//*[@id='Col1-1-HistoricalDataTable-Proxy']/section/div[1]/div[2]/span[2]/a")
    download.click()
 2. download = driver.find_element_by_link_text('Download Data')
    download.click()
 3. download = driver.find_element_by_partial_link_text('Download')
    download.click()
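
Rather than editing the script to try each option by hand, the three locators can be tried in order. Here is a rough sketch: the names click_first_match and DOWNLOAD_LOCATORS are made up for illustration. It assumes only that driver has Selenium's find_element(by, value) method, and the strategy strings are the values behind By.LINK_TEXT, By.PARTIAL_LINK_TEXT and By.XPATH.

```python
def click_first_match(driver, locators):
    """Try each (strategy, value) pair in order and click the first element found.

    Returns the strategy that matched; raises LookupError if none did.
    """
    errors = []
    for strategy, value in locators:
        try:
            element = driver.find_element(strategy, value)
        except Exception as exc:
            # Typically NoSuchElementException; remember it and try the next locator
            errors.append((strategy, exc))
            continue
        element.click()
        return strategy
    raise LookupError("no locator matched: {}".format(errors))


# The three options above, ordered from least to most dependent on page layout
DOWNLOAD_LOCATORS = [
    ("link text", "Download Data"),
    ("partial link text", "Download"),
    ("xpath", ".//*[@id='Col1-1-HistoricalDataTable-Proxy']/section/div[1]/div[2]/span[2]/a"),
]

# Hypothetical usage with a real driver:
#     matched = click_first_match(driver, DOWNLOAD_LOCATORS)
```

The XPath goes last because it depends on the exact page layout and is the most likely of the three to break when the site changes.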