1
votes

I am working on a project that needs financial data, and I need to scrape historical data from Yahoo Finance. For example, on https://finance.yahoo.com/quote/ETH-USD/history?p=ETH-USD I have to adjust the time interval and press the download button. How can I do this with Python? I need to automate this task.

Sorry for my grammatical mistakes; my native language is not English.

3
You don't need to scrape to get data from Yahoo Finance; see here: query1.finance.yahoo.com/v8/finance/chart/… Also check out this Python package: pypi.org/project/yahoofinancials. It uses scraping, which is a bit slower than using the API. I also have an npm package that gets Yahoo data from that first link, and I'm working on porting it to Python as well: npmjs.com/package/yf-api - bherbruck

3 Answers

1
votes

To extract the data from Yahoo Finance, you can use a Python library called yfinance.

In your case, by using this library you would do this:

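import yfinance as yf

# The page in the question uses the symbol ETH-USD (Ethereum priced in US dollars)
eth = yf.Ticker('ETH-USD')

# Price history for the past year, returned as a pandas DataFrame
eth_history = eth.history(period="1y")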

You can then do whatever you want with this data (save it to a spreadsheet, for example).

0
votes

You can use a library that implements the Chrome DevTools Protocol (CDP) to automate the Chrome browser or a headless Chromium browser (or any browser supporting this protocol).

Here is one library I found by searching: https://github.com/hyperiongray/trio-chrome-devtools-protocol, but I'm sure there are others too. I have not used it personally.

0
votes

You could use a Selenium WebDriver to load the page, find the WebElement containing the download button, and click() it, but that would be a slow and brittle solution compared to calling the API directly.

My approach to this problem would be to reverse engineer the Yahoo Finance URL and fetch the data with the Requests library. The result is a CSV with the historical data that you're looking for.

If you look at the download URL... the URL query parameters are fairly intuitive to understand.

https://query1.finance.yahoo.com/v7/finance/download/ETH-USD?period1=1581795382&period2=1613417782&interval=1d&events=history&includeAdjustedClose=true
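The period1 and period2 values in that URL look like Unix timestamps in seconds. Here is a minimal sketch, assuming that interpretation, that decodes them with the standard library. The helper name decode_periods is made up for illustration.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse

# The example download URL from this answer, split only for readability.
URL = ('https://query1.finance.yahoo.com/v7/finance/download/ETH-USD'
       '?period1=1581795382&period2=1613417782&interval=1d'
       '&events=history&includeAdjustedClose=true')


def decode_periods(url):
    """Return (start, end) as UTC datetimes, assuming period1/period2 are Unix seconds."""
    query = parse_qs(urlparse(url).query)
    start = datetime.fromtimestamp(int(query['period1'][0]), tz=timezone.utc)
    end = datetime.fromtimestamp(int(query['period2'][0]), tz=timezone.utc)
    return start, end


start, end = decode_periods(URL)
print(start.isoformat())    # 2020-02-15T19:36:22+00:00
print(end.isoformat())      # 2021-02-15T19:36:22+00:00
print((end - start).days)   # 366
```

Read this way, the example URL covers exactly one year (2020 was a leap year, hence 366 days) at a daily interval.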

We can see that the key components to modify are the stock ticker, date range, and interval. In code...

import csv
from datetime import datetime, timedelta
from io import StringIO

import requests


ticker = 'ETH-USD'
url = f'https://query1.finance.yahoo.com/v7/finance/download/{ticker}'
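# period1 and period2 are Unix timestamps in seconds; here, the last 365 days
now = datetime.now()
start_ts = int((now - timedelta(days=365)).timestamp())
end_ts = int(now.timestamp())
params = {
    'period1': start_ts,
    'period2': end_ts,
    'interval': '1d',
    'events': 'history',
    # Use the literal string from the URL; requests would send True as "True"
    'includeAdjustedClose': 'true',
}

result = requests.get(url, params=params, timeout=30)
# Stop on an HTTP error instead of trying to parse an error page as CSV
result.raise_for_status()

# Parse the CSV body and print each row tab-separated
f = StringIO(result.content.decode('utf-8'))
reader = csv.reader(f, delimiter=',')
for row in reader:
    print('\t'.join(row))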
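
To request a specific date range rather than the last 365 days, the same parameters can be built from calendar dates. This is only a sketch, assuming the endpoint accepts the parameters exactly as they appear in the example URL above. build_download_url is a hypothetical helper, not part of any library, and it only constructs the URL without sending a request.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

BASE_URL = 'https://query1.finance.yahoo.com/v7/finance/download/'


def build_download_url(ticker, start, end, interval='1d'):
    """Build a download URL for `ticker` between two timezone-aware datetimes."""
    if start >= end:
        raise ValueError('start must be earlier than end')
    params = {
        'period1': int(start.timestamp()),  # range start, Unix seconds
        'period2': int(end.timestamp()),    # range end, Unix seconds
        'interval': interval,
        'events': 'history',
        'includeAdjustedClose': 'true',
    }
    return f'{BASE_URL}{quote(ticker)}?{urlencode(params)}'


start = datetime(2020, 2, 15, 19, 36, 22, tzinfo=timezone.utc)
end = datetime(2021, 2, 15, 19, 36, 22, tzinfo=timezone.utc)
print(build_download_url('ETH-USD', start, end))
```

With these inputs the sketch reproduces the example URL shown earlier. Passing naive datetimes instead would make the result depend on the machine's local time zone.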