- Go to this URL: https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15 (username = TrickyBen | password = TrickyBen123)
- Notice that there is a Download Excel button (in red)
- I want to download the Excel file and turn it into a pandas DataFrame. I want to do it programmatically (i.e. from the script, not by manually clicking around the website). How would I do this?
This code will get you logged in as TrickyBen, and make a request to the website API...
import requests
from lxml import html
from requests import Session
import pandas as pd
import shutil
raceSession = Session()
LoginDetails = {'login': 'TrickyBen', 'password': 'TrickyBen123'}
LoginUrl = 'https://www.horseracebase.com/horse-racing-results.php?year=2005&month=3&day=15/horsebase1.php'
LoginPost = raceSession.post(LoginUrl, data=LoginDetails)
RaceUrl = 'https://www.horseracebase.com/excelresults.php'
RaceDataDetails = {"user": "41495", "racedate": "2005-3-15", "downloadbutton": "Excel"}
PostHeaders = {"Content-Type": "application/x-www-form-urlencoded"}
Response = raceSession.post(RaceUrl, data=RaceDataDetails, headers=PostHeaders)
Table = pd.read_table(Response.text)
Table.to_csv('blahblah.csv')
If you inspect the Download Excel button in the browser, you'll notice that the relevant form looks like this...
<form action="excelresults.php" method="post">
<input type="hidden" name="user" value="41495">
<input type="hidden" name="racedate" value="2005-3-15">
<input type="submit" class="downloadbutton" value="Excel">
</form>
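One thing that form shows: a browser only sends inputs that have a `name` attribute, and the submit button here has a `class` but no `name`. So the browser's POST body would presumably contain just `user` and `racedate`. As a rough sketch (Python 3, standard library only; `FormFieldParser` and `FORM_HTML` are names made up for this illustration), the fields can be read out of the markup instead of being hard-coded:

```python
from html.parser import HTMLParser

# The form markup copied from the page (as shown above).
FORM_HTML = """
<form action="excelresults.php" method="post">
<input type="hidden" name="user" value="41495">
<input type="hidden" name="racedate" value="2005-3-15">
<input type="submit" class="downloadbutton" value="Excel">
</form>
"""


class FormFieldParser(HTMLParser):
    """Hypothetical helper: collects the form action and every named <input>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            # Inputs without a name (like the submit button here) are skipped.
            self.fields[attrs["name"]] = attrs.get("value", "")


parser = FormFieldParser()
parser.feed(FORM_HTML)
print(parser.action)
print(parser.fields)
```

The resulting `parser.fields` dictionary could then be passed as `data=` to `raceSession.post(...)`; whether the server also needs anything else is not something the markup alone can tell.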
I get this error message...
Traceback (most recent call last):
File "/Users/Alex/Desktop/DateTest/hrpull.py", line 20, in <module>
Table = pd.read_table(Response.text)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
self._make_engine(self.engine)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 358, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3427)
File "pandas/parser.pyx", line 628, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:6861)
IOError: File race_date race_time track race_name race_restrictions_age race_class major race_distance prize_money going_description number_of_runners place distbt horse_name stall trainer horse_age jockey_name jockeys_claim pounds odds fav official_rating comptime TotalDstBt MedianOR Dist_Furlongs placing_numerical RCode BFSP BFSP_Place PlcsPaid BFPlcsPaid Yards RailMove RaceType
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m½f " "58000" "Good" "20" "1st" "Arcalis" "0" "Johnson, J Howard" "5" "Lee, G" "0" "161" "21" "136" "3 mins 53.00s" "121.5" "16.5" "1" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"
"2005-03-15" "14:00:00" "Cheltenham" "Letheby & Christopher Supreme Novices Hurdle " "4yo+" "Class 1" "Grade 1" "2m½f " "58000" "Good" "20" "2nd" "6" "Wild Passion (GER)" "0" "Meade, Noel" "5" "Carberry, P" "0" "161" "11" "0" "3 mins 53.00s" "6" "121.5" "16.5" "2" "National Hunt" "0" "0" "3" "0" "0" "0" "Novices Hurdle"