I've seen similar questions posted but they're not exactly the same as what I've encountered. I am using Python 3.7 and Pandas 0.25.0.
Weirdly, if I download this zip file directly from this link, I am able to read it via pd.read_csv as follows:
pd.read_csv('publicleaderboarddata.zip')
   TeamId           TeamName       SubmissionDate    Score
0  688191  Sergey Mushinskiy  2017-05-24 12:20:34  0.06630
1  688203       DeepVoltaire  2017-05-24 12:25:03  0.06630
2  688237        RakeshNikam  2017-05-24 13:02:31  0.06512
......
However, if I pass the URL directly:
this_leaderboard_df = pd.read_csv('https://www.kaggle.com/c/6649/publicleaderboarddata.zip',
                                  compression='zip')
I get a BadZipFile error, shown below. Why does this happen?
---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
----> 1 this_leaderboard_df = pd.read_csv(this_leaderboard_link, compression='zip')
      2 this_leaderboard_df.head(e)

~/.virtualenvs/py3/lib/python3.7/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision)
    683         )
    684
--> 685         return _read(filepath_or_buffer, kwds)
    686
    687     parser_f.__name__ = name

~/.virtualenvs/py3/lib/python3.7/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    455
    456     # Create the parser.
--> 457     parser = TextFileReader(fp_or_buf, **kwds)
    458
    459     if chunksize or iterator:

~/.virtualenvs/py3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    893             self.options["has_index_names"] = kwds["has_index_names"]
    894
--> 895         self._make_engine(self.engine)
    896
    897     def close(self):

~/.virtualenvs/py3/lib/python3.7/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
   1133     def _make_engine(self, engine="c"):
   1134         if engine == "c":
-> 1135             self._engine = CParserWrapper(self.f, **self.options)
   1136         else:
   1137             if engine == "python":

~/.virtualenvs/py3/lib/python3.7/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1915         kwds["usecols"] = self.usecols
   1916
-> 1917         self._reader = parsers.TextReader(src, **kwds)
   1918         self.unnamed_cols = self._reader.unnamed_cols
   1919

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in __init__(self, file, mode, compression, allowZip64, compresslevel)
   1223         try:
   1224             if mode == 'r':
-> 1225                 self._RealGetContents()
   1226             elif mode in ('w', 'x'):
   1227                 # set the modified flag so central directory gets written

/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/zipfile.py in _RealGetContents(self)
   1290             raise BadZipFile("File is not a zip file")
   1291         if not endrec:
-> 1292             raise BadZipFile("File is not a zip file")
   1293         if self.debug > 1:
   1294             print(endrec)

BadZipFile: File is not a zip file
pandas can't log in to this page, so it gets an HTML page with a login form instead of the zip file. – furas
You could use Selenium to control a web browser, log in to Kaggle, and click the link to download the file. – furas
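One way to check the explanation in the comments is to look at the bytes the URL actually returns before handing them to pandas. The sketch below is only an illustration: describe_payload and fetch are hypothetical helper names, not part of pandas or Kaggle, and the HTML check is a rough heuristic that assumes the login page starts with a doctype or an html tag.

```python
import io
import zipfile
from urllib.request import urlopen


def describe_payload(payload: bytes) -> str:
    """Classify downloaded bytes as 'zip', 'html', or 'unknown' (hypothetical helper)."""
    # is_zipfile accepts a file-like object and looks for the zip
    # end-of-central-directory record; when that record is missing,
    # zipfile.ZipFile raises BadZipFile("File is not a zip file").
    if zipfile.is_zipfile(io.BytesIO(payload)):
        return "zip"
    # Rough heuristic (an assumption): an HTML page starts with a doctype or <html>.
    head = payload[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def fetch(url: str) -> bytes:
    """Download a URL with no login or cookies (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()
```

Calling describe_payload(fetch(url)) on the leaderboard URL would be expected to return 'html' if the server answers an unauthenticated request with the login page, as the comment above suggests. For a payload classified as 'zip', pd.read_csv(io.BytesIO(payload), compression='zip') should be able to read it from memory.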