1
votes

I have a bunch of CSV files that load into pandas just fine, but one file is acting up. I'm opening it this way:

df = pd.DataFrame.from_csv(csv_file)

error:

  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1268, in from_csv
    encoding=encoding,tupleize_cols=False)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 400, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 586, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/io/parsers.py", line 957, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "parser.pyx", line 477, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4434)
  File "parser.pyx", line 599, in pandas.parser.TextReader._get_header (pandas/parser.c:5831)
pandas.parser.CParserError: Passed header=0 but only 0 lines in file

To me, this suggests some sort of corruption in the file. At a quick glance it seems fine, but it is a big file and visually checking every single line is not an option. What would be a good strategy for troubleshooting a CSV file that pandas won't open?

thank you

3 Answers

0
votes

It looks like pandas is treating line 0 as the header. Try calling:

df = pd.DataFrame.from_csv(csv_file, header=None)

or

df = pd.read_csv(csv_file, header=None)

However, it's strange that the file seems to have zero lines (i.e. it's empty). Maybe the file path is wrong?
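A quick way to rule that out before calling pandas at all is to ask the operating system what the path actually points to. This is only a minimal sketch using the standard library; `describe_path` is a made-up helper name, not part of pandas.

```python
import os


def describe_path(path):
    """Return a short label for what `path` points to (hypothetical helper)."""
    if not os.path.exists(path):
        return "missing"      # typo in the path, or wrong working directory
    if os.path.isdir(path):
        return "directory"    # a folder was passed instead of a file
    if os.path.getsize(path) == 0:
        return "empty"        # the file exists but contains no bytes
    return "ok"               # a non-empty regular file; look elsewhere


# Usage (csv_file is the variable from the question):
# print(describe_path(csv_file))
```

Anything other than "ok" would explain a reader that sees 0 lines.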

0
votes

On Linux, inspect the file with head at the shell, then fix it with awk or sed. On Windows, you could use vim to inspect and fix it. In short, pandas is probably not the best tool for repairing the file. You most likely have odd line endings (the error message reports 0 lines), so use head, cat, or vim to determine what the line endings are before deciding how to fix or handle them.
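If you would rather stay in Python, the same inspection can be sketched by reading the start of the file in binary mode and counting the terminators. The helper name `count_line_endings` and the 1 MiB sample size are assumptions chosen for illustration.

```python
def count_line_endings(path, sample_size=1 << 20):
    """Count CRLF, bare LF and bare CR in the first `sample_size` bytes."""
    with open(path, "rb") as f:          # binary mode: no newline translation
        chunk = f.read(sample_size)
    crlf = chunk.count(b"\r\n")          # Windows-style endings
    lf = chunk.count(b"\n") - crlf       # Unix-style endings
    cr = chunk.count(b"\r") - crlf       # bare carriage returns (old Mac style)
    return {"crlf": crlf, "lf": lf, "cr": cr, "bytes_read": len(chunk)}


# Usage (csv_file is the variable from the question):
# print(count_line_endings(csv_file))
```

A file whose counts are all zero but whose `bytes_read` is large has no recognizable line breaks in the sample, and a file with only `cr` hits uses bare carriage returns. A `bytes_read` of 0 means the file is empty.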

0
votes

I ran into the same issue:


/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/io/parsers.pyc in __init__(self, src, **kwds)
    970         kwds['allow_leading_cols'] = self.index_col is not False
    971
--> 972         self._reader = _parser.TextReader(src, **kwds)
    973
    974         # XXX

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4628)()

/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas-0.13.1_601_g4663353-py2.7-macosx-10.9-x86_64.egg/pandas/parser.so in pandas.parser.TextReader._get_header (pandas/parser.c:6068)()

CParserError: Passed header=0 but only 0 lines in file


My code was:

df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data')

Eventually I found my mistake: I had passed read_csv the path to a directory instead of a file.

The correct code is:

df = pd.read_csv('/Users/steven/Documents/Mywork/Python/sklearn/beer/data/beer_reviews.csv')

This runs correctly.

So I think the cause of your issue lies in the path you passed. It may be a directory path, as in my case, or the file may be empty, corrupt, or in the wrong encoding.
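For the encoding possibility, here is a rough sketch that tries to decode the start of the file with a few candidate encodings and reports the first one that works. The helper name `first_working_encoding` and the candidate list are my own assumptions. A successful decode only shows that the bytes are valid in that encoding, not that it is the right one.

```python
import codecs


def first_working_encoding(path, candidates=("utf-8", "cp1252"),
                           sample_size=1 << 20):
    """Return the first candidate that decodes the sample, or None."""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    for enc in candidates:
        # An incremental decoder tolerates a multi-byte character that was
        # cut off at the end of the sample instead of raising on it.
        decoder = codecs.getincrementaldecoder(enc)()
        try:
            decoder.decode(sample)
        except UnicodeError:
            continue          # invalid bytes for this encoding, try the next
        return enc
    return None


# Usage (csv_file is the variable from the question):
# print(first_working_encoding(csv_file))
```

If this returns something other than "utf-8", passing that name as the `encoding` argument of `pd.read_csv` is worth a try.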

I hope the above is helpful to you.