2
votes

There are 2 csv files in same location: 1- candidates.csv 2- Store.csv

When I'm importing candidates.csv filw while using this code, it is getting imported:

data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\candidates.csv")

But when I'm using same code for importing Store.csv file, it is giving error:

data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv")

Error:

UnicodeDecodeError Traceback (most recent call last) pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 9: invalid start byte

During handling of the above exception, another exception occurred:

UnicodeDecodeError Traceback (most recent call last) in ----> 1 data=pandas.read_csv("C:\Users\Nupur\Desktop\Ankit\Store.csv")

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision) 676 skip_blank_lines=skip_blank_lines) 677 --> 678 return _read(filepath_or_buffer, kwds) 679 680 parser_f.name = name

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds) 444 445 try: --> 446 data = parser.read(nrows) 447 finally: 448 parser.close()

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows) 1034 raise ValueError('skipfooter not supported for iteration') 1035 -> 1036 ret = self._engine.read(nrows) 1037 1038 # May alter columns / col_dict

C:\ProgramData\Anaconda3\lib\site-packages\pandas\io\parsers.py in read(self, nrows) 1846 def read(self, nrows=None): 1847
try: -> 1848 data = self._reader.read(nrows) 1849 except StopIteration: 1850 if self._first_chunk:

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_with_dtype()

pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._string_convert()

pandas_libs\parsers.pyx in pandas._libs.parsers._string_box_utf8()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 9: invalid start byte

3
Ankit, did you tried encoding='utf-8' ?Karn Kumar

3 Answers

1
votes

Try using this,

data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv",encoding = "ISO-8859-1")
1
votes

If you face an encoding error due to encoding on your file not being the default as mentioned by the pd.read_csv() docs , you can find the encoding of the file by first installing chardet followed by the below code:

import chardet    
rawdata = open('D:\\path\\file.csv', 'rb').read()
result = chardet.detect(rawdata)
charenc = result['encoding']
print(charenc)

This will give you the encoding of the file.

Once you have the encoding, you can read as :

pd.read_csv('D:\\path\\file.csv',encoding = 'encoding you found')

or

pd.read_csv(r'D:\path\file.csv',encoding = 'encoding you found')

You will get the list of all encoding here

Hope you find this useful.

0
votes

Did you tried:

data=pandas.read_csv("C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv", encoding='utf-8')

If above doesn't work then seems you coding format is different, I would suggest to choose few encoding for Windows such as encoding='iso-8859-1', encoding='cp1252' or encoding='latin1' .

OR try adding r in front of the filename, so that it will be considered a "raw string" so that backslashes won't be treated specially:

data=pandas.read_csv(r"C:\\Users\\Nupur\\Desktop\\Ankit\\Store.csv", encoding='cp1252')