I am trying to read a .csv file that comes from traces of a network protocol, so it contains hex characters mixed with normal text. I have tried several encodings: utf-8, cp1252, latin1...
For latin1:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 51: ordinal not in range(128)
For utf-8:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 51: invalid start byte
For cp1252:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 51: ordinal not in range(128)
The code used is:
df=pd.read_csv(file,sep='`',error_bad_lines=False,encoding='cp1252',names=colnames,quotechar='"')
I am no expert in encodings, but I would like to know how to solve this.
How can I find out the current encoding of the csv file I am reading?
Is there a very permissive codec that takes pretty much everything?
Thanks.
You get a UnicodeDecodeError in read_csv because the file contains a non utf-8 character. For latin1 and cp1252 you get a UnicodeEncodeError (note Encode instead of Decode), probably in a different instruction. I need the full stack traces and the relevant code to be able to help you. - Serge Ballesta
With cp1252 you shouldn't get the error 'ascii' codec can't encode character u'\xb0'. You can get it only with ascii. Using b'\xb0'.decode('cp1252') I see it can be ° (degree sign). - furas
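To make the two comments above concrete, here is a minimal sketch in Python 3 of what probably happens with byte 0xb0. The sample bytes in `raw` are invented for illustration and are not taken from the actual file. The sketch assumes the decode step succeeds with latin1/cp1252 and that the UnicodeEncodeError comes from a later step that encodes the text as ascii (for example a print or write call); the question does not show where that happens.

```python
# Hypothetical sample line containing the problematic byte 0xb0.
raw = b"temp=21\xb0C"

# utf-8 rejects 0xb0 as a start byte, which matches the UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# cp1252 and latin1 both map 0xb0 to U+00B0 (degree sign), so decoding works.
text_cp1252 = raw.decode("cp1252")
text_latin1 = raw.decode("latin1")

# Encoding the decoded text as ascii fails, which matches the
# UnicodeEncodeError reported for latin1 and cp1252.
try:
    text_latin1.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# latin1 assigns a character to every byte value 0..255, so decoding with it
# never raises and the bytes survive a round trip unchanged.
all_bytes = bytes(range(256))
round_trip = all_bytes.decode("latin1").encode("latin1")
```

In that sense latin1 is the "very permissive" codec asked about: it accepts any byte sequence, although it cannot tell you whether the resulting characters are the ones the file's author intended.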