UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 23: invalid continuation byte

Question

I can't get rid of this error. I keep getting "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 23: invalid continuation byte" when reading a CSV file with pandas.
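A minimal illustration of what the message means, assuming (as the accepted answer concludes) that the file is really cp1252-encoded. The sample string is the folder name from the question's path, used only for illustration; it is not the CSV's actual contents.

```python
# Illustrative sample: "ê" taken from the folder name in the question's path.
text = "Transferências"

cp1252_bytes = text.encode("cp1252")  # b'Transfer\xeancias': "ê" is one byte, 0xEA
utf8_bytes = text.encode("utf-8")     # b'Transfer\xc3\xaancias': "ê" is two bytes

try:
    cp1252_bytes.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # In UTF-8, 0xEA starts a 3-byte sequence, so the next byte must be a
    # continuation byte (0x80-0xBF). Here it is "n" (0x6E), hence the error.
    error = exc
    print(exc)  # 'utf-8' codec can't decode byte 0xea in position 8: invalid continuation byte

# Decoding with the encoding the bytes were actually written in works.
print(cp1252_bytes.decode("cp1252"))  # Transferências
```

The position in the message is a byte offset into the data being decoded, so in a real file it points at the first byte that is not valid UTF-8.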

I have already tried everything I found online. I have converted the CSV file to several encodings, including converting it to UTF-8 with Sublime Text and Notepad, and I still can't make this error go away.

import tensorflow as tf
import pandas as pd

csv_path="C:\\Users\\diogo\\Transferências\\E0.csv"
dataset=pd.read_csv(csv_path,encoding="utf-8")
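A minimal sketch of reading a cp1252 file with pandas by passing its real encoding to read_csv. So that it runs on its own, it uses a tiny hypothetical in-memory CSV in place of E0.csv; the column names and values are made up.

```python
import io

import pandas as pd

# Hypothetical stand-in for E0.csv: a small CSV saved as cp1252, so "ê" is byte 0xEA.
raw = "name,folder\nDiogo,Transferências\n".encode("cp1252")

# Decoding these bytes as UTF-8 reproduces the kind of error in the question...
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# ...while naming the file's actual encoding lets pandas decode 0xEA as "ê".
df = pd.read_csv(io.BytesIO(raw), encoding="cp1252")
print(df)
```

For the file in the question, the equivalent call would be pd.read_csv(csv_path, encoding="cp1252"). Once the data is in a DataFrame, one common pattern for handing it to TensorFlow is tf.data.Dataset.from_tensor_slices(dict(dataset)).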

I expected the dataset to be read correctly, but it always shows this error. Also, when I change the encoding of the pandas reader, I still get the error "'utf-8' codec can't decode". Is this supposed to happen? Shouldn't the error change when I change the 'utf-8' encoding? If you know of any alternative ways of reading CSV files into TensorFlow, that information would also be much appreciated. Thanks.
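As an alternative that needs no third-party packages, Python's built-in csv module can read the file once it is opened with an explicit encoding. This sketch assumes the file is cp1252; read_csv_rows is a hypothetical helper name, and the demo writes a small made-up file to a temporary location so it is self-contained.

```python
import csv
import os
import tempfile


def read_csv_rows(path, encoding="cp1252"):
    """Read a CSV file into a list of dicts, decoding it with an explicit encoding."""
    # The csv module documentation recommends opening files with newline="".
    with open(path, encoding=encoding, newline="") as f:
        return list(csv.DictReader(f))


# Self-contained demo: write a hypothetical cp1252 file, read it back, clean up.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "wb") as f:
    f.write("name,folder\r\nDiogo,Transferências\r\n".encode("cp1252"))
try:
    rows = read_csv_rows(path)
finally:
    os.remove(path)

print(rows)
```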

Diogo Diogo · Accepted Answer · 2019-05-02T22:48:08

I finally discovered that the encoding was "cp1252" with the following code:

with open('food.csv') as f:
    # The printed repr includes the encoding Python used to open the file, e.g. encoding='cp1252'
    print(f)
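The encoding shown by print(f) is the one Python picked to open the file (the platform default when encoding= is omitted), not one detected from the file's bytes. A rough heuristic that does inspect the bytes is to try a few candidate encodings in order. The candidate list and the guess_encoding helper are assumptions for illustration, and a successful decode does not prove the guess is right; latin-1 is deliberately left out because it decodes any byte sequence.

```python
# Assumption: these candidates and their order are illustrative, not exhaustive.
# UTF-8 goes first because non-UTF-8 text with accented letters rarely decodes as valid UTF-8.
CANDIDATES = ("utf-8-sig", "cp1252")


def guess_encoding(data: bytes, candidates=CANDIDATES) -> str:
    """Return the first candidate encoding that decodes `data` without errors."""
    for encoding in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError("none of the candidate encodings could decode the data")


print(guess_encoding("Transferências".encode("cp1252")))  # cp1252
print(guess_encoding("Transferências".encode("utf-8")))   # utf-8-sig
```

For a real file, the bytes can be read with open(path, "rb").read() and passed in. Third-party detectors such as chardet take a similar bytes-in, guess-out approach with more sophisticated statistics.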

I still don't know why the encoding didn't change to UTF-8 when I saved the file with Sublime Text and Notepad.
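If a UTF-8 copy is still wanted (so that read_csv works with its default UTF-8 encoding), the file can be re-encoded in Python once its source encoding is known. A minimal sketch, assuming the source is cp1252; convert_to_utf8 and the file names are hypothetical.

```python
import os
import tempfile


def convert_to_utf8(src, dst, src_encoding="cp1252"):
    """Re-encode a text file from src_encoding to UTF-8, keeping line endings as-is."""
    # newline="" disables newline translation on both the read and the write side.
    with open(src, encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        fout.write(fin.read())


# Self-contained demo with hypothetical file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    src = os.path.join(tmpdir, "E0_cp1252.csv")
    dst = os.path.join(tmpdir, "E0_utf8.csv")
    with open(src, "wb") as f:
        f.write("name,folder\r\nDiogo,Transferências\r\n".encode("cp1252"))
    convert_to_utf8(src, dst)
    with open(dst, "rb") as f:
        converted = f.read()

print(converted)
```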
