2
votes

I can't get rid of this error. I keep getting "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 23: invalid continuation byte" when reading a CSV with pandas.

I have already tried everything I found online. I have converted the CSV file to several encodings, including UTF-8 with Sublime Text and Notepad, and the error still doesn't go away.

import tensorflow as tf
import pandas as pd

csv_path="C:\\Users\\diogo\\Transferências\\E0.csv"
dataset=pd.read_csv(csv_path,encoding="utf-8")

I expected the dataset to be read correctly, but it always shows this error. Also, when I change the encoding passed to the pandas reader, I still get the error "'utf-8' codec can't decode". Is this supposed to happen? Shouldn't the error change when I change the 'utf-8' encoding? If you know of any alternative ways of reading CSVs into TensorFlow, that information would also be much appreciated. Thanks.
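Python's `UnicodeDecodeError` records which codec failed, at which position, and why, so checking the raw bytes directly shows whether a different encoding would get past the failing byte. A small diagnostic sketch; the helper name `find_bad_byte` and the sample bytes are hypothetical:

```python
def find_bad_byte(raw: bytes, encoding: str):
    """Return (codec, position, byte, reason) for the first byte that
    `encoding` cannot decode, or None if the whole input decodes cleanly."""
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as err:
        # err.object is the input bytes; err.start indexes the failing byte
        return err.encoding, err.start, err.object[err.start], err.reason
    return None


# Hypothetical sample: 0xEA is "ê" in cp1252/latin-1 but is not valid UTF-8 here
sample = b"caf\xea,1"
print(find_bad_byte(sample, "utf-8"))   # ('utf-8', 3, 234, 'invalid continuation byte')
print(find_bad_byte(sample, "cp1252"))  # None
```

To check the real file, read it with `open(csv_path, "rb").read()` and pass those bytes as `raw`.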


3 Answers

2
votes

I finally discovered the encoding was "cp1252" with the following code:

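with open('food.csv') as f:
    print(f)  # the repr includes encoding=..., the codec open() used (the platform default, since none was given)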

I still don't know why the encoding didn't change to UTF-8 when I saved the file with Sublime Text and Notepad.
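For reference, "ê" is the single byte 0xEA in cp1252 but the two bytes 0xC3 0xAA in UTF-8, so a file that still fails with "can't decode byte 0xea ... invalid continuation byte" was not actually saved as UTF-8. Below is a minimal sketch of doing the conversion in Python instead of an editor, assuming the source really is cp1252; the function name `convert_to_utf8` is hypothetical:

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file from `source_encoding` to UTF-8 (hypothetical helper)."""
    raw = Path(src).read_bytes()        # undecoded bytes, line endings untouched
    text = raw.decode(source_encoding)  # raises UnicodeDecodeError if a byte is undefined in that codec
    Path(dst).write_bytes(text.encode("utf-8"))


# "ê" is one byte in cp1252 but two bytes in UTF-8
print("ê".encode("cp1252"))  # b'\xea'
print("ê".encode("utf-8"))   # b'\xc3\xaa'
```

After converting, `pd.read_csv(dst)` should work, since UTF-8 is the pandas default; alternatively, skip the conversion and call `pd.read_csv(csv_path, encoding="cp1252")` directly.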

1
votes

This does not require any module imports: read the raw bytes and decode them yourself, then re-open the file with the steps you specified in the question.

with open('some_file.csv', 'rb') as file:
    raw = file.read()  # bytes, printed as b'...'
    print(raw)  # should print a (probably long) bytes string
    print(raw.decode('utf-8'))  # decode to str (drops the b'' prefix); raises UnicodeDecodeError if not valid UTF-8
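When the encoding is unknown, the decode step can be turned into a fallback loop. A minimal sketch, assuming the candidates are UTF-8 and then cp1252; the helper name and candidate list are hypothetical. Order matters: a permissive codec such as latin-1 accepts any byte sequence, so anything listed after it would never be tried.

```python
def decode_with_fallback(raw: bytes, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes `raw`."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not this codec; try the next candidate
    raise ValueError(f"none of {encodings} could decode the data")


print(decode_with_fallback(b"caf\xea,1"))  # ('cafê,1', 'cp1252')
```

With a real file, pass the bytes from `open(path, 'rb').read()`. Note that a successful decode only means no byte was invalid for that codec, not that the text is guaranteed correct.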

0
votes

Try using

open(filepath_, 'rb')

instead of

open(filepath_)

This worked for me on Python 3.8.5.
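Opening with 'rb' stops `open()` from decoding with the platform default encoding; you get bytes and decide the encoding yourself. A minimal sketch of that approach with the standard `csv` module, assuming cp1252 (the encoding reported in the first answer); the function name `read_rows` is hypothetical:

```python
import csv
import io


def read_rows(path, encoding="cp1252"):
    """Read a CSV file as raw bytes, decode it explicitly, then parse it."""
    with open(path, "rb") as f:  # binary mode: no implicit decoding
        raw = f.read()           # bytes, e.g. b'...'
    text = raw.decode(encoding)  # explicit decode step
    # newline="" lets the csv module handle \r\n line endings itself
    return list(csv.reader(io.StringIO(text, newline="")))
```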