python - "for line in..." results in UnicodeDecodeError: 'utf-8' codec can't decode byte

270

votes

Here is my code:

for line in open('u.item'):
    pass  # Read each line

Whenever I run this code it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to fix this by adding an extra parameter to open(). The code now looks like this:

for line in open('u.item', encoding='utf-8'):
    pass  # Read each line

But again it gives the same error. What should I do then?

python python-3.x character-encoding

Badly encoded data I would assume. – Andreas Jung

Or just not UTF-8 data. – Mark Tolonen

Possible duplicate of Python 3 UnicodeDecodeError - How do I debug UnicodeDecodeError? – tripleee

We had this error with msgpack when using python 3 instead of python 2.7. For us, the course of action was to work with python 2.7. – Jesse W. Collins

503

votes

As suggested by Mark Ransom, I found the right encoding for this file. It was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open("u.item", encoding="ISO-8859-1") solves the problem.
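A minimal sketch of why the switch helps, using an in-memory byte string instead of the real u.item (the sample text is an assumption made up for illustration). The byte 0xe9 followed by an ASCII character is not valid UTF-8, while ISO-8859-1 maps every byte to a character:

```python
# Hypothetical sample: "café au lait" encoded as ISO-8859-1,
# so the é is stored as the single byte 0xe9.
raw = "café au lait".encode("iso-8859-1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xe9 announces a multi-byte UTF-8 sequence, but the next byte
    # (a space) is not a continuation byte, so decoding fails.
    utf8_ok = False

# ISO-8859-1 assigns a character to each of the 256 byte values,
# so this decode cannot raise.
text = raw.decode("iso-8859-1")
```

Because ISO-8859-1 never raises, a successful decode does not prove that the file really is ISO-8859-1; it is worth checking that the accented characters look right.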

61

votes

The following also worked for me. ISO 8859-1 will save you a lot of trouble, especially when working with Speech Recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

34

votes

Your file doesn't actually contain UTF-8 encoded data; it contains some other encoding. Figure out what that encoding is and use it in the open call.

In Windows-1252 encoding, for example, the 0xe9 would be the character é.
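One rough way to "figure out" the encoding is to try a few candidates in order and keep the first that decodes cleanly. This is only a sketch: the helper name first_working_encoding, the candidate list, and the sample bytes are all assumptions for illustration, not something from the question.

```python
def first_working_encoding(data, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return None


# Hypothetical sample containing the byte 0xe9 from the error message.
sample = b"Les Mis\xe9rables (1995)"

encoding = first_working_encoding(sample)  # UTF-8 fails, cp1252 succeeds
decoded = sample.decode(encoding)
```

The first encoding that succeeds is a guess, not a proof: several single-byte encodings accept the same bytes and simply map them to different characters, so the decoded text still needs a visual check.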

22

votes

Try this to read using Pandas:

pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')

15

votes

This works:

open('filename', encoding='latin-1')

Or:

open('filename', encoding="ISO-8859-1")

14

votes

If you are using Python 2, the following will be the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # Do something

In Python 2 the built-in open() does not accept an encoding parameter, so passing one to it raises the following error:

TypeError: 'encoding' is an invalid keyword argument for this function

13

votes

You could resolve the problem with:

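for line in open(your_file_path, 'rb'):
    pass  # line is a bytes object here, not str

'rb' opens the file in binary mode, so no decoding is attempted and each line is a bytes object. Decode it explicitly if you need text.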
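A small sketch of what binary mode gives you, assuming you still want text in the end. The temporary file and its contents are made up for illustration, and ISO-8859-1 is assumed to be the file's real encoding:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for u.item: two records, one containing the byte 0xe9.
    path = os.path.join(tmp, "sample.item")
    with open(path, "wb") as f:
        f.write(b"1|Toy Story\n2|Caf\xe9 Society\n")

    lines = []
    with open(path, "rb") as f:
        for raw_line in f:
            # raw_line is bytes; nothing has been decoded yet,
            # so no UnicodeDecodeError can occur while reading.
            lines.append(raw_line.decode("iso-8859-1").rstrip("\n"))
```

Binary mode only postpones the question of which encoding to use; it does not answer it.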

6

votes

You can try this way:

open('u.item', encoding='utf8', errors='ignore')
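Note that errors='ignore' silently drops every byte that cannot be decoded. A short sketch on an in-memory byte string (the sample is an assumption for illustration) compares it with errors='replace', which leaves a visible marker instead:

```python
# Hypothetical sample with one byte (0xe9) that is not valid UTF-8 here.
raw = b"Caf\xe9 Society"

# 'ignore' removes the offending byte without any trace.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD (the replacement character) for it.
replaced = raw.decode("utf-8", errors="replace")
```

Either option loses the original character, so they are best kept for cases where the exact text does not matter.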

2

votes

This is an example of reading a CSV file encoded as ISO-8859-1 in Python 3:

import csv
from sys import argv

try:
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    inputReader = None  # the file could not be opened

2

votes

Sometimes open(filepath) fails with the same error because filepath does not actually point to the file you expect, so first make sure the file you are trying to open exists:

import os
assert os.path.isfile(filepath)

1

votes

Open your file with Notepad++ and use the "Encoding" menu ("Encodage" in the French version) to identify the file's encoding, or to convert it from ANSI to UTF-8 or to the ISO 8859-1 code page.
