4
votes

When I use open() and read() to read a file in Python 3 and try to change the file's encoding, I get the error below. I want to convert text in any encoding to UTF-8 and save it.

The file at path sin3 has an unknown encoding:

fh = open(sin3, mode="r", encoding="utf8")
ss = fh.read()

File "/usr/lib/python3.2/codecs.py", line 300, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte

I also tried the codecs module and got the same error:

import codecs
fh = codecs.open(sin3, mode="r", encoding="utf8")
ss = fh.read()

File "/usr/lib/python3.2/codecs.py", line 679, in read
return self.reader.read(size)
File "/usr/lib/python3.2/codecs.py", line 482, in read
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 34: invalid continuation byte

Your file isn't encoded in UTF-8, so you can't open it with the utf-8 codec. You'll need to find some way to detect the actual encoding before you open it. – Wooble
To put the same thing another way: when you open a file for reading, the encoding parameter needs to be the encoding the file is already in, not the encoding you want (you select that when you write the file). – Thomas K
Thanks for your support, @Wooble and @Thomas K! – alireza
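
The point made in the comments (read with the encoding the file already has, choose UTF-8 only when writing) can be sketched roughly as follows. This is only a sketch under assumptions: `convert_to_utf8` is a hypothetical helper name, and the candidate list is a guess the caller has to supply, since nothing in the question says what the real encoding is. A successful decode also does not prove the guess is right, because single-byte codecs such as cp1252 accept almost any byte sequence.

```python
def convert_to_utf8(src_path, dst_path, candidates=("utf-8", "cp1252")):
    """Re-encode src_path as UTF-8 at dst_path.

    `candidates` is a hypothetical, caller-supplied list of guesses.
    Returns the first candidate encoding that decoded the file strictly.
    """
    # Read raw bytes so that no decoding happens implicitly.
    with open(src_path, "rb") as f:
        raw = f.read()

    for enc in candidates:
        try:
            text = raw.decode(enc)  # strict decoding: raises on invalid bytes
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
        # newline="" keeps the original line endings untouched on write.
        with open(dst_path, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        return enc

    raise ValueError("none of the candidate encodings could decode the file")
```

Usage would look something like `convert_to_utf8(sin3, sin3 + ".utf8", ("utf-8", "cp1256"))`, where the second candidate is whatever encoding you suspect the file was written in.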

2 Answers

1
votes

Try this:

  • Open the CSV file in the Sublime Text editor.
  • Click File -> Save with Encoding -> UTF-8 to save the file as UTF-8.

Then you can read the file as usual. I would recommend using Pandas, where you can read it with:

import pandas as pd
data = pd.read_csv('file_name.csv', encoding='utf-8')
1
votes

Try this:

import codecs
fh = codecs.open(sin3, "r", encoding="utf-8", errors="ignore")
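
One thing to keep in mind with errors='ignore': it does not convert anything, it silently drops every byte that is not valid UTF-8. A small sketch of the difference between the error handlers, using made-up sample bytes (not the contents of the asker's file):

```python
# 0xc7 starts a two-byte UTF-8 sequence, but "d" is not a valid continuation byte.
raw = b"abc\xc7def"

try:
    raw.decode("utf-8")  # default errors="strict"
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True  # same kind of error as in the question

ignored = raw.decode("utf-8", errors="ignore")    # invalid byte is dropped
replaced = raw.decode("utf-8", errors="replace")  # invalid byte becomes U+FFFD

print(strict_failed, repr(ignored), repr(replaced))
```

With 'ignore' the character that 0xc7 stood for is lost, so this only makes sense when losing some text is acceptable; otherwise the file should be decoded with its real encoding.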