How to solve UnicodeDecodeError in Python 3.6?

Question

I switched from Python 2.7 to Python 3.6.

I have scripts that deal with some non-English content.

I usually run the scripts via cron and also in the terminal.

I had UnicodeDecodeError in my Python 2.7 scripts and solved it with this:

# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Now in Python 3.6, it doesn't work. I have print statements like print("Here %s" % (myvar)) and they throw an error. I can work around it by replacing myvar with myvar.encode("utf-8"), but I don't want to do that in every print statement.

I set PYTHONIOENCODING=utf-8 in my terminal and I still have the issue.

Is there a cleaner way to solve the UnicodeDecodeError issue in Python 3.6?

Is there any way to tell Python 3 to print everything in UTF-8, just like I did in Python 2?

Are the non-English files encoded properly in UTF-8 themselves? — Edward Minnix
@EdwardMinnix I am scraping data from various Hebrew/Korean sites, so data is not always clean. — Umair Ayub
@usr2564301 is there any way to tell Python3 to print everything in utf-8? just like I did in Python2? — Umair Ayub
Normally your terminal has an encoding defined which is used by Python to set the encoding of its file object (sys.stdout). Can you provide what sys.stdout.encoding is set to on your machine? — Alfe
I think that is the root of the problem. What strange terminal are you using? In Unix-ish environments you can set the env var TERM to something like xterm or similar. Also the LANG variable could have an influence. — Alfe

Alastair McCormack · Accepted Answer · 2018-06-25T15:40:56

It sounds like your locale is broken and you have another bytes->Unicode issue. The thing you did for Python 2.7 is a hack that only masked the real problem (there's a reason why you have to reload sys to make it work).

To fix your locale, try running locale from the command line. The output should look something like:

LANG=en_GB.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_ALL=

locale depends on LANG being set properly. Python effectively uses locale to work out what encoding to use when writing to stdout. If it can't work it out, it defaults to ASCII.
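To see what Python actually picked up from the environment, something like the following can be run both in the terminal and from the cron job. This is only a diagnostic sketch; the helper name describe_output_encoding is made up for illustration.

```python
import locale
import sys


def describe_output_encoding():
    """Collect the encodings Python derived from the current environment."""
    return {
        # Encoding used when print() writes str objects to stdout.
        # getattr guards against stdout having been replaced by another object.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding implied by the locale settings (LANG, LC_ALL, LC_CTYPE).
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_output_encoding().items():
        print("%s: %s" % (name, value))
```

If the stdout entry differs between the terminal and cron, the two environments are not passing the same locale settings to Python.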

You should first attempt to fix your locale. If locale reports errors, make sure you've installed the correct language pack for your region.

If all else fails, you can always fix Python by setting PYTHONIOENCODING=UTF-8. This should be used as a last resort as you'll be masking problems once again.
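Note that PYTHONIOENCODING only takes effect if it is present in the environment of the Python process itself. A variable that is set in a shell but not exported, or that is missing from the cron environment, never reaches the interpreter, which may be why setting it appeared to change nothing. The sketch below checks this by starting a child interpreter with the variable passed in explicitly; child_stdout_encoding is a hypothetical helper name.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding="UTF-8"):
    """Start a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    # The child prints only an encoding name, which is plain ASCII.
    return output.decode("ascii").strip()


if __name__ == "__main__":
    print(child_stdout_encoding())
```

If the child reports UTF-8 here but your script does not when run normally, the variable is probably not reaching the script's environment.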

If Python is still throwing an error after setting PYTHONIOENCODING then please update your question with the stack trace. Chances are you've got an implicit conversion going on.
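For context, UnicodeDecodeError is raised when bytes are turned into str with a codec that cannot handle them, so a stdout setting alone will not cure it. A minimal sketch of the difference between a failing decode and an explicit one follows; to_text and raw_page are illustrative names, and the assumption that the scraped bytes are UTF-8 is mine, not the asker's.

```python
def to_text(raw, encoding="utf-8"):
    """Decode bytes explicitly; pass str values through unchanged."""
    if isinstance(raw, bytes):
        # errors="replace" substitutes U+FFFD for undecodable bytes
        # instead of raising, which suits messy scraped data.
        return raw.decode(encoding, errors="replace")
    return raw


# Bytes as a scraper might receive them: the Hebrew word "shalom" in UTF-8.
raw_page = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")

# Decoding with the wrong codec is what triggers UnicodeDecodeError.
try:
    raw_page.decode("ascii")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Decoding once, at the boundary, yields a str that formats normally.
message = "Here %s" % to_text(raw_page)

if __name__ == "__main__":
    # ascii() keeps the demo printable even on an ASCII-only stdout.
    print(ascii(message))
```

Decoding at the point where data enters the program means every later print statement works with str and needs no per-call encode.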
