I just thought I'd mention something here which I had to spend a long time experimenting with before I finally realised what was going on. This may be so obvious to everyone here that they haven't bothered mentioning it. But it would've helped me if they had, so on that principle...!
NB: I am using Jython specifically, v 2.7, so just possibly this may not apply to CPython...
NB2: the first two lines of my .py file here are:
# -*- coding: utf-8 -*-
from __future__ import print_function
The "%" (AKA "interpolation operator") string construction mechanism causes ADDITIONAL problems too... If the default encoding of the "environment" is ASCII and you try to do something like
print( "bonjour, %s" % "fréd" ) # Call this "print A"
you will have no difficulty running in Eclipse... In a Windows CLI (DOS window) you will find that the encoding is code page 850 (on my Windows 7 OS) or something similar, which can handle European accented characters at least, so it'll work there too.
print( u"bonjour, %s" % "fréd" ) # Call this "print B"
will also work.
If, OTOH, you direct to a file from the CLI, the stdout encoding will be None, which will default to ASCII (on my OS anyway), which will not be able to handle either of the above prints... (dreaded encoding error).
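To see which encoding is actually in effect when you run the script each way, a quick check along these lines may help. This is only a sketch: `effective_encoding` is a hypothetical helper name, and the `"ascii"` fallback is an assumption that mirrors what I observed on my OS.

```python
from __future__ import print_function
import sys


def effective_encoding(stream, fallback="ascii"):
    """Return the encoding a stream reports, or the fallback if it reports none."""
    # A stream redirected to a file may report None (as described above);
    # the fallback value is an assumption, not something the stream tells us.
    encoding = getattr(stream, "encoding", None)
    return encoding or fallback


if __name__ == "__main__":
    # Written to stderr so the result stays visible when stdout is redirected.
    print("stdout encoding:", effective_encoding(sys.stdout), file=sys.stderr)
    print("default encoding:", sys.getdefaultencoding(), file=sys.stderr)
```

Running this once in the console and once with stdout redirected to a file should show whether the encoding really changes between the two cases.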
So then you might think of redirecting your stdout by using
import codecs, sys
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
and try running in the CLI piping to a file... Very oddly, print A above will work... But print B above will throw the encoding error! The following will however work OK:
print( u"bonjour, " + "fréd" ) # Call this "print C"
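To check what the `codecs.getwriter` wrapper does with a Unicode string without involving the console at all, something like the following may be useful. It is a sketch only: `render_utf8` is a hypothetical helper, and an in-memory `io.BytesIO` stands in for the redirected stdout.

```python
from __future__ import print_function
import codecs
import io


def render_utf8(text):
    """Print a Unicode string through a UTF-8 writer and return the raw bytes."""
    raw = io.BytesIO()                      # stands in for a redirected stdout
    writer = codecs.getwriter("utf8")(raw)  # same kind of wrapper as used on sys.stdout
    print(text, file=writer)
    return raw.getvalue()


if __name__ == "__main__":
    # The accented character is written as an escape to avoid source-encoding issues.
    print(repr(render_utf8(u"bonjour, fr\u00e9d")))
```

If the wrapper receives a genuine Unicode string, the "é" should come out as the two UTF-8 bytes `\xc3\xa9`, which suggests the writer itself is not where the error arises.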
The conclusion I have come to (provisionally) is that if a string marked as Unicode with the "u" prefix is passed to the %-handling mechanism, the interpolation appears to use the default environment encoding, regardless of whether you have wrapped stdout in a UTF-8 writer!
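One possible workaround, offered as a sketch rather than a confirmed fix: make sure every operand is already a Unicode string before it reaches `%`, so that no implicit conversion is needed. In CPython 2, mixing a `u"..."` format string with a byte string makes the interpreter decode the byte string with the default encoding (ASCII); whether Jython follows exactly the same path is an assumption on my part. `to_text` and `greet` are hypothetical names, and the `"utf-8"` default assumes the source file is saved as UTF-8, as the coding line above declares.

```python
from __future__ import print_function


def to_text(value, encoding="utf-8"):
    """Decode byte strings explicitly; return Unicode strings unchanged."""
    # On Python 2 a plain "..." literal is a byte string (bytes is str there).
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def greet(name):
    # Both operands are Unicode here, so % has nothing to decode implicitly.
    return u"bonjour, %s" % to_text(name)


if __name__ == "__main__":
    # b"fr\xc3\xa9d" is the UTF-8 byte form of the accented name.
    print(repr(greet(b"fr\xc3\xa9d")))
```

The design choice is simply to decode at the boundary, once and explicitly, instead of relying on whatever default encoding the environment happens to have.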
How people deal with this is a matter of choice. I would welcome a Unicode expert to say why this happens, whether I've got it wrong in some way, what the preferred solution to this is, whether it also applies to CPython, whether it happens in Python 3, etc., etc.
chcp 65001
before executing your script. This can have issues, but it often helps, and doesn't require a lot of typing (less than set PYTHONIOENCODING=utf_8). – Tomasz Gandor
setx PYTHONIOENCODING utf-8
to make it permanent if you want to save typing. – ejm