First: reload(sys) and setting some random default encoding just to satisfy the needs of an output terminal stream is bad practice. reload often changes things in sys which have been put in place depending on the environment - e.g. the sys.stdin/stdout streams, sys.excepthook, etc.
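To see what is at stake, here is a minimal sketch (Python 2) of the damage; the replacement stream stands in for whatever an IDE or test runner may have installed:

import sys

replacement = open('log.txt', 'w')   # stand-in for an environment's stream
sys.stdout = replacement

reload(sys)
sys.setdefaultencoding('utf-8')      # re-exposed by the reload, but...
print sys.stdout is replacement      # False - the replacement got clobbered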
Solving the encode problem on stdout
The best solution I know for solving the encode problem of print'ing unicode strings and beyond-ascii str's (e.g. from literals) on sys.stdout is to take care of a sys.stdout (file-like object) which is capable and optionally tolerant regarding these needs:
- When sys.stdout.encoding is None for some reason, or missing, or erroneously false, or "less" than what the stdout terminal or stream is really capable of, then try to provide a correct .encoding attribute - if need be by replacing sys.stdout & sys.stderr with a translating file-like object (see the quick check after this list).
- When the terminal / stream still cannot encode all occurring unicode chars, and you don't want to break print's just because of that, you can introduce an encode-with-replace behavior in the translating file-like object.
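A quick way to observe the first point: on a terminal, sys.stdout.encoding is usually set, while it is often None when the output is piped. A minimal check (Python 2):

import sys
# On a tty this typically prints e.g. 'UTF-8'; when piped
# (python script.py | cat) it is often None - the case the
# first point guards against.
print >> sys.stderr, repr(getattr(sys.stdout, 'encoding', None))
print >> sys.stderr, sys.stdout.isatty()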
Here is an example:
#!/usr/bin/env python
# encoding: utf-8
import sys

class SmartStdout:
    def __init__(self, encoding=None, org_stdout=None):
        if org_stdout is None:
            # unwrap a previously installed SmartStdout instead of stacking
            org_stdout = getattr(sys.stdout, 'org_stdout', sys.stdout)
        self.org_stdout = org_stdout
        self.encoding = encoding or \
            getattr(org_stdout, 'encoding', None) or 'utf-8'
    def write(self, s):
        # encode unicode for the underlying stream; escape what it can't take
        self.org_stdout.write(s.encode(self.encoding, 'backslashreplace'))
    def __getattr__(self, name):
        return getattr(self.org_stdout, name)

if __name__ == '__main__':
    if sys.stdout.isatty():
        sys.stdout = sys.stderr = SmartStdout()
    us = u'aouäöüфżß²'
    print us
    sys.stdout.flush()
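To see the tolerant encode-with-replace behavior, force an encoding that is narrower than the data (a sketch; latin-1 is chosen purely for demonstration):

sys.stdout = SmartStdout('latin-1')
print u'aouäöüфżß²'   # -> aouäöü\u0444\u017cß² - ф and ż are escaped
                      # via 'backslashreplace' instead of raising
                      # UnicodeEncodeError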
Using beyond-ascii plain string literals in Python 2 / 2 + 3 code
The only good reason to change the global default encoding (to UTF-8 only), I think, concerns an application source code decision - not I/O stream encoding issues: namely, writing beyond-ascii string literals into code without being forced to always use u'string' style unicode escaping. This can be done rather consistently (despite what anonbadger's article says) by taking care of a Python 2 or Python 2 + 3 source code basis which uses ascii or UTF-8 plain string literals consistently - as far as those strings potentially undergo silent unicode conversion and move between modules or potentially go to stdout. For that, prefer "# encoding: utf-8" or ascii (no declaration). Change or drop libraries which still fatally rely, in a very dumb way, on ascii default encoding errors beyond chr #127 (which is rare today).
And do the following at application start (and/or via sitecustomize.py), in addition to the SmartStdout scheme above - without using reload(sys):
...
def set_defaultencoding_globally(encoding='utf-8'):
    assert sys.getdefaultencoding() in ('ascii', 'mbcs', encoding)
    # load a fresh copy of the sys module: its dict still contains
    # setdefaultencoding, which site.py deletes from the regular sys;
    # the default encoding itself is interpreter-global, so setting it
    # on the copy takes effect everywhere
    import imp
    _sys_org = imp.load_dynamic('_sys_org', 'sys')
    _sys_org.setdefaultencoding(encoding)

if __name__ == '__main__':
    sys.stdout = sys.stderr = SmartStdout()
    set_defaultencoding_globally('utf-8')
    s = 'aouäöüфżß²'
    print s
This way string literals and most operations (except character iteration) work comfortably without thinking about unicode conversion, as if there were Python 3 only.
File I/O of course always needs special care regarding encodings - as it does in Python 3.
Note: plain strings are then implicitly converted from utf-8 to unicode in SmartStdout before being converted to the output stream encoding.
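The character iteration caveat in a minimal sketch:

s = 'äöü'                      # 6 utf-8 bytes, 3 characters
print len(s)                   # 6 - byte-wise, even with the utf-8 default
print len(s.decode('utf-8'))   # 3 - decode explicitly before iterating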
The best solution is to learn to use encode and decode correctly instead of using hacks.
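A minimal sketch of that discipline - decode at input boundaries, work in unicode in between, encode at output boundaries (the utf-8 choices are assumptions about the actual boundaries):

import sys

raw = sys.stdin.read()                     # bytes in Python 2
text = raw.decode('utf-8')                 # decode once, at the input
result = text.upper()                      # work purely in unicode
sys.stdout.write(result.encode('utf-8'))   # encode once, on the way out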
This was certainly possible with Python 2, at the cost of always remembering to do so / consistently using your own interface. My experience suggests that this becomes highly problematic when you are writing code that you want to work with both Python 2 and Python 3. – Att Righ