I am trying to extract the value of the Body attribute from the row element in pi.xml.
$ cat pi.xml
<?xml version="1.0" encoding="utf-8"?>
<posts>
<row Id="19" Body=" The value of π, the value of pi." />
</posts>
The Python file, pi.py:
from lxml import etree
doc = etree.parse('pi.xml')
r = doc.findall('row')
for i in r:
    print (i.get('Body'))
And the locale:
$ locale
LANG=en_IN
LANGUAGE=en_IN:en
LC_CTYPE="en_IN"
LC_NUMERIC="en_IN"
LC_TIME="en_IN"
LC_COLLATE="en_IN"
LC_ALL=
Running pi.py as python pi.py works fine.
But if I redirect the output and run pi.py as python pi.py >> pi.txt, I get an error message:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u03c0' in position 101: ordinal not in range(128)
If I change print (i.get('Body')) to print (i.get('Body')).encode('utf-8'), then python pi.py >> pi.txt works fine. But is this the proper way to do it?
Operating System - Ubuntu.
$ PYTHONIOENCODING=utf8 python pi.py >> py.txt. – Mark Tolonen

$ PYTHONIOENCODING=utf8 python somefile.py >> somefile.txt to other files it didn't work there (same UnicodeEncodeError is thrown). I'll try finding the solution; if I get one I'll post it here. – abT

x.decode('utf-8') upon reading 'x' from a utf-8 encoded file and then print processed_x.encode('utf-8') to save output to another file? Also, this always works and never gives any error. Looking for your suggestion. – abT

print processed_x.encode('utf-8') works if the console is configured for UTF-8, but it wouldn't work on a console configured for iso-8859-1. Just print processed_x will automatically encode for UTF-8 if the console is configured for UTF-8. Redirection is a shell function, so leave specifying the encoding to the shell as well, with PYTHONIOENCODING=utf8 python pi.py >> py.txt. It also leaves the option open to use other encodings without modifying the script. – Mark Tolonen
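
To illustrate the point in the last comment, here is a minimal sketch of how the encoding attached to the output stream, rather than the text itself, decides whether the write succeeds. It does not touch the real sys.stdout; instead it assumes an in-memory stream stands in for the redirected file, and the names write_encoded and body are hypothetical, chosen only for this example. The ASCII case mimics what Python 2 falls back to when stdout is redirected and no encoding has been set.

```python
import io


def write_encoded(text, encoding):
    """Write text through a text stream with a fixed encoding; return the bytes produced.

    Hypothetical helper: the BytesIO buffer stands in for a redirected output file.
    """
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    stream.write(text + u"\n")
    stream.flush()
    return raw.getvalue()


# Same text as the Body attribute in pi.xml, with pi written as an escape.
body = u" The value of \u03c0, the value of pi."

# A UTF-8 stream accepts the text; pi is written as the two bytes CF 80.
utf8_bytes = write_encoded(body, "utf-8")

# An ASCII stream rejects the same text with UnicodeEncodeError.
try:
    write_encoded(body, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
```

The script passes the same unicode text in both cases and only the stream's encoding changes, which is the effect PYTHONIOENCODING has on sys.stdout without any change to pi.py.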