
I'm getting this error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2661' in position 1409: ordinal not in range(128)

I'm still very green at programming, so have mercy on me and my ignorance. As I understand it, the error means the script can't handle Unicode characters. There's at least one Unicode character in that feed, and countless others could pop up.

I've looked for others who've had similar problems, but I can't find a solution I understand or can make work.

#import library to do http requests:
import urllib
from xml.dom.minidom import parseString, parse

f = open('games.html', 'w')

document = urllib.urlopen('https://itunes.apple.com/us/rss/topfreemacapps/limit=300/genre=12006/xml')

dom = parse(document)

image = dom.getElementsByTagName('im:image')
title = dom.getElementsByTagName('title')
price = dom.getElementsByTagName('im:price')
address = dom.getElementsByTagName('id')

imglist = []
titlist = []
pricelist = []
addlist = []

i = 0
j = 20
k = 40

f.write('''\
<!DOCTYPE html>
<html>
<head>
<style type="text/css">
<!--


A:link {text-decoration: none; color: #246DA8;}
A:visited {text-decoration: none; color: #246DA8;}
A:active {text-decoration: none; color: #40A9E3;}
A:hover {text-decoration: none; color: #40A9E3;}

.box {
    vertical-align:middle;
    width: 180px;
    height: 120px;
    border: 1px solid #99c;
    padding: 5px;
    margin: 0px;
    margin-left: auto;
    margin-right: auto;
    -moz-border-radius: 5px;
    border-radius: 5px;
    -webkit-border-radius: 5px;
    background-color:#ffffff;
    font-family: Arial, Helvetica, sans-serif; color: black;
    font-size: small;
    font-weight: bold;
}


-->
</style>
</head>

<body>
''')

for i in range(0,len(image)):
    if image[i].getAttribute('height') == '53':
        imglist.append(image[i].firstChild.nodeValue)

for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue)

for i in range(0,len(price)):
    pricelist.append(price[i].firstChild.nodeValue)

for i in range(1,len(address)):
    addlist.append(address[i].firstChild.nodeValue) 

for i in range(0,20):

            f.write('''
<div style="width: 600px;">
 <div style="float: left; width: 200px;">
    <div class="box" align="center">
        <div align="center">
            <a href="''' + addlist[i] + '''?at=10l5NR"  target="_blank">''' + titlist[i] + '''</a><br>
            <a href="''' + addlist[i] + '''?at=10l5NR" target="_blank"><img src="''' + imglist[i] + '''" alt="" width="53" height="53" border="0" ></a><br>
            <span>''' + pricelist[i] + '''</span>
        </div>
    </div>
 </div>
  <div style="float: left; width: 200px;">
 <div class="box" align="center">
        <div align="center">
            <a href="''' + addlist[i+j] + '''?at=10l5NR"  target="_blank">''' + titlist[i+j] + '''</a><br>
            <a href="''' + addlist[i+j] + '''?at=10l5NR" target="_blank"><img src="''' + imglist[i+j] + '''" alt="" width="53" height="53" border="0" ></a><br>
            <span>''' + pricelist[i+j] + '''</span>
        </div>
    </div>
 </div>
 <div style="float: left; width: 200px;">
 <div class="box" align="center">
        <div align="center">
            <a href="''' + addlist[i+k] + '''?at=10l5NR"  target="_blank">''' + titlist[i+k] + '''</a><br>
            <a href="''' + addlist[i+k] + '''?at=10l5NR" target="_blank"><img src="''' + imglist[i+k] + '''" alt="" width="53" height="53" border="0" ></a><br>
            <span>''' + pricelist[i+k] + '''</span>
        </div>
    </div>
</div>
<br style="clear: left;" />
</div>


<br>
''')
f.write('''</body>''')

f.close()
On what line does the error occur? - Dolda2000
Traceback (most recent call last): File "games.py", line 114, in <module> ''') UnicodeEncodeError: 'ascii' codec can't encode character u'\u2661' in position 1409: ordinal not in range(128). But there is no line 114 of the script itself, it's barfing on the xml it's receiving. - user1764417
If there is no line 114 in the script, what is this "games.py" file, and how is it called by the script? - Dolda2000
By the way, if I run this script of yours, it works perfectly fine. Are you sure what you're doing wrong isn't somehow outside it? - Dolda2000
Because I'm a moron and referenced the edited line and not the original. Sorry! f.write('''</body>''') - user1764417

1 Answer


The basic problem is that you're concatenating Unicode strings with ordinary byte strings and writing the result out without converting it using a proper encoding; in these cases, Python 2 uses ASCII by default, which clearly can't handle characters such as u'\u2661'.

The line in your script that does this is too long to quote, but another practical example which displays the same problem could look like this:

import sys

parameter = u"foo \u2661"
sys.stdout.write(parameter + " bar\n")

You will need to instead encode the Unicode strings with an explicitly specified encoding, e.g. like this:

import sys

parameter = u"foo \u2661"
sys.stdout.write(parameter.encode("utf8") + " bar\n")
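Whether the sys.stdout example fails depends on how your terminal is set up, so here is a small sketch that shows the difference between the two encodings without writing anything to a terminal or file. The variable names are only for illustration:

```python
text = u"foo \u2661"  # contains U+2661, which is outside the ASCII range

# Encoding with ASCII fails; this is the step Python 2 performs implicitly
# when it has to turn a Unicode string into bytes without being told how.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with UTF-8 succeeds and yields a byte string.
utf8_bytes = text.encode("utf-8")

print(ascii_failed)
print(repr(utf8_bytes))
```

The repr of the UTF-8 result shows the heart as three bytes rather than one character, which is what ends up in the file.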

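In your case, you can do this in your loops so that you don't have to specify it on every concatenation. Do it for every list that ends up in the output (imglist, pricelist and addlist as well as titlist), because concatenating an encoded non-ASCII byte string with a string that is still Unicode raises a similar error: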

for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue.encode("utf8"))
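A different way to get the same result, which is not what your script does now, is to leave the node values as Unicode and let the file object encode everything once when it writes. This is only a sketch: write_page and the sample titles are made up for the example, and it assumes every piece of text you pass in is a Unicode string.

```python
import io
import os
import tempfile


def write_page(path, titles):
    """Write Unicode titles to path, encoding to UTF-8 at the file boundary."""
    # io.open takes an encoding argument on both Python 2 and Python 3.
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(u"<!DOCTYPE html>\n<html>\n<body>\n")
        for title in titles:
            out.write(u"<span>" + title + u"</span>\n")
        out.write(u"</body>\n</html>\n")


# Hypothetical sample data standing in for the values read from the feed.
path = os.path.join(tempfile.mkdtemp(), "games.html")
write_page(path, [u"Hearts \u2661", u"Chess"])

# Read the file back as raw bytes to confirm it was written as UTF-8.
with io.open(path, "rb") as check:
    raw = check.read()
```

With this approach there is no need to call encode() in each loop, but every literal you concatenate has to be a u"..." string in Python 2.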

--

Also, while we're at it, you can improve your code by not iterating through the elements using an integer index. For instance, instead of this:

title = dom.getElementsByTagName('title')
for i in range(1,len(title)):
    titlist.append(title[i].firstChild.nodeValue.encode("utf8"))

... you can iterate over the elements directly (the [1:] slice keeps skipping the first title, as your loop does):

for title in dom.getElementsByTagName('title')[1:]:
    titlist.append(title.firstChild.nodeValue.encode("utf8"))
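If you want to try the direct iteration without hitting the network, here is a sketch that runs it against a tiny inline document. SAMPLE_FEED is made up for the example and is far simpler than the real feed; I'm also assuming the first title is the feed's own title, which would explain why your loop starts at 1.

```python
from xml.dom.minidom import parseString

# Hypothetical stand-in for the iTunes feed, encoded as UTF-8 bytes.
SAMPLE_FEED = u"""<feed>
  <title>Top Free Mac Apps</title>
  <entry><title>Hearts \u2661</title></entry>
  <entry><title>Chess</title></entry>
</feed>""".encode("utf-8")

dom = parseString(SAMPLE_FEED)

titlist = []
# [1:] skips the first <title>, assumed here to belong to the feed itself.
for title in dom.getElementsByTagName("title")[1:]:
    titlist.append(title.firstChild.nodeValue.encode("utf8"))

print(len(titlist))
```

The slice works because the node list returned by minidom behaves like an ordinary list.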