70
votes

My Python 2.7 script extracts company names from local HTML files and runs fine, but for names from certain countries it fails with the error "UnicodeEncodeError: 'ascii' codec can't encode character".

The error occurs specifically when this company name comes up:

Company Name: Kühlfix Kälteanlagen Ing.Gerhard Doczekal & Co. KG

The link cannot be processed

Traceback (most recent call last): 
  File "C:\Python27\Process2.py", line 261, in <module>
    flog.write("\nCompany Name: "+str(pCompanyName))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 9: ordinal not in range(128)

The error comes from the flog.write line in this code:

if companyAlreadyKnown == 0:
   for hit in soup2.findAll("h1"):
       print "Company Name: "+hit.text
       pCompanyName = hit.text
       flog.write("\nCompany Name: "+str(pCompanyName))
       companyObj.setCompanyName(pCompanyName)
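
The failure can be reproduced without BeautifulSoup. This is a minimal sketch, assuming hit.text is a unicode object (as the u'\xfc' in the traceback suggests). In Python 2, str() on a unicode object performs an implicit ASCII encode; the explicit .encode("ascii") call below mirrors that step so the snippet behaves the same on Python 2.7 and Python 3. The names company_name and error_message are illustrative only.

```python
company_name = u"K\xfchlfix K\xe4lteanlagen"  # u'\xfc' is "ü", u'\xe4' is "ä"

try:
    # Python 2's str(company_name) performs this ASCII encode implicitly.
    company_name.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(error_message)
```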
2
Anybody coming here should visit stackoverflow.com/questions/3828723/… and stackoverflow.com/questions/28657010/…; doing what is suggested in the accepted answer is usually, if not always, a very bad idea. - Padraic Cunningham
Wherever you write to or read from a file, you have to specify the encoding: open("filename", "w", encoding="UTF-8"). On Python 2 the built-in open() does not take an encoding argument, but io.open() does. - Reihan_amn

2 Answers

237
votes

Try setting the system default encoding to UTF-8 at the start of the script, so that implicit conversions between str and unicode use UTF-8 instead of ASCII.

Example:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

The above should set the default encoding to UTF-8.

39
votes

You really want to do this:

flog.write("\nCompany Name: "+ pCompanyName.encode('utf-8'))

This is the "encode late" strategy described in this Unicode presentation (slides 32 through 35): keep text as unicode inside the program and encode it to bytes only when writing it out.
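
A variation on the explicit .encode('utf-8') call above is to let the file object do the encoding at the output boundary. The following is a hedged sketch, not the asker's script: write_company_name and the log path are hypothetical names, and io.open is used because it accepts an encoding argument on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def write_company_name(path, company_name):
    """Append one log line; io.open encodes the unicode text as UTF-8 on write."""
    with io.open(path, "a", encoding="utf-8") as flog:
        # Both operands are unicode, so no implicit ASCII conversion happens.
        flog.write(u"\nCompany Name: " + company_name)


# Hypothetical log location, chosen only for this example.
log_path = os.path.join(tempfile.mkdtemp(), "process.log")
name = u"K\xfchlfix K\xe4lteanlagen Ing.Gerhard Doczekal & Co. KG"
write_company_name(log_path, name)

# Read the file back, decoding from UTF-8, to check the round trip.
with io.open(log_path, "r", encoding="utf-8") as fh:
    logged = fh.read()
```

With this approach the string stays unicode everywhere in the program, and the only place an encoding is named is where the file is opened.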