I download company names from different websites to my localhost. Sometimes I run into a problem that interrupts the download. My script works fine for other countries, but when I download companies for the Czech Republic, the following error occurs.

Total companies processed so far:0
Traceback (most recent call last):
  File "process1.py", line 261, in <module>
    print "Company Name: "+hit.text
  File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\xfd' in position 33: character maps to <undefined>

My code is here:

if companyAlreadyKnown == 0:
    for hit in soup2.findAll("h1"):
        print "Company Name: "+hit.text
        pCompanyName = hit.text
        flog.write("\nCompany Name: "+str(pCompanyName))
        companyObj.setCompanyName(pCompanyName)

I don't know why this happens. Any suggestions for this problem?

1 Answer

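The Czech language contains a lot of non-ASCII characters. u'\xfd' is the Unicode representation of ý. Note that hit.text is already a unicode object, so the failure is in encoding, not decoding: print tries to encode the text for the Windows console, and the console's cp437 code page has no ý. Calling str() on the name fails for the same reason, because it tries to encode to ASCII. Encode the text explicitly before you print it or write it to the log. An even better solution is to also detect what encoding the website you are scraping uses and decode the page with that encoding before parsing it.

import sys

if companyAlreadyKnown == 0:
    for hit in soup2.findAll("h1"):
        company_name = hit.text  # already a unicode object

        # Encode for the console; characters it cannot show (such as ý) become '?'
        console_encoding = sys.stdout.encoding or 'utf-8'
        print "Company Name: " + company_name.encode(console_encoding, 'replace')

        # Write UTF-8 bytes to the log so that no characters are lost
        flog.write("\nCompany Name: " + company_name.encode('utf-8'))
        companyObj.setCompanyName(company_name)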
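
For the part about detecting the website's encoding, here is a minimal sketch of one possible approach: read the charset from the HTTP Content-Type header and decode the downloaded bytes with it. The helper names charset_from_content_type and decode_page are made up for this example, and it assumes you have the raw bytes and the header value at hand (how you get them depends on how your script downloads the page). It only uses the standard library and should run on Python 2.7 as well as Python 3.

```python
import re


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset named in a Content-Type header, or the default."""
    match = re.search(r"charset=[\"']?([\w.:-]+)", header_value or "", re.IGNORECASE)
    return match.group(1).lower() if match else default


def decode_page(raw_bytes, header_value, default="utf-8"):
    """Decode downloaded bytes using the declared charset.

    Undecodable bytes are replaced instead of raising an exception.
    """
    encoding = charset_from_content_type(header_value, default)
    try:
        return raw_bytes.decode(encoding, "replace")
    except LookupError:
        # The header named a charset Python does not know; fall back.
        return raw_bytes.decode(default, "replace")


# Usage: a Czech page served as windows-1250, where byte 0xFD is ý (U+00FD)
page = decode_page(b"Nov\xfd", "text/html; charset=windows-1250")
```

The decoded text is a unicode string, so it still has to be encoded explicitly when you print it or write it to a file. Not every server sends a charset, which is why the sketch falls back to a default rather than failing.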