I am currently learning Python and came across the following error:

Traceback (most recent call last):
  File "file.py", line 22, in <module>
    for word in file.read():
  File "C:\Users\user\AppData\Local\Continuum\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 6552: character maps to <undefined>

This is my code:

file=open('xyz.txt')

dict={}

ignorelist=set( line.strip() for line in open('ignorelist'))

for word in file.read():
    word = word.replace(".","")
    word = word.replace(",","")

    if word not in ignorelist:
        if word not in dict:
            dict[word] = 1
        else:
            dict[word] += 1

d=collections.Counter(dict)

for word, count in d.most_common(10):
    print(word, ": ", count)

Does anyone know why this happens?

Thanks in advance!

Looks like it tries to decode the characters as cp1252 and fails to do that. - David Bern
Perhaps this will help you debug the problem: i18nqa.com/debug/bug-double-conversion.html - David Bern

1 Answer


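Without an encoding argument, open() falls back to the platform default, which the traceback shows is cp1252 here. Could you try this change, which specifies the encoding explicitly: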

file=open('xyz.txt', encoding='utf8')

(The ignorelist file may need it too.)
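
To make that concrete, here is a minimal sketch of the whole script with the encoding passed explicitly. It assumes both files are actually saved as UTF-8, which the question does not confirm; byte 0x9d is the last byte of a UTF-8 curly closing quote, so that is a likely guess, not a certainty. The function names count_words and decodes_as are made up for illustration. It also uses read().split(), since looping over read() directly goes through single characters rather than words.

```python
import collections


def decodes_as(data, encoding):
    """Return True if the bytes can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


def count_words(text_path, ignore_path, encoding="utf-8"):
    """Count words in text_path, skipping words listed in ignore_path.

    The encoding default is an assumption: use whatever encoding the
    files were really saved in.
    """
    # Pass the encoding explicitly so Python does not fall back to the
    # platform default (cp1252 in the traceback above).
    with open(ignore_path, encoding=encoding) as ignore_file:
        ignorelist = {line.strip() for line in ignore_file}

    counts = collections.Counter()
    with open(text_path, encoding=encoding) as text_file:
        # split() yields whitespace-separated words; iterating over the
        # string from read() would yield one character at a time.
        for word in text_file.read().split():
            word = word.replace(".", "").replace(",", "")
            if word and word not in ignorelist:
                counts[word] += 1
    return counts


if __name__ == "__main__":
    # File names taken from the question.
    for word, count in count_words("xyz.txt", "ignorelist").most_common(10):
        print(word, ": ", count)
```

If you want to see the original failure in isolation, decodes_as("\u201d".encode("utf-8"), "cp1252") returns False, because cp1252 has no character assigned to byte 0x9d. If the file turns out not to be UTF-8, the same function works with a different encoding argument.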