1
votes

I'm trying to write a script that takes a JSON file, pizza-train.json, and extracts the request_text field from each dictionary in the list. But I'm getting an error when running the code below:

Code:

import json

json1_file = open("pizza-train.json", 'r')
json1_str = json1_file.read()

json1_data = json.loads(json1_str)

print(json1_data)

Error:

File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode characters in position 58765-58767: character maps to <undefined>

I've tried different solutions, such as encoding="UTF-8" and .encode('utf-8').

Can anyone explain to me why it won't print json1_data?

1
Please provide the full stack trace. Do you get this error on the print(json1_data) line? If so, you can't print non-Windows-1252 characters in your console, since that is your console charset. - user996142
I got the error at that line. How would one change their console charset? - Anonymous
Short answer: use chcp. The long answer is below. - user996142

1 Answer

0
votes

Your data contains some characters (at positions 58765-58767) that can't be represented in your console's charset (Windows-1252). You should switch your console to a charset that supports them (the chcp command on Windows).
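If changing the code page is not an option, a rough sketch of a workaround is to check which encoding Python uses for stdout and escape whatever that encoding lacks. The helper name printable_form is hypothetical (it is not part of the question's code), and the sketch assumes Python 3, as in the question's traceback:

```python
import sys


def printable_form(text, encoding=None):
    """Return text in a form that the given encoding can represent.

    Hypothetical helper for illustration only.
    """
    # Default to the encoding Python uses for stdout (cp1252 in the question).
    encoding = encoding or sys.stdout.encoding or "ascii"
    # Characters missing from the code page are replaced by backslash
    # escapes instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)


if __name__ == "__main__":
    print("stdout encoding:", sys.stdout.encoding)
    print(printable_form("русский текст"))
```

This only makes the output printable; the escaped characters are still not shown as real letters, so switching the console code page remains the actual fix.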

Here is an example.

I have this file:

# coding=utf-8
print(u"русский текст") # This is Russian text (Cyrillic characters)

The file itself is UTF-8, so Python knows that the letter "й" is a Cyrillic letter.

But my console uses code page CP1252, which has no such letter (it only has Latin-based characters).

>chcp
Active code page: 1252

>python.exe foo.py
Traceback (most recent call last):
  File "foo.py", line 2, in <module>
    print(u"русский текст")
  File "c:\Python27\lib\encodings\cp1252.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_table)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-6: character maps to <undefined>

Now I change my code page to one that has the letter "й" and the others:

>chcp 1251
Active code page: 1251

>c:\Python27\python.exe foo.py
русский текст

I could also use 866, which is the DOS Cyrillic code page.
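The same difference between code pages can be checked from Python itself, without touching the console. This is a minimal sketch assuming Python 3; the function name supports is made up for illustration:

```python
TEXT = "русский текст"


def supports(encoding, text=TEXT):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # At least one character has no mapping in this code page.
        return False


if __name__ == "__main__":
    for code_page in ("cp1252", "cp1251", "cp866"):
        print(code_page, supports(code_page))
```

It reports False for cp1252 and True for cp1251 and cp866, which matches the console sessions above.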