1
votes

I am looking for help accessing dictionary keys that contain special characters (accented letters and such) in Python 3.x. Here's what I am trying to accomplish:

I have a .xml file that I parse into Python with ElementTree:

...
    tree = ElementTree.parse(fileNamePath)
...

The source is a program called Cockatrice. It is their card.xml file.

I have a .json text file that I load with json.load(open(fileName)).

The source is: https://mtgjson.com/json/AllCards-x.json.zip

Both databases contain over 16,000 entries and are too cumbersome for me to print out in full on the old PC I must use. Plus, CMD isn't always willing to print some of the special characters.

Anyway...

I use the names found in the XML file as the keys when looking up entries in the JSON (loaded as a dict).

cardName=root[1][loop_control01].find('name').text

I then pull the info I want from the JSON/dict with that name. This mostly works well, except when it reaches a name with special characters. An example that keeps popping up is Bösium Strip.

The error message is a KeyError:

KeyError: 'Bösium Strip'

I have confirmed that the key exists in the JSON by looking through it manually in Notepad. In the XML file the text is spelled as:

...
  <card>
     <name>Bösium Strip</name>...

and in the JSON file it is spelled as:

...
    "Bösium Strip":{
        "layout":"normal",...

While I do know of the problems printing these characters out in CMD, that does not seem to be the issue here as I am not printing them to screen. I just need to be able to reference the key in searching the JSON/DICT.

I have tried several of the answers found here on StackOverflow to no avail. I either need to search the JSON/DICT using the same format/encoding of characters or I need to iterate through the JSON/DICT and reformat all the keys to a more easily searchable format/encoding.

Any help accomplishing either would make me happy. Thanks to anyone who takes the time to give me a nice solution-present for my birthday today <3

2
Solid question, I have to say! And congratulations :) - geisterfurz007
@m0m0e Happy Birthday ! - Edwin van Mierlo
Thanks so much! <3 You guys gave me an awesome present of an updated cockatrice cards.xml that has the card legalities per format. Will share with all when it's public-ready. Thank you all so much for the help. - m0m0e

2 Answers

2
votes

It could be a normalization problem. If you ascii() the keys in your dictionary you can see if there is a difference between your key and the dictionary's.

For example:

>>> s = 'Bösium Strip'
>>> ascii(s)
"'B\\xf6sium Strip'"
>>> import unicodedata as ud
>>> t = ud.normalize('NFD',s)
>>> t
'Bösium Strip'
>>> s
'Bösium Strip'
>>> s==t
False
>>> ascii(s)
"'B\\xf6sium Strip'"
>>> ascii(t)
"'Bo\\u0308sium Strip'"

s uses a single precomposed character (U+00F6), while t uses a plain o followed by a combining diaeresis (U+0308). They display the same but don't compare equal. You can use unicodedata.normalize with either NFD or NFC to convert both strings to the decomposed or composed form, respectively, before comparing them.
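
If normalization does turn out to be the cause, one way to handle it is to normalize every key of the dictionary once and then normalize each name before looking it up. The following is only a minimal sketch under that assumption: the helper names nfc and normalize_keys and the sample data are made up for illustration, and NFC is an arbitrary choice (NFD would work just as well as long as both sides use the same form).

```python
import unicodedata


def nfc(text):
    """Return text in NFC (composed) form."""
    return unicodedata.normalize('NFC', text)


def normalize_keys(cards):
    """Build a new dict whose keys are all NFC-normalized."""
    return {nfc(key): value for key, value in cards.items()}


# Hypothetical sample data: the key is stored in decomposed form (o + U+0308).
cards = {'Bo\u0308sium Strip': {'layout': 'normal'}}
normalized_cards = normalize_keys(cards)

# The name as it might come from the XML file, in composed form (U+00F6).
card_name = 'B\u00f6sium Strip'

print(card_name in cards)                  # False: the code points differ
print(nfc(card_name) in normalized_cards)  # True: both sides are NFC
```

Normalizing the keys once up front keeps each lookup a plain dictionary access, which matters with 16,000+ entries.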

1
votes

Not sure where your problem is. The following works on Python 3.4.2:

# -*- coding: utf-8 -*-
import json

jsonstring = """
[{
    "Bösium Strip": {
        "layout": "normal"
    },
    "test": {
        "hello1": "hello2"
    }
}]
"""
# data is now a single-element list whose element is a dictionary
data = json.loads(jsonstring)

# both of the following work
print(data[0].get('test', ''))
print(data[0].get('Bösium Strip', ''))

#the following works as well
keys = list(data[0].keys())
for key in keys:
    print(key)
    print(data[0].get(key))

However, this example uses json.loads(), not json.load(). The latter takes a file object, so you may have to experiment with the encoding of that object. For example:

json.load(open(filename, 'r', encoding = 'utf-8'))

I just checked some of my own code, and I do it this way:

with open(filename, 'r', encoding = e) as openfileobject:
    data = openfileobject.read()
    data = _check_json_data(data)  #this is my own check
    data = json.loads(data)

Here I can set e to the correct encoding of the file, and I check that the data is valid JSON to begin with (but that is for another question).
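
For completeness, here is a rough, self-contained sketch of the same idea: read the file with an explicit encoding and fail with a clear message if the bytes cannot be decoded or the text is not valid JSON. The function name load_json_file is hypothetical and this is not the _check_json_data helper mentioned above, just one possible way to do it. It relies on the fact that both UnicodeDecodeError and the JSON parse error are ValueError subclasses.

```python
import json


def load_json_file(path, encoding='utf-8'):
    """Read a JSON file with an explicit encoding and parse it.

    Raises ValueError naming the file if decoding or parsing fails.
    """
    try:
        with open(path, 'r', encoding=encoding) as file_object:
            text = file_object.read()
        return json.loads(text)
    except ValueError as error:
        # UnicodeDecodeError and JSON parse errors are both ValueError subclasses.
        raise ValueError(
            'could not load {!r} as {}: {}'.format(path, encoding, error)
        )
```

If the wrong encoding is passed, this tends to fail at read time rather than later as a puzzling KeyError, although some wrong encodings (for example cp1252 on a UTF-8 file) may decode without error and silently produce different keys.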

Good luck!