12
votes

I have nested dictionaries:

{'key0': {'attrs': {'entity': 'p', 'hash': '34nj3h43b4n3', 'id': '4130'},
          u'key1': {'attrs': {'entity': 'r',
                              'hash': '34njasd3h43b4n3',
                              'id': '4130-1'},
                    u'key2': {'attrs': {'entity': 'c',
                                        'hash': '34njasd3h43bdsfsd4n3',
                                        'id': '4130-1-1'}}},
          u'key3': {'attrs': {'entity': 'r',
                              'hash': '34njasasasd3h43b4n3',
                              'id': '4130-2'},
                    u'key4': {'attrs': {'entity': 'c',
                                        'hash': '34njawersd3h43bdsfsd4n3',
                                        'id': '4130-2-1'}},
                    u'key5': {'attrs': {'entity': 'c',
                                        'hash': '34njawersd3h43bdsfsd4n3',
                                        'id': '4130-2-2'}}}},
 'someohterthing': 'someothervalue',
 'something': 'somevalue'}

given an id - one of all the ids like 4130 to 4130-2-2.
whats the easiest way to navigate to the correct dictionary?

Like if the given id is 4130-2-1 then it should reach the dictionary with key=key5

non xml approaches please.

Edit(1): The nesting is between 1 to 4 levels, but I know the nesting before I parse.

Edit(2): Fixed the code.

**Edit(3):**Fixed code again for string values of ids. Please excuse for the confusion created. This is final I hope :)

7
for '4130-2-1' you want 'key4', not 'key5' right? 'key5' appears to contain '4130-2-2'.Josh Petitt

7 Answers

15
votes

Your structure is unpleasantly irregular. Here's a version with a Visitor function that traverses the attrs sub-dictionaries.

def walkDict( aDict, visitor, path=() ):
    for  k in aDict:
        if k == 'attrs':
            visitor( path, aDict[k] )
        elif type(aDict[k]) != dict:
            pass
        else:
            walkDict( aDict[k], visitor, path+(k,) )

def printMe( path, element ):
    print path, element

def filterFor( path, element ):
    if element['id'] == '4130-2-2':
        print path, element

You'd use it like this.

walkDict( myDict, filterFor )

This can be turned into a generator instead of a Visitor; it would yield path, aDict[k] instead of invoking the visitor function.

You'd use it in a for loop.

for path, attrDict in walkDictIter( aDict ):
    # process attrDict...
14
votes

If you want to solve the problem in a general way, no matter how many level of nesting you have in your dict, then create a recursive function which will traverse the tree:

def traverse_tree(dictionary, id=None):
    for key, value in dictionary.items():
        if key == 'id':
            if value == id:
                print dictionary
        else:
             traverse_tree(value, id)
    return

>>> traverse_tree({1: {'id': 2}, 2: {'id': 3}}, id=2)
{'id': 2}
9
votes

This kind of problem is often better solved with proper class definitions, not generic dictionaries.

class ProperObject( object ):
    """A proper class definition for each "attr" dictionary."""
    def __init__( self, path, attrDict ):
        self.path= path
        self.__dict__.update( attrDict )
    def __str__( self ):
        return "path %r, entity %r, hash %r, id %r" % (
            self.path, self.entity, self.hash, self.id )

masterDict= {} 
def builder( path, element ):
    masterDict[path]= ProperObject( path, element )

# Use the Visitor to build ProperObjects for each "attr"
walkDict( myDict, builder )

# Now that we have a simple dictionary of Proper Objects, things are simple
for k,v in masterDict.items():
    if v.id == '4130-2-2':
        print v

Also, now that you have Proper Object definitions, you can do the following

# Create an "index" of your ProperObjects
import collections
byId= collections.defaultdict(list)
for k in masterDict:
    byId[masterDict[k].id].append( masterDict[k] )

# Look up a particular item in the index
print map( str, byId['4130-2-2'] )
5
votes

This is an old question but still a top google result, so I'll update:

A friend and myself published a library to solve (very nearly) this exact problem. dpath-python (no relation to the perl dpath module which does similar things).

http://github.com/akesterson/dpath-python

All you would need to do is something like this:

$ easy_install dpath
>>> import dpath.util
>>> results = []
>>> for (path, value) in dpath.util.search(my_dictionary, "*/attrs/entity/4130*", yielded=True):
>>> ... parent = dpath.util.search("/".join(path.split("/")[:-2])
>>> ... results.append(parent)

... that would give you a list of all the dictionary objects that matched your search, i.e., all the objects that had (key = 4130*). The parent bit is a little janky, but it would work.

1
votes

Since recursion is known to be limited in python (see What is the maximum recursion depth in Python, and how to increase it?) I would rather have a loop based answer to this question, so the answer can be adapted to any level of depth in the dictionary. For that, the function

def walkDict( aDict, visitor, path=() ):
    for  k in aDict:
        if k == 'attrs':
            visitor( path, aDict[k] )
        elif type(aDict[k]) != dict:
            pass
        else:
            walkDict( aDict[k], visitor, path+(k,) )

Can be replaced by:

def walkDictLoop(aDict, visitor, path=()):
    toProcess = [(aDict, path)]
    while toProcess:
        dictNode, pathNode = toProcess.pop(0)
        for k in dictNode:
            if k == 'attrs':
                visitor(pathNode, dictNode[k])
            if isinstance(dictNode[k], dict):
                toProcess.append( (dictNode[k], pathNode+(k,)) )
0
votes

Well, if you have to do it only a few times, you can just use nested dict.iteritems() to find what you are looking for.

If you plan to do it several times, performances will quickly becomes an issue. In that case you could :

  • change the way you data is returned to you to something more suitable.

  • if you can't, convert the data once the fly to a dict between id and keys (using iteritems). Then use it.

0
votes

I believe pydash will give you the most efficient way to achieve this.

For example:

data = {'a': {'b': {'c': [0, 0, {'d': [0, {1: 2}]}]}}, 'names': {'first': 'gus', 'second': 'parvez'}}

pydash.get(data, 'a.b.c.2.d.1.[1]')

# output: 2

Detail documentation you can find here: https://pydash.readthedocs.io/en/latest/quickstart.html