18
votes

I came across this behavior that surprised me in Python 2.6 and 3.2:

>>> xs = dict.fromkeys(range(2), [])
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: [1]}

However, dict comprehensions in 3.2 show a more polite demeanor:

>>> xs = {i:[] for i in range(2)}
>>> xs
{0: [], 1: []}
>>> xs[0].append(1)
>>> xs
{0: [1], 1: []}
>>> 

Why does fromkeys behave like that?

3
the difference is the same as in [[]]*2 and [[] for _ in range(2)]. - jfs
@J.F.Sebastian I am used to the meaning of [[]]*2 and other gotchas alike. But fromkeys got me by surprise. Maybe is just a question of familiarity...I practically never use the fromkeys method... - joaquin

3 Answers

19
votes

Your Python 2.6 example is equivalent to the following, which may help to clarify:

>>> a = []
>>> xs = dict.fromkeys(range(2), a)

Each entry in the resulting dictionary will have a reference to the same object. The effects of mutating that object will be visible through every dict entry, as you've seen, because it's one object.

>>> xs[0] is a and xs[1] is a
True

Use a dict comprehension, or if you're stuck on Python 2.6 or older and you don't have dictionary comprehensions, you can get the dict comprehension behavior by using dict() with a generator expression:

xs = dict((i, []) for i in range(2))
5
votes

In the first version, you use the same empty list object as the value for both keys, so if you change one, you change the other, too.

Look at this:

>>> empty = []
>>> d = dict.fromkeys(range(2), empty)
>>> d
{0: [], 1: []}
>>> empty.append(1) # same as d[0].append(1) because d[0] references empty!
>>> d
{0: [1], 1: [1]}

In the second version, a new empty list object is created in every iteration of the dict comprehension, so both are independent from each other.

As to "why" fromkeys() works like that - well, it would be surprising if it didn't work like that. fromkeys(iterable, value) constructs a new dict with keys from iterable that all have the value value. If that value is a mutable object, and you change that object, what else could you reasonably expect to happen?

1
votes

To answer the actual question being asked: fromkeys behaves like that because there is no other reasonable choice. It is not reasonable (or even possible) to have fromkeys decide whether or not your argument is mutable and make new copies every time. In some cases it doesn't make sense, and in others it's just impossible.

The second argument you pass in is therefore just a reference, and is copied as such. An assignment of [] in Python means "a single reference to a new list", not "make a new list every time I access this variable". The alternative would be to pass in a function that generates new instances, which is the functionality that dict comprehensions supply for you.

Here are some options for creating multiple actual copies of a mutable container:

  1. As you mention in the question, dict comprehensions allow you to execute an arbitrary statement for each element:

    d = {k: [] for k in range(2)}
    

    The important thing here is that this is equivalent to putting the assignment k = [] in a for loop. Each iteration creates a new list and assigns it to a value.

  2. Use the form of the dict constructor suggested by @Andrew Clark:

    d = dict((k, []) for k in range(2))
    

    This creates a generator which again makes the assignment of a new list to each key-value pair when it is executed.

  3. Use a collections.defaultdict instead of a regular dict:

    d = collections.defaultdict(list)
    

    This option is a little different from the others. Instead of creating the new list references up front, defaultdict will call list every time you access a key that's not already there. You can there fore add the keys as lazily as you want, which can be very convenient sometimes:

    for k in range(2):
        d[k].append(42)
    

    Since you've set up the factory for new elements, this will actually behave exactly as you expected fromkeys to behave in the original question.

  4. Use dict.setdefault when you access potentially new keys. This does something similar to what defaultdict does, but it has the advantage of being more controlled, in the sense that only the access you want to create new keys actually creates them:

    d = {}
    for k in range(2):
        d.setdefault(k, []).append(42)
    

    The disadvantage is that a new empty list object gets created every time you call the function, even if it never gets assigned to a value. This is not a huge problem, but it could add up if you call it frequently and/or your container is not as simple as list.