39
votes

I'm trying to understand how Python's garbage collector detects circular references. When I look at the documentation, all I see is a statement that circular references are detected, except when the objects involved have a __del__ method.

If this happens, my understanding (possibly faulty) is that the gc module acts as a failsafe by (I assume) walking through all the allocated memory and freeing any unreachable blocks.

How does Python detect & free circular memory references before making use of the gc module?

3
"..but is not guaranteed to collect garbage containing circular references." Say the docs.Joel Cornett
Can you link the page of the documentation that you're referring to?Joel Cornett
I cannot find a license on the linked page, so I'm not sure it would be ok to copy a longer passage from it to SO. And without copying the content, the link wouldn't qualify as an answer.Sven Marnach
I won't write a summary, so go ahead. BTW, I found the link in the Python source code. There are further links to relevant mailing list threads. The basic concept of the garbage collector seems to be unchanged.Sven Marnach

3 Answers

33
votes

How does Python detect & free circular memory references before making use of the gc module?

It doesn't. The gc exists only to detect and free circular references. Non-circular references are handled through refcounting.

Now, to see how gc determines the set of objects referenced by any given object, take a look at the gc_get_references function in Modules/gcmodule.c. The relevant bit is:

// Where `obj` is the object who's references we want to find
traverseproc traverse;
if (! PyObject_IS_GC(obj))
    continue;
traverse = Py_TYPE(obj)->tp_traverse;
if (! traverse)
    continue;
if (traverse(obj, (visitproc)referentsvisit, result)) {
    Py_DECREF(result);
    return NULL;
}

The major function here is tp_traverse. Each C-level type defines a tp_traverse function (or in the case of objects which don't hold any references, like str, sets it to NULL). One example of tp_traverse is list_traverse, the traversal function for list:

static int
list_traverse(PyListObject *o, visitproc visit, void *arg)
{
    Py_ssize_t i;

    for (i = Py_SIZE(o); --i >= 0; )
        Py_VISIT(o->ob_item[i]);
    return 0;
}

I see is a statement that circular references are detected, except when the objects involved have a __del__() method.

You are correct — Python's cycle detector can detect and collect cycles unless they contain objects with a __del__ method, as there is no way for the interpreter to safely delete these objects (to get an intuition on why this is, imagine you've got two objects with __del__ methods that reference each other. In which order should they be freed?).

When objects with a __del__ method are involved in a cycle, the garbage collector will stick them in a separate list (accessible through gc.garbage) so that the programmer can manually "deal with" them.

7
votes

I think I found the answer I'm looking for in some links provided by @SvenMarnich in comments to the original question:

Container objects are Python objects that can hold references to other Python objects. Lists, Classes, Tuples etc are container objects; Integers, Strings etc. are not. So, only container objects are at risk for being in a circular reference.

Each Python object has a field - *gc_ref*, which is (I believe) set to NULL for non-container objects. For container objects it is set equal to the number of non container objects that reference it

Any container object with a *gc_ref* count greater than 1 (? I would've thought 0, but OK for now ?) has references that are not container objects. So they are reachable and are removed from consideration of being unreachable memory islands.

Any container object reachable by an object known to be reachable (i.e. those we just recognized as having a *gc_ref* count greater than 1) also does not need to be freed.

The remaining container objects are not reachable (except by each other) and should be freed.

http://www.arctrix.com/nas/python/gc/ is a link providing a fuller explanation http://hg.python.org/cpython/file/2059910e7d76/Modules/gcmodule.c is a link to the source code, which has comments further explaining the thoughts behind the circular reference detection

6
votes

How does Python detect & free circular memory references before making use of the gc module?

Python's garbage collector (not actually the gc module, which is just the Python interface to the garbage collector) does this. So, Python doesn't detect and free circular memory references before making use of the garbage collector.

Python ordinarily frees most objects as soon as their reference count reaches zero. (I say "most" because it never frees, for example, small integers or interned strings.) In the case of circular references, this never happens, so the garbage collector periodically walks memory and frees circularly-referenced objects.

This is all CPython-specific, of course. Other Python implementations have different memory management (Jython = Java VM, IronPython = Microsoft .NET CLR).