22
votes

My question is why does python use both reference counting and mark-and-sweep for gc? Why not only mark-and-sweep?

My initial guess is that using reference counting can easily remove non-cyclic referenced objects, this may somewhat speed up mark-and-sweep and gain memory immediately. Don't know if my guess is right?

Any thoughts?

Thanks a lot.

3

3 Answers

20
votes

Python (the language) doesn't say which form of garbage collection it uses. The main implementation (often known as CPython) acts as you describe. Other versions such as Jython or IronPython use a purely garbage collected system.

Yes, there is a benefit of earlier collection with reference counting, but the main reason CPython uses it is historical. Originally there was no garbage collection for cyclic objects so cycles led to memory leaks. The C APIs and data structures are based heavily around the principle of reference counting. When real garbage collection was added it wasn't an option to break the existing binary APIs and all the libraries that depended on them so the reference counting had to remain.

16
votes

Reference counting deallocates objects sooner than garbage collection.

But as reference counting can't handle reference cycles between unreachable objects, Python uses a garbage collector (really just a cycle collector) to collect those cycles when they exist.

2
votes

My initial guess is that using reference counting can easily remove non-cyclic referenced objects, this may somewhat speed up mark-and-sweep and gain memory immediately. Don't know if my guess is right?

Yes. As soon as the refcount goes to zero and object can be removed. This won't happen in a cyclic referenced object. AFAIK, mark and sweep is a costly operation and the simplest way to implement it requires you to "stop the world" while objects are marked. When all of the objects are traversed, andy object not marked (as reachable) is released.