25
votes

In layman's terms, how does the garbage collection mechanism work?

How is an object identified as eligible for garbage collection?

Also, what do Reference Counting, Mark and Sweep, Copying, and Train mean in GC algorithms?

5
Nope, it's not. It probably appears that way just because of how I phrased it. Anyway. – S M Kamran
I'd recommend reading the fairly good, 34-page illustrated paper, Uniprocessor Garbage Collection Techniques, by Paul R. Wilson (1992), which explains the concepts behind basic garbage collection techniques (reference counting, mark-and-sweep, mark-compact, incremental, generational). – stakx - no longer contributing

5 Answers

38
votes

When you use a language with garbage collection, you won't get access to memory directly. Rather, you are given access to some abstraction on top of that data. One of the things that is properly abstracted away is the actual location in memory of each data block, as well as the pointers to other data blocks. When the garbage collector runs (this happens occasionally), it checks whether you still hold a reference to each of the memory blocks it has allocated for you. If you don't, it frees that memory.

The main difference between the different types of garbage collectors is their efficiency as well as any limitations on what kind of allocation schemes they can handle.

The simplest is probably reference counting. Whenever you create a reference to an object, an internal counter on that object is incremented. When you change the reference or it goes out of scope, the counter on the (former) target object is decremented. When this counter reaches zero, the object is no longer referred to at all and can be freed.
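The counting scheme described above can be sketched as a toy model. This is only an illustration, not how any particular runtime implements it; the `Heap` class and its `allocate`, `add_ref`, and `drop_ref` methods are hypothetical names invented for this example.

```python
class Heap:
    """Toy heap that keeps one reference counter per object name."""

    def __init__(self):
        self.counts = {}   # object name -> number of references to it
        self.freed = []    # names of objects that have been reclaimed

    def allocate(self, name):
        # A new object starts with one reference: the variable receiving it.
        self.counts[name] = 1

    def add_ref(self, name):
        # Another variable or field now refers to the object.
        self.counts[name] += 1

    def drop_ref(self, name):
        # A reference was changed or went out of scope.
        self.counts[name] -= 1
        if self.counts[name] == 0:
            # Nobody refers to the object any more, so it can be freed.
            del self.counts[name]
            self.freed.append(name)


heap = Heap()
heap.allocate("greeting")                  # count is 1
heap.add_ref("greeting")                   # count is 2
heap.drop_ref("greeting")                  # count is 1, object survives
still_alive = "greeting" in heap.counts
heap.drop_ref("greeting")                  # count is 0, object is freed
```

Note that the object is reclaimed at the exact moment the last reference disappears, with no separate collection pass.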

The problem with reference counting garbage collectors is that they cannot deal with circular data. If object A has a reference to object B, and B in turn has some (direct or indirect) reference to object A, their counters never reach zero, so they can never be freed, even if none of the objects in the chain is referred to from outside the chain (and they therefore aren't accessible to the program at all).
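To make the cycle problem concrete, here is a small sketch under simplifying assumptions: each toy object has at most one outgoing reference, and `Node`, `point`, and `release` are hypothetical names made up for the example.

```python
class Node:
    """Toy object with a reference count and at most one outgoing reference."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0
        self.target = None


def point(src, dst):
    """Make src refer to dst, incrementing dst's counter."""
    src.target = dst
    dst.refcount += 1


def release(node, freed):
    """Drop one reference to node; at zero, free it and release its target."""
    node.refcount -= 1
    if node.refcount == 0:
        freed.append(node.name)
        if node.target is not None:
            release(node.target, freed)


# Acyclic chain: program variable -> C -> D
c, d = Node("C"), Node("D")
c.refcount = 1            # the program's own variable refers to C
point(c, d)
freed_chain = []
release(c, freed_chain)   # C reaches zero, which in turn releases D

# Cycle: program variable -> A -> B -> A
a, b = Node("A"), Node("B")
a.refcount = 1            # the program's own variable refers to A
point(a, b)
point(b, a)               # A is now referenced twice
freed_cycle = []
release(a, freed_cycle)   # A only drops from 2 to 1, so nothing is freed
```

After the last line the program has no way to reach A or B, yet both counters are stuck at 1. The acyclic chain, by contrast, is freed completely.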

The mark and sweep algorithm, on the other hand, can handle this. It works by periodically stopping the execution of the program and marking each item the program has allocated as unreachable. The collector then runs through all the variables the program has and marks what they point to as reachable. If any of these allocations contain references to other data in the program, that data is likewise marked as reachable, and so on.

This is the mark part of the algorithm. At this point everything the program can access, no matter how indirectly, is marked as reachable and everything the program can't reach is marked as unreachable. In the sweep part, the garbage collector can now safely reclaim the memory associated with the objects marked as unreachable.
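The two phases can be sketched in a few lines. This is a simplified model, assuming the heap is a dictionary that maps each object name to the names it references and that the roots are given as a list; `mark_and_sweep` and the object names are hypothetical.

```python
def mark_and_sweep(heap, roots):
    """Remove unreachable entries from heap and return their names."""
    # Mark phase: everything starts out unmarked (unreachable). Walk the
    # object graph from the roots and mark whatever can be reached.
    marked = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in marked:
            continue              # already visited, so cycles terminate
        marked.add(name)
        stack.extend(heap[name])  # follow this object's references

    # Sweep phase: reclaim every object that was never marked.
    garbage = sorted(set(heap) - marked)
    for name in garbage:
        del heap[name]
    return garbage


heap = {
    "window": ["button"],   # reachable from a root
    "button": [],
    "A": ["B"],             # A and B only refer to each other
    "B": ["A"],
    "temp": ["button"],     # refers to live data but is itself unreachable
}
collected = mark_and_sweep(heap, roots=["window"])
```

Here the A/B cycle is collected because reachability is measured from the roots, not from a per-object counter.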

The problem with the mark and sweep algorithm is that it isn't that efficient -- the entire program has to be stopped to run it, and a lot of the object references aren't going to change.

To improve on this, the mark and sweep algorithm can be extended with so-called "generational garbage collection". In this mode, objects that have survived some number of garbage collections are promoted to the old generation, which is not checked as often.

This improves efficiency because objects tend to die young (think of a string being changed inside a loop, resulting in perhaps a lifetime of a few hundred cycles) or live very long (the objects used to represent the main window of an application, or the database connection of a servlet).
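A rough sketch of the promotion idea follows. It assumes a made-up threshold of two survived collections, and the `reachable` argument stands in for the result of a mark phase over the young objects; `GenerationalHeap`, `minor_collect`, and `PROMOTION_AGE` are hypothetical names, not any runtime's API.

```python
PROMOTION_AGE = 2  # hypothetical threshold: collections survived before promotion


class GenerationalHeap:
    def __init__(self):
        self.young = {}    # object name -> number of collections survived
        self.old = set()   # promoted objects, skipped by minor collections

    def allocate(self, name):
        self.young[name] = 0

    def minor_collect(self, reachable):
        """Collect only the young generation and return the freed names."""
        freed = []
        for name in list(self.young):
            if name not in reachable:
                del self.young[name]      # died young
                freed.append(name)
            else:
                self.young[name] += 1     # survived one more collection
                if self.young[name] >= PROMOTION_AGE:
                    del self.young[name]
                    self.old.add(name)    # promote to the old generation
        return freed


heap = GenerationalHeap()
heap.allocate("main_window")
heap.allocate("loop_string_1")
freed_first = heap.minor_collect(reachable={"main_window"})
heap.allocate("loop_string_2")
freed_second = heap.minor_collect(reachable={"main_window"})
```

After the second collection the long-lived object sits in the old generation, so later minor collections only have to look at the short-lived strings.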

Much more detailed information can be found on Wikipedia.

Added based on comments:

With the mark and sweep algorithm (as well as any other garbage collection algorithm except reference counting), the garbage collector does not run in the context of your program, since it has to be able to access things that your program is not capable of accessing directly. Therefore it is not correct to say that the garbage collector runs on the stack.

4
votes
  • Reference counting - Each object has a count which is incremented when someone takes a reference to the object, and decremented when someone releases the reference. When the reference count goes to zero, the object is deleted. COM uses this approach.
  • Mark and sweep - Each object has a flag indicating whether it is in use. Starting at the roots of the object graph (global variables, locals on stacks, etc.), each referenced object gets its flag set, and so on down the chain. At the end, all objects that are not referenced in the graph are deleted.

The garbage collector for the CLR is described in this slide deck. "Roots" on slide 15 are the sources for the objects that first go into the graph. Their member fields and so on are used to find the other objects in the graph.

Wikipedia describes several of these approaches in much more and better detail.

4
votes

Garbage collection simply means determining whether your program has any future need for its variables and, if not, collecting and deleting them.

The emphasis is on the word garbage: something in your house that is completely used up is thrown in the trash, and the garbage man handles it for you by coming to pick it up and take it away, giving you more room in your trash can.

Reference Counting, Mark and Sweep, Copying, Train, etc. are discussed in good detail in the GC FAQ.

0
votes

The general way it is done is that the number of references to an object is tracked in the background, and when that number goes to zero, the object is SUBJECT TO garbage collection. However, the GC will not fire up until it is explicitly needed, because it is an expensive operation. When it starts, the GC goes through the managed area of memory and finds every object that has no references left. The GC deletes those objects by first calling their destructors, allowing them to clean up after themselves, and then frees the memory. Commonly the GC will then compact the managed memory area by moving every surviving object to one area of memory, allowing more allocations to take place.

As I said, this is one method that I know of, and a lot of research is being done in this area.
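The compaction step mentioned above can be illustrated with a toy model. It assumes memory is a list of slots in which `None` marks a freed slot; `compact` is a hypothetical helper, and a real collector would also use the forwarding table to update every reference to a moved object.

```python
def compact(memory):
    """Slide surviving objects to the front of memory, in place.

    Returns a forwarding table (old index -> new index) and the index of
    the first free slot after compaction.
    """
    forwarding = {}
    free = 0
    for index, obj in enumerate(memory):
        if obj is not None:
            forwarding[index] = free
            memory[free] = obj
            if free != index:
                memory[index] = None   # the old slot is now empty
            free += 1
    return forwarding, free


memory = ["a", None, "b", None, None, "c"]   # None marks freed slots
forwarding, first_free = compact(memory)
```

After compaction the free space forms one contiguous region starting at `first_free`, so a new allocation only needs to advance that index.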

0
votes

Garbage collection is a big topic, and there are a lot of ways to implement it.

In a nutshell, in the most common approach the garbage collector keeps a record of all references to anything created via the new operator, even if that operator's use was hidden from you (for example, in a Type.Create() method). Each time you add a new reference to the object, the root of that reference is determined and added to the list, if needed. A reference is removed whenever it goes out of scope.

When there are no more references to an object, it can (not "will") be collected. To improve performance and make sure necessary cleanup is done correctly, collections are batched for several objects at once and happen over multiple generations.