Parallel Compacting Collector algorithm

Question

I have two questions. And one of them will be ragarding the topic:)

1) I have faced with a problem of not being able to find the full information about how different garbage collector work in HotSpot. But I am not talking about general descriptions of garbage collector's work (we have a lot of this information on the internet), I am talking about concrete algorithms. I have found this whitepaper (Memory managment in the Java HotSpot Virtual Machine) http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf . But it has only general ideas. It has a good description (probably it is not so good - see my second question) of the Parallel compacting algorithm (I mean parallel mark-sweep-compact) but it doesn't explain other garbage collector's algorithms. However this whitepaper is the best information I was able to find on the Internet. What I would like to know is where to get a full description/information about how different garbage collectors (for young gen I mean: ParNew, DefNew, PSYoungGen; for old gen: PSOLdGen, ParOldGen, Concurrent-Mark-Sweep) work. Can't believe that this information is not available to the users.

2) The question regarding the Parallel Compacting Collector algorithm (ParOldGen or Parallel Mark-Sweep-Compact). The whitepaper (see the first question) has a description of it's work. Let me paste a quote from the whitepaper (please, spend a minute to take a look on it): enter image description here

The things I cannot understand are listed below:

Regarding summary phase:

Due to compactions from previous collections, it is typical that some portion of the left side of each generation will be dense, containing mostly live objects. The amount of space that could be recovered from such dense regions is not worth the cost of compacting them.

well, does it mean that when we have a region which consist of 98-99% of live objects and 2-1% of dead objects (in other words a very small percentage of the dead objects) than compacting of this region is not worth the space that could be recovered from such a region. However this tiny free spaces (holes) will be eventually filled up and there will be no holes after garbage collections is finished.

So the first thing the summary phase does is examine the density of the regions, starting with the leftmost one, until it reaches a point where the space that could be recovered from a region and those to the right of it is worth the cost of compacting those regions.

well, if we have a big percentage of the dead objects than this region is worth compacting, right?

The regions to the left of that point are referred to as the dense prefix, and no objects are moved in those regions.

"and no objects are moved in those regions", but there might be some little free spaces in those regions, am I right? Can't uderstand the point

The regions to the right of that point will be compacted, eliminating all dead space.

please clarify how they will be compacted. Every region will be compacted separately? I guess no. So maybe some kind of shifting will be here?

The summary phase calculates and stores the new location of the first byte of live data for each compacted region.

to understand it I need to understand the previous question I suppose.

Regarding compaction phase:

In the compaction phase, the garbage collection threads use the
summary data to identify regions that need to be filled, and the
threads can independently copy data into the regions. This produces a heap that is densely packed on one end, with a single large empty
block at the other end.

I am totally confused. So no compaction happened on the "summary phase"? Was the previous phase's purpose only to find all free spaces?

Please help me to get a clear picture.

I don't know how they work and I haven't done this myself, but it may be fruitful for you to read the garbage collector code in OpenJDK. — tmyklebu
I'm guessing that even the folks who maintain the stuff don't know how it works, and it's too complicated (and volatile) to write a book that precisely describes every nuance of every algorithm. I suspect you're looking for more detail than can ever be available. — Hot Licks
"However this tiny free spaces (holes) will be eventually filled up and there will be no holes after garbage collections is finished." Not the way I understand it. Hotspot is not a conventional heap. "Holes" are not filled in. Instead one waits until the entire region is compacted. "Compacting", for most Hotspot regions, implies copying all objects from the old space to a new space, skipping over the "holes". Hotspot is, for the most part, a "stop and copy" collector. — Hot Licks
(There are two fundamentally different forms of GC: "Stop and copy" and "free list". The "free list" version is sort of like a conventional heap manager where freed allocations are added to a list for reallocation. "Stop and copy", on the other hand, physically copies objects from Area A to Area B, skipping over "holes", then starts allocating new objects in the now-empty Area A. HotSpot, is, for the most part, "stop and copy".) — Hot Licks

Domi Domi · Accepted Answer · 2013-12-06T17:49:15

This is just a general description of an algorithm. Such descriptions can be of different detail. In this case, it gives you most details, but still leaves some choices for the implementer.

Regarding your questions:

So no compaction happened on the "summary phase"? Was the previous phase's purpose only to find all free spaces? - Yes, that is correct. The summary phase gathers indexing data and basically determines everything necessary, so that the compaction phase can then perform the copying. They don't tell how they implement compaction, but the default way is simply placing every live object right next to the previous object. Basically, all empty space is removed and after the compaction step has completed, you have one contiguous chunk of memory, containing all live objects. I see your confusion with the fourth part, but note that it is written in future tense: 'will be compacted' - So not during summary, but later.
Does it mean [...] compacting of this region in not worth the space that could be recovered from such a region? Yes, that is right. You essentially lose some space, but it is very common to trade of memory for execution speed. The exact density threshold is up to the implementation, but I would ballpark the used-to-total-memory ratio threshold at around 70-90%.

If you want to know all the dirty details, have a look at an open source VM implementations, as suggested in the comments.

Parallel Compacting Collector algorithm

2 Answers