20 votes

I found this article here:

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In the conclusion section, it reads:

Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management.

So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Do you know if current (i.e. latest Java VM's and .NET 4.0's) garbage collectors suffer the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved?

Thanks.

The garbage collector uses a "best guess" algorithm to determine when to dispose of memory. The algorithms it uses are not necessarily tuned to what is happening inside your specific program. When you are managing the memory yourself, you get to be more selective and intelligent about how the memory gets collected, but that requires more skill and effort, and you run the risk of memory leaks if it's not done properly. – Robert Harvey
@Robert Harvey: Actually I think that in C++, thanks to destructors, RAII and smart pointers, it is hard to leak anything (be it memory or other kinds of resources); a minimal sketch follows after these comments. In fact, while in Java you could forget to release a resource in a try...finally block, in C++ the destructors of stack-allocated variables (or of smart pointers) are called automatically. (Of course, this assumes that classes are properly designed and that each class releases its own resources in its destructor.) – EmbeddedProg
Thus my statement, "...if it's not done properly." – Robert Harvey
I haven't read the article, but doesn't the 70% refer to memory management overhead only? So the application will not run 70% slower, it'll just spend 70% more time managing memory (which is likely to be a fraction of the total execution time to begin with). – jalf
It can be a pretty big fraction; most of the profiles I've run on large realtime apps showed about 5-15% of total execution time inside memory management. And those were cases where people wrote them worrying about performance from the start. – Crashworks
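To illustrate the RAII comment above, here is a minimal C++ sketch (the FileHandle class and the file name are made up for illustration): the destructor releases the resource on every path out of the scope, so nothing has to be remembered in a finally block.

    #include <cstdio>
    #include <memory>

    // RAII wrapper: the destructor releases the resource, so it cannot be
    // forgotten on any path out of the scope (return, exception, ...).
    class FileHandle {
    public:
        explicit FileHandle(const char* path) : f_(std::fopen(path, "r")) {}
        ~FileHandle() { if (f_) std::fclose(f_); }
        FileHandle(const FileHandle&) = delete;
        FileHandle& operator=(const FileHandle&) = delete;
    private:
        std::FILE* f_;
    };

    void useFile() {
        FileHandle file("data.txt");                  // released when scope ends
        auto buffer = std::make_unique<char[]>(4096); // memory freed automatically
        // ... work with file and buffer ...
    }   // destructors run here, even if an exception was thrown above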

6 Answers

10 votes

if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage collector based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.

8 votes

You seem to be asking two things:

  • have GCs improved since that research was performed, and
  • can I use the conclusions of the paper as a formula to predict required memory.

The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:

  • GC'ed memory management still requires significantly more virtual memory.
  • If you try to constrain the heap size, GC performance drops significantly.
  • If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.

However, the conclusions cannot really be used as a formula:

  • The original study was done with JikesRVM rather than a Sun JVM.
  • The Sun JVM's garbage collectors have improved in the ~5 years since the study.
  • The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.

On the last point, I have seen a presentation by someone who talked about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects: one an Object with 4 word-sized fields and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.

These overheads are not GC-related per se. Rather they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:

  • Each Java primitive object header [1] reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
  • The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.
  • The Java collection types use a lot of memory because collection elements cannot be directly chained. So, for example, the overhead of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast, an optimal C/C++ linked list (i.e. one where each element embeds its own "next" pointer) has an overhead of one word per list element; a rough sketch of this contrast follows after the footnote below.

[1] In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use and contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header, which is the real point here.
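As a rough sketch of that contrast (the Widget type is hypothetical, not taken from the study): an intrusive C/C++ list keeps the "next" pointer inside the element itself, so the only per-element cost is that one word, whereas a Java collection has to allocate a separate node object, with its own header, for every element.

    // Intrusive singly linked list: the "next" pointer is embedded in the
    // element, so the per-element list overhead is exactly one word.
    struct Widget {
        int payload;
        Widget* next = nullptr;
    };

    // Push an existing element onto the front of the list: no node object,
    // no extra allocation, just one pointer write.
    Widget* pushFront(Widget* head, Widget* w) {
        w->next = head;
        return w;
    }

    // A Java-style list instead allocates a node per element:
    //   node header + "next" reference + "element" reference,
    // plus the element's own object header - roughly the 6 words quoted above.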

4 votes

Michael Borgwardt is right, in a way: it only matters if the application is bottlenecked on allocating memory. This follows from Amdahl's law.

However, I have used C++, Java, and VB.NET. In C++ there are powerful techniques available that allocate memory on the stack instead of the heap. Stack allocation is easily hundreds of times faster than heap allocation. I would say that use of these techniques could remove maybe one allocation in eight, and use of writable strings one allocation in four.
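A minimal sketch of the kind of techniques meant here, with made-up types and sizes: values that live on the stack never touch the heap allocator, and a writable string buffer can be reserved once and then reused.

    #include <memory>
    #include <string>

    struct Point { double x, y; };

    void stackBased() {
        Point p{1.0, 2.0};        // lives on the stack; no allocator involved
        char scratch[256];        // scratch buffer, also on the stack
        std::string s;
        s.reserve(128);           // one allocation, then the writable buffer
        s += "appended text";     // is reused instead of reallocating
        (void)p; (void)scratch;
    }

    void heapBased() {
        auto p = std::make_unique<Point>(Point{1.0, 2.0});  // heap allocation
    }                                                        // freed here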

It's no joke when people claim highly optimized C++ code can trounce the best possible Java code. It's the flat out truth.

Microsoft claims the overhead in using any of the .NET family of languages over C++ is about two to one. I believe that number is just about right for most things.

HOWEVER, managed environments carry a particular benefit: when dealing with inferior programmers, you don't have to worry about one module trashing another module's memory, about the resulting crash being blamed on the wrong developer, or about the bug being hard to find.

3 votes

At least as I read it, your real question is whether there have been significant developments in garbage collection or manual memory management since that paper was published that would invalidate its results. The answer to that is somewhat mixed. On one hand, the vendors who provide garbage collectors do tune them, so their performance tends to improve over time. On the other hand, there hasn't been anything like a major breakthrough, such as fundamentally new garbage collection algorithms.

Manual heap managers generally improve over time as well. I doubt most are tuned with quite the regularity of garbage collectors, but in the course of 5 years, probably most have had at least a bit of work done.

In short, both have undoubtedly improved at least a little, but in neither case have there been major new algorithms that change the fundamental landscape. It's doubtful that current implementations will give a difference of exactly 17% as quoted in the article, but there's a pretty good chance that if you repeated the tests today, you'd still get a difference somewhere around 15-20% or so. The differences between then and now are probably smaller than the differences between some of the different algorithms they tested at that time.

3 votes

I am not sure how relevant your question still is today. A performance-critical application shouldn't spend a significant portion of its time doing object creation (as the micro-benchmark is very likely to do), and performance on modern systems is more likely to be determined by how well the application fits into the CPU's cache, rather than by how much main memory it uses.

BTW: There are lots of tricks you can do in C++ to help with this that are not available in Java.

If you are worried about the cost of GC or object creation, you can take steps to minimise how many objects you create. This is generally a good idea where performance is critical in any language.
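One common way to do that, sketched below in C++ with a hypothetical processAll function (the same idea applies in Java or C#): hoist a buffer out of the loop and reuse it instead of creating a fresh object for every item.

    #include <string>
    #include <vector>

    // Reuse one buffer across iterations instead of creating a fresh
    // vector for every record processed.
    void processAll(const std::vector<std::string>& records) {
        std::vector<char> buffer;                 // allocated once, grows as needed
        for (const auto& r : records) {
            buffer.assign(r.begin(), r.end());    // reuses the existing capacity
            // ... process buffer ...
        }
    }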

The cost of main memory isn't as much of an issue as it used to be. A machine with 48 GB is relatively cheap these days; an 8-core server with 48 GB of main memory can be leased for £9/day. Try hiring a developer for £9/day. ;) However, what is still relatively expensive is CPU cache memory. It is fairly hard to find a system with more than 16 MB of CPU cache, compared with 48,000 MB of main memory. A system performs much better when an application is using its CPU cache, and this is the amount of memory to consider if performance is critical.
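A small illustration of why cache fit matters (both functions are just sketches): iterating contiguous storage touches consecutive cache lines and lets the hardware prefetcher stream data ahead of the loop, while chasing pointers through a node-based container tends to miss the cache on every step.

    #include <list>
    #include <numeric>
    #include <vector>

    // Contiguous storage: neighbouring elements share cache lines, and the
    // prefetcher can bring them in before the loop needs them.
    long sumVector(const std::vector<int>& v) {
        return std::accumulate(v.begin(), v.end(), 0L);
    }

    // Node-based storage: each element is a separate heap allocation, so the
    // traversal hops around memory and frequently misses the cache.
    long sumList(const std::list<int>& l) {
        return std::accumulate(l.begin(), l.end(), 0L);
    }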

-1 votes

First, note that it's now 2019 and a lot of things have improved. As long as you don't trigger a GC, allocation can be as simple as incrementing a pointer. In C++ it is much more work than that unless you implement your own mechanism to allocate in chunks. And if you use shared smart pointers, each change to the reference count requires a locked increment (the xaddl instruction), which is slow in itself and requires the processors to communicate in order to invalidate and resynchronize their cache lines.

What is more, with a GC you get better locality in at least three ways. First, when it allocates a new segment, it zeroes the memory and warms the cache lines. Second, it compacts the heap, which causes data to stay closer together. Lastly, each thread allocates from its own heap area. In conclusion, although it is hard to test and compare every scenario and GC implementation, I have read somewhere on SO that it has been shown that GC can perform better than manual memory management.
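To make the "incrementing a pointer" point concrete, here is a toy bump/arena allocator sketch in C++ (an assumption of how such a chunk allocator might look, not production code): allocation just advances an offset into a pre-reserved chunk, and everything is released at once, much like a young GC generation being collected.

    #include <cstddef>
    #include <vector>

    // Minimal bump allocator: grabs one big chunk up front, then hands out
    // memory by advancing an offset. Individual frees are not supported;
    // everything is released at once with reset().
    class Arena {
    public:
        explicit Arena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

        void* allocate(std::size_t size,
                       std::size_t align = alignof(std::max_align_t)) {
            std::size_t aligned = (offset_ + align - 1) & ~(align - 1);
            if (aligned + size > buffer_.size()) return nullptr;  // out of space
            offset_ = aligned + size;                             // bump the pointer
            return buffer_.data() + aligned;
        }

        void reset() { offset_ = 0; }  // release everything in one step

    private:
        std::vector<std::byte> buffer_;
        std::size_t offset_;
    };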