Ruby 2.0 introduces a copy-on-write friendly garbage collector. My processes don't seem to keep memory shared for more than a few minutes; it moves from shared_dirty to private_dirty quite quickly.

Some others have had success getting this to work:

This program can be used to check memory stats on Linux: https://gist.github.com/kenn/5105061

My unicorn configuration: https://gist.github.com/inspire22/f82c77c0a465f1945305

For some reason my unicorn apps, also running with preload_app true, have much less shared memory. Environment: Ruby 2.0.0-p195, Rails 3.2, Linux 2.6.18 (CentOS).

[root@thorn script]# ruby memstats.rb 4946
Process:             4946
Command Line:        unicorn_rails worker[4] -c /u/apps/newap/current/lib/unicorn.rb -E production -D
Memory Summary:
  private_clean                   0 kB
  private_dirty              56,324 kB
  pss                        60,256 kB
  rss                        83,628 kB
  shared_clean                4,204 kB
  shared_dirty               23,100 kB
  size                      108,156 kB
  swap                           68 kB 

If I shut down the master process entirely (not just a HUP), then restart it and immediately check a worker before any requests have queued, I get a better story:

[root@thorn script]# ruby memstats.rb 5743
Process:             5743
Command Line:        unicorn_rails worker[4] -c /u/apps/newap/current/lib/unicorn.rb -E production -D
Memory Summary:
  private_clean                   0 kB
  private_dirty              21,572 kB
  pss                        27,735 kB
  rss                        66,296 kB
  shared_clean                2,484 kB
  shared_dirty               42,240 kB
  size                       91,768 kB
  swap                            0 kB

But within 5 seconds of starting up, the workers are back to ~20MB of shared_clean + shared_dirty.

I suspected that swapping might be causing the problem, but after lowering swappiness and making sure that neither the parent nor child processes are being swapped out (using swapstats.rb), the problem persists.

I don't understand exactly what shared_dirty memory is, and how it gets turned into private memory. I'd also love suggestions for improving the longevity and amount of my shared memory. Thanks!

I still don't have a solution to this. I now believe it's a problem of a) the 32-bit Linux version and b) a memory-constrained environment (though it still occurs even with swappiness turned down). - Kevin
I got this comment from the author of the gist: twitter.com/kenn/status/402832587007086592 - John Bachir
I also pinged the author of the other thing: twitter.com/dakull/status/403156502598279170 - John Bachir
Out of curiosity, how does it work if you try the same with Phusion Passenger? - Nick Urban
And, for that matter, is it different with Ruby 2.1? - Nick Urban

1 Answer

According to this answer, which you may have already seen, there is a line that reads:

Note that a "share-able" page is counted as a private mapping until it is actually shared. i.e. if there is only one process currently using libfoo, that library's text section will appear in the process's private mappings. It will be accounted in the shared mappings (and removed from the private ones) only if/when another process starts using that library.
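To see how the kernel is accounting a worker's pages at a given moment, the per-mapping counters in /proc/<pid>/smaps can be summed. The following is only a minimal sketch of that idea, assuming a Linux /proc filesystem; it is not the code from the memstats.rb gist, and the helper name summarize_smaps is made up for illustration.

```ruby
# Counters of interest in /proc/<pid>/smaps (the kernel reports them in kB).
SMAPS_FIELDS = %w[Shared_Clean Shared_Dirty Private_Clean Private_Dirty Rss Pss Swap].freeze

# Hypothetical helper: add up each counter across all mappings
# found in smaps-formatted text.
def summarize_smaps(text)
  totals = Hash.new(0)
  text.each_line do |line|
    # Counter lines look like "Shared_Dirty:        23100 kB".
    match = line.match(/\A(\w+):\s+(\d+) kB/)
    next unless match && SMAPS_FIELDS.include?(match[1])
    totals[match[1]] += match[2].to_i
  end
  totals
end

# Usage on Linux, with the worker pid passed on the command line:
#   pid = ARGV.fetch(0, "self")
#   summarize_smaps(File.read("/proc/#{pid}/smaps")).each do |name, kb|
#     puts format("%-14s %8d kB", name, kb)
#   end
```

Running it against the same worker a few seconds apart should show whether pages are moving from the shared counters to the private ones.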

To test whether you're getting the benefits outlined in this article, I would put a 10MB XML file as a literal string directly into your source code. Then, if you fire up 20 workers, you'll be able to see whether you're using 200MB of memory or only 10MB, as expected with the new garbage collection feature.
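A rough way to run that experiment outside of unicorn is sketched below. It assumes Linux (for /proc/self/smaps), and it uses a generated string as a stand-in for the 10MB XML literal, which is an assumption and not quite the same thing as a literal in source code. The names smaps_total, measure_in_children and report_sharing are hypothetical helpers, not part of unicorn or Ruby.

```ruby
# Total of one smaps counter (in kB) for a process; Linux-only.
def smaps_total(field, pid = "self")
  File.foreach("/proc/#{pid}/smaps").inject(0) do |sum, line|
    line.start_with?("#{field}:") ? sum + line.split[1].to_i : sum
  end
end

# Fork `count` children; each child sends the value of the given block
# back to the parent through its own pipe.
def measure_in_children(count)
  children = Array.new(count) do
    reader, writer = IO.pipe
    pid = fork do
      reader.close
      writer.puts(yield)
      writer.close
      exit!(0) # skip at_exit handlers inherited from the parent
    end
    writer.close
    [pid, reader]
  end
  children.map do |pid, reader|
    report = reader.read.strip
    reader.close
    Process.wait(pid)
    report
  end
end

# Allocate the payload before forking (as preload_app would), then let
# every child report its own shared and private dirty totals.
def report_sharing(worker_count, payload_bytes)
  payload = "x" * payload_bytes # stand-in for the 10MB XML literal
  measure_in_children(worker_count) do
    payload.bytesize # read-only access keeps the reference live
    "shared_dirty=#{smaps_total('Shared_Dirty')} kB " \
      "private_dirty=#{smaps_total('Private_Dirty')} kB"
  end
end

# Example run:
#   report_sharing(20, 10 * 1024 * 1024).each { |line| puts line }
```

If the payload stays shared, it should show up under shared_dirty in each child rather than under private_dirty.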

UPDATE:

I was looking through the unicorn source and found a reference to this wonderful article.

To summarize, it states that to take advantage of Ruby Enterprise Edition's copy-on-write friendly garbage collector, you must set GC.copy_on_write_friendly to true before you fork. This setter is specific to Ruby Enterprise Edition; the respond_to? guard makes the snippet a no-op on interpreters that don't define it.

# Enable the copy-on-write friendly GC where the interpreter provides the setter.
if GC.respond_to?(:copy_on_write_friendly=)
  GC.copy_on_write_friendly = true
end

Based on your provided unicorn configuration file, it appears to be missing the assignment.

Also, I enjoyed reading these related articles:

According to the fork man page:

Under Linux, fork() is implemented using copy-on-write pages, so the only penalty that it incurs is the time and memory required to duplicate the parent's page tables, and to create a unique task structure for the child.

Since version 2.3.3, rather than invoking the kernel's fork() system call, the glibc fork() wrapper that is provided as part of the NPTL threading implementation invokes clone(2) with flags that provide the same effect as the traditional system call. (A call to fork() is equivalent to a call to clone(2) specifying flags as just SIGCHLD.) The glibc wrapper invokes any fork handlers that have been established using pthread_atfork(3).

And according to the clone man page:

Unlike fork(2), these calls allow the child process to share parts of its execution context with the calling process, such as the memory space, the table of file descriptors, and the table of signal handlers.

So, I'm reading this to mean that Linux's copy-on-write fork, which is the feature that unicorn relies on to implement memory sharing, was not implemented until glibc 2.3.3 (please, someone correct me if I'm wrong in this interpretation).

To check which version of libc you're running, you can type:

ldd --version

Or locate the glibc shared object and run it directly. On my system, locate found the file at the following location:

locate libc.so
/lib/x86_64-linux-gnu/libc.so.6
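If you'd rather do the comparison against 2.3.3 from Ruby, something along these lines might work. It is a sketch that assumes `ldd --version` prints the glibc version as the last token of its first line, which is the case on the glibc systems I've seen but not guaranteed elsewhere; glibc_version_from_ldd is a made-up helper name.

```ruby
require "rubygems" # provides Gem::Version (already loaded by default on modern Rubies)

# Hypothetical helper: pull the trailing version number out of the first
# line of `ldd --version` output, e.g. "ldd (GNU libc) 2.5".
# Returns nil when no version is found.
def glibc_version_from_ldd(output)
  first_line = output.lines.first.to_s
  version = first_line[/(\d+(?:\.\d+)+)\s*\z/, 1]
  version && Gem::Version.new(version)
end

# Usage on a glibc-based Linux system:
#   version = glibc_version_from_ldd(`ldd --version`)
#   puts "glibc #{version}" if version
#   puts "at least 2.3.3" if version && version >= Gem::Version.new("2.3.3")
```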