
I have a set of programs that work together through about 48 GB of shared memory (IPC).

The programs run on Linux 3.6.0-rc5, are written in plain C, and are compiled with gcc. The load average on the main machine (24 cores) is 6.0, jumping to 16.0 every 10 seconds.

One proxy receives data from other machines over 0mq (3.2.3, ~1000 msgs/s from 12 machines on the same network) and writes it into shared memory. Many (<50) workers read this data and do some calculations.

The proxy uses around 20% CPU. Each worker uses 1% CPU, jumping to 10% periodically.

All programs are written so that all allocations are done in init(), called when the program starts, and all frees are done in destroy(), called before exit.

The repetitive code does not use malloc/calloc/free at all.

But both programs still leak, around 120-240 bytes per minute. This isn't much: memory is exhausted in 7-8 days and I just restart the process, but those leaked bytes eat at my mind every time the monitoring app reports the restart to me :)

The bad part: I can't run valgrind because of the shared memory. It just stops on allocating/attaching the shared memory, and then everything starts crashing.

Trying to find this leak, I made a stripped-down version of the proxy: no leaks, but I can't feed it the same amount of data.

When running under gdb there are still no leaks, but the speed dropped by around 2/3, so maybe it is just not fast enough to reproduce the error.

So the possible sources of the leak are:

  • my code, but there is no malloc/calloc there, just pointer arithmetic, memcpy and memcmp
  • some standard library: glibc? syslog?
  • 0mq when working with many sources (I don't think 1k msgs per second is too much traffic)

Are there any other tools/libs/hacks that can help in such a situation?

Edit: Shivan Raptor asked about the code. The repetitive part is 5k lines of maths, without any allocations, as I mentioned.

But the start, the stop and the entry into the repetitive part are here:

int main(int argc, char **argv)
{
    ida_init(argc, argv, PROXY);
    ex_pollponies(); // repetitive part
    ida_destroy();
    return(0);
}


// with some parts cut out

int ex_pollponies(void)
{
  int i, rc;
  unsigned char buf[90];
  uint64_t fos[ROLLINGBUFFERSIZE];
  uint64_t bhs[ROLLINGBUFFERSIZE];
  int bfcnt = 0;

  uint64_t *fo;
  uint64_t *bh;

  while(1) {
    rc = zmq_poll(ex_in->poll_items, ex_in->count, EX_POLL_TIMEOUT);
    for (i=0; i < ex_in->count; i++) {
      if (ex_in->poll_items[i].revents & ZMQ_POLLIN) {

        // zmq_recv returns -1 on error, so skip errors as well as empty messages
        if (zmq_recv(ex_in->poll_items[i].socket, &buf, max_size, 0) <= 0)
          continue;
        fo = &fos[bfcnt];
        bh = &bhs[bfcnt];
        bfcnt++;
        if (bfcnt >= ROLLINGBUFFERSIZE)
          bfcnt = 0;

        memcpy(fo, (void *)&buf[1], sizeof(FRAMEOBJECT));
        memcpy(bh, &buf[sizeof(FRAMEOBJECT)+1], sizeof(FRAMEHASH));

        // then store fo, bh into shared memory, with some adjustments and checks
        // around 1000 msgs of 16 bytes each are stored every second, but the leak is only 200 bytes per minute.

      }
    }

  }
}

Edit 2:

I finally got valgrind working: I made it use just part of the data (6 GB) and it finally passed, without finding any leaks. But while running it takes 100% CPU, and my program definitely did not handle all the incoming data, so it was not working under full load. This half-confirmed my last-hope guess: the leak is in the data exchange block. I found info about mtrace (part of libc). It helped me track down the ADDRESS of the leak: it is outside my code, in one of the threads. The only threads in my program are created by zeromq. Then I started playing with socket options (increasing hwm, buffers) and the leak rate decreased, but it did not go away completely even with absurdly big values :(

So now I'm 95% sure it is zeromq that is leaking. I'll try to find an answer on their mailing list.

Look at the documentation of the library functions that you use. Do any of them allocate memory for you and expect you to free it? - Shahbaz
What makes you think there's a leak? Statistics may be misleading, e.g. if you allocate memory on init and touch it only later, pages are allocated only when touched, which looks like allocations are being made. - ugoren
ugoren, when using top, or ps -o pid,args,vsz,rss,%cpu,%mem -C proxy for example, the RSS increases a little every time. - Roman Golomidov
RSS is the resident size (physical memory). You cannot tell whether your application has made any allocation by looking at its value. It can go either up or down as the OS sees fit. - n. 1.8e9-where's-my-share m.

1 Answer


If valgrind doesn't solve it, you can try tracking memory allocation yourself.

There are two ways. The first is to replace your calls to malloc and free with calls to your own versions, and also pass those functions some kind of identifier, like __FILE__ and __LINE__, or the name of the system that is allocating.

In non-leak-detection mode you pass straight through to malloc and free; in leak-detection mode you first log the alloc and free calls and then call through to malloc and free. When the program finishes, you match up the allocations and frees, and you'll see where you're leaking memory.

You can do this with macros, so your regular build won't be slowed down.

You won't catch leaks from client libraries that you can't recompile yourself.
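A minimal sketch of that macro approach follows. Every name in it (dbg_malloc, dbg_free, dbg_report, MALLOC, FREE, LEAK_CHECK, MAX_TRACKED) is hypothetical and chosen only for illustration, and the fixed-size table is an assumption that keeps the tracker itself free of heap allocations. It is not thread-safe as written.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 1024   /* hypothetical limit on live allocations we can record */

struct alloc_rec {
    void       *ptr;       /* NULL means the slot is free */
    size_t      size;
    const char *file;
    int         line;
};

static struct alloc_rec g_recs[MAX_TRACKED];

/* Allocate and remember who asked. If the table is full the
 * allocation still succeeds, it just goes unrecorded. */
void *dbg_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_recs[i].ptr == NULL) {
            g_recs[i] = (struct alloc_rec){ p, size, file, line };
            break;
        }
    }
    return p;
}

/* Forget the record (if any) and release the memory. */
void dbg_free(void *p)
{
    if (p == NULL)
        return;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_recs[i].ptr == p) {
            g_recs[i].ptr = NULL;
            break;
        }
    }
    free(p);
}

/* Print every allocation that was never freed; returns how many there are. */
int dbg_report(FILE *out)
{
    int live = 0;
    for (int i = 0; i < MAX_TRACKED; i++) {
        if (g_recs[i].ptr != NULL) {
            fprintf(out, "%s:%d: %zu bytes still allocated\n",
                    g_recs[i].file, g_recs[i].line, g_recs[i].size);
            live++;
        }
    }
    return live;
}

/* Regular builds pass straight through; -DLEAK_CHECK enables tracking. */
#ifdef LEAK_CHECK
#define MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)
#define FREE(p)   dbg_free(p)
#else
#define MALLOC(n) malloc(n)
#define FREE(p)   free(p)
#endif
```

Application code would call MALLOC and FREE everywhere and call dbg_report(stderr) just before exit, for example at the end of a destroy() routine.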

Another way is to use the GNU linker's --wrap option (passed through gcc as -Wl,--wrap=malloc -Wl,--wrap=free) so that at link time your version of malloc/free is called instead of the glibc one. See this post:

Create a wrapper function for malloc and free in C

The advantage of this is that you will be able to log allocations in client libraries as well. The disadvantage is that you are limited to the same function signature, so you won't be able to get __FILE__ and __LINE__ in your leak checker.
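As a rough sketch of the link-time wrapping approach, assuming GNU ld and a build line such as gcc main.c wrap.c -Wl,--wrap=malloc -Wl,--wrap=free: the linker then redirects calls to malloc to __wrap_malloc and makes the original available as __real_malloc. The counters and wrap_outstanding below are hypothetical names. The weak declarations are an added convenience, not part of the technique, so the file also links when the flags are omitted. Note that the wrapping only applies to object files and static libraries that go through this link step; a library loaded as a shared object is not affected. calloc and realloc would need wrappers of their own.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Resolved to the real glibc functions when linking with
 * -Wl,--wrap=malloc -Wl,--wrap=free. Declared weak so that, without
 * those flags, they are simply NULL and we fall back to malloc/free. */
extern void *__real_malloc(size_t size) __attribute__((weak));
extern void  __real_free(void *ptr) __attribute__((weak));

/* Atomic counters, because allocations may come from several threads.
 * No printf here: stdio may itself call malloc and recurse. */
static atomic_ulong g_malloc_calls;
static atomic_ulong g_free_calls;
static atomic_ulong g_bytes_requested;

void *__wrap_malloc(size_t size)
{
    atomic_fetch_add(&g_malloc_calls, 1);
    atomic_fetch_add(&g_bytes_requested, size);
    return __real_malloc ? __real_malloc(size) : malloc(size);
}

void __wrap_free(void *ptr)
{
    if (ptr != NULL)
        atomic_fetch_add(&g_free_calls, 1);
    if (__real_free)
        __real_free(ptr);
    else
        free(ptr);
}

/* Number of malloc calls not yet matched by a free of a non-NULL pointer. */
long wrap_outstanding(void)
{
    return (long)atomic_load(&g_malloc_calls) - (long)atomic_load(&g_free_calls);
}
```

Sampling wrap_outstanding() periodically from the main loop and logging it would show whether the count keeps growing while the program is in its steady state.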

If this were C++ you could overload the global operator new.