I have a set of programs that work together through a ~48 GB shared memory (IPC) segment.
The programs run on Linux 3.6.0-rc5, are written in plain C and are compiled with gcc. The load average on the main computer (24 cores) is 6.0, jumping to 16.0 every 10 seconds.
One proxy receives data from other machines over 0mq (3.2.3, ~1000 msgs/s from 12 machines on the same network) and writes it into shared memory. Many (<50) workers read this data and do some calculations.
The proxy uses around 20% CPU. Every worker uses about 1% CPU, jumping to 10% periodically.
All programs are written so that all allocations are done in init() (called when the program starts) and all frees are done in destroy() (called before exit).
The repetitive code does not use malloc/calloc/free at all.
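The allocate-once discipline described above can be sketched as follows. The struct and the ctx_init/ctx_store/ctx_destroy names are hypothetical illustrations, not the author's ida_init()/ida_destroy(); the point is only that the hot path writes into memory that was reserved at startup.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical context: everything the hot loop needs is allocated once. */
struct ctx {
    uint64_t *ring;   /* preallocated rolling buffer */
    size_t    size;   /* number of slots */
    size_t    next;   /* next slot to overwrite */
};

/* All allocation happens here, once, at startup. Returns 0 on success. */
int ctx_init(struct ctx *c, size_t size)
{
    c->ring = calloc(size, sizeof *c->ring);
    c->size = size;
    c->next = 0;
    return c->ring ? 0 : -1;
}

/* Steady-state work: no malloc/free, only writes into preallocated slots. */
void ctx_store(struct ctx *c, uint64_t v)
{
    c->ring[c->next] = v;
    c->next = (c->next + 1) % c->size;   /* wrap around like a rolling buffer */
}

/* All deallocation happens here, once, before exit. */
void ctx_destroy(struct ctx *c)
{
    free(c->ring);
    c->ring = NULL;
    c->size = 0;
    c->next = 0;
}
```

With this structure, any heap growth observed while only ctx_store() runs must come from somewhere other than the program's own code (a library, or a thread it starts).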
But both programs still leak, around 120-240 bytes per minute. This isn't much: memory is exhausted in 7-8 days and I just stop/start the process, but those leaked bytes eat at me every time the monitoring app reports the restart :)
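With a leak this slow, logging the resident set size periodically and looking at the slope over hours is more practical than waiting for exhaustion. A Linux-specific sketch that reads VmRSS from /proc/self/status; the function name is hypothetical. The value is in kB and also counts shared-memory pages the process has touched, so the proxy's baseline will be large and only steady growth is meaningful.

```c
#include <stdio.h>
#include <string.h>

/* Return this process's resident set size in kB, or -1 on failure.
 * Linux-only: parses the "VmRSS:" line of /proc/self/status. */
long current_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* line looks like "VmRSS:\t   1234 kB"; %ld skips the whitespace */
            if (sscanf(line + 6, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}
```

Calling this once a minute from the monitoring path and logging the difference gives a leak rate directly, independent of restarts.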
The bad thing is that I can't run valgrind because of the shared memory: it just stops while allocating/attaching the shared memory, and then everything starts crashing.
While hunting this leak I made a stripped-down version of the proxy: it doesn't leak, but I can't feed it the same amount of data.
Under gdb there are still no leaks, but the speed drops by roughly 2/3, so maybe it just isn't running fast enough to reproduce the problem.
So the leak could be in:
- my code, but there is no malloc/calloc, just pointer arithmetic (+/-), memcpy and memcmp
- some standard library: glibc? syslog?
- 0mq, when working with many sources (I don't think 1k msgs/s is too much traffic)
Are there any other tools/libs/hacks that can help in a situation like this?
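One glibc facility that fits this question is malloc_info() from <malloc.h>, which dumps allocator statistics as XML broken down per arena. Because glibc usually gives threads their own arenas, steady growth in a non-main arena would point at a thread (such as the ones libzmq starts) rather than the main loop. The sketch below only captures the report into a string so it can be logged periodically; the malloc_report wrapper is hypothetical.

```c
#include <malloc.h>
#include <stdio.h>

/* Write glibc's allocator report (XML, one <heap> element per arena) into
 * buf, truncated to bufsize-1 bytes. Returns bytes stored, or -1 on error. */
long malloc_report(char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return -1;

    FILE *f = tmpfile();
    if (!f)
        return -1;

    if (malloc_info(0, f) != 0) {   /* the options argument must be 0 */
        fclose(f);
        return -1;
    }
    rewind(f);
    size_t n = fread(buf, 1, bufsize - 1, f);
    buf[n] = '\0';
    fclose(f);
    return (long)n;
}
```

Diffing two reports taken an hour apart shows which arena's totals keep rising.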
Edit: Shivan Raptor asked about the code. The repetitive part is 5k lines of maths, without any allocations, as I mentioned.
But here is where the program starts, stops and enters the repetitive part:
int main(int argc, char **argv)
{
    ida_init(argc, argv, PROXY);  // all allocations happen here
    ex_pollponies();              // repetitive part
    ida_destroy();                // all frees happen here
    return 0;
}
// with some cuttings
int ex_pollponies(void)
{
    int i, rc;
    unsigned char buf[90];
    uint64_t fos[ROLLINGBUFFERSIZE];
    uint64_t bhs[ROLLINGBUFFERSIZE];
    int bfcnt = 0;
    uint64_t *fo;
    uint64_t *bh;

    while (1) {
        rc = zmq_poll(ex_in->poll_items, ex_in->count, EX_POLL_TIMEOUT);
        if (rc <= 0)
            continue;  // timeout (0) or error such as EINTR (-1)
        for (i = 0; i < ex_in->count; i++) {
            if (ex_in->poll_items[i].revents & ZMQ_POLLIN) {
                // zmq_recv returns -1 on error, otherwise the full message size;
                // skip failed, empty or too-short messages instead of copying stale data
                // (max_size must not exceed sizeof(buf))
                rc = zmq_recv(ex_in->poll_items[i].socket, buf, max_size, 0);
                if (rc < (int)(1 + sizeof(FRAMEOBJECT) + sizeof(FRAMEHASH)))
                    continue;
                fo = &fos[bfcnt];
                bh = &bhs[bfcnt];
                bfcnt++;
                if (bfcnt >= ROLLINGBUFFERSIZE)
                    bfcnt = 0;
                memcpy(fo, &buf[1], sizeof(FRAMEOBJECT));
                memcpy(bh, &buf[sizeof(FRAMEOBJECT) + 1], sizeof(FRAMEHASH));
                // then store fo, bh into shared memory, with some adjusting and checking
                // ~1000 msgs of 16 bytes each are stored every second, but the leak is only ~200 bytes per minute
            }
        }
    }
}
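The receive-and-copy step can be isolated into a function that rejects failed or short reads before touching the rolling buffers. This sketch assumes, from the "16 bytes each" comment in the loop, that FRAMEOBJECT and FRAMEHASH are both 8-byte values laid out after a 1-byte header; parse_frame, FRAME_LEN and the typedefs are hypothetical stand-ins for the real definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: the post does not show these types. */
typedef uint64_t FRAMEOBJECT;
typedef uint64_t FRAMEHASH;

/* Assumed message layout: [1-byte header][FRAMEOBJECT][FRAMEHASH] */
#define FRAME_LEN (1 + sizeof(FRAMEOBJECT) + sizeof(FRAMEHASH))

/* len is the value zmq_recv() returned: -1 on error, otherwise the full
 * message size. Returns 0 and fills *fo and *bh on success, or -1 if the
 * buffer does not hold a complete frame. */
int parse_frame(const unsigned char *buf, int len, FRAMEOBJECT *fo, FRAMEHASH *bh)
{
    if (len < 0 || (size_t)len < FRAME_LEN)
        return -1;                                   /* error, empty or short */
    memcpy(fo, buf + 1, sizeof *fo);                 /* skip the header byte */
    memcpy(bh, buf + 1 + sizeof *fo, sizeof *bh);    /* hash follows the object */
    return 0;
}
```

Keeping the length check next to the copies means a failed zmq_recv() can never push stale bytes from the previous message into shared memory.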
Edit 2:
I finally got valgrind working by using only part of the data (6 GB), and it passed without finding any leaks. However, under valgrind the program takes 100% CPU and definitely doesn't handle all the incoming data, so it isn't running at full load. This half-confirms my last-hope guess: the leak is in the data-exchange part. I then found mtrace (part of glibc), and it helped me track down the ADDRESS of the leak: it's outside my code, in one of the threads. The only threads in my program are created by ZeroMQ. I then started playing with the socket options (increasing HWM and buffers), and the leak rate decreased, but it didn't go away completely, even with absurdly large values :(
So now I'm 95% sure ZeroMQ is leaking. I'll try to find an answer on their mailing list.