2
votes

I have the following function that is executed on a Thread created with _beginthreadex or CreateThread:

static volatile LONG g_executedThreads = 0;
void executeThread(int v){
   //1. leaks: time_t tt = _time64(NULL);
   //2. leaks: FILETIME ft; GetSystemTimeAsFileTime(&ft);
   //3. no leak: SYSTEMTIME stm; GetSystemTime(&stm);

   InterlockedAdd(&g_executedThreads, 1); // count nr of executions
}

When I uncomment either line 1 (a CRT call) or line 2 (a Win32 API call), the thread leaks and subsequent calls to _beginthreadex fail (GetLastError returns error 8: "Not enough storage is available to process this command"). Memory reported by Process Explorer when _beginthreadex starts to fail: Private 130 MB, Virtual 150 MB.

But if I uncomment only line 3 (another Win32 API call), no leak happens and nothing fails even after 1 million threads. Here the memory reported is Private 1.4 MB, Virtual 25 MB. This version also ran much faster (20 seconds for 1 million threads, versus 60 seconds for 30,000 threads in the first case).

I've tested (see here the test code ) with Visual Studio 2013, compiled for x86 (debug and release), and ran it on Windows 8.1 x64. After about 30,000 threads have been created, most calls to _beginthreadex start failing. I should mention that fewer than 100 threads are running simultaneously.

Update 2:

My assumption of at most 100 threads was based on the console output (scheduled is approximately equal to completed), and the Threads tab in Process Explorer did not report more than 10 threads. Here is the console output (no WaitForSingleObject, original code):

step:0, scheduled:1, completed:1
step:5000, scheduled:5001, completed:5000
...
step:25000, scheduled:25001, completed:24999
step:30000, scheduled:30001, completed:30001
 _beginthreadex failed. err(8); errno(12). exiting ...
step:31701, scheduled:31712, completed:31710

rerun loop:
step:0, scheduled:31713, completed:31711
_beginthreadex failed. err(8); errno(12). exiting ...
step:6, scheduled:31719, completed:31716

Based on the suggestions from @SHR and @HarryJohnston, I now schedule 64 threads at once and wait for all of them to complete (see updated code here), but the behaviour is the same. Note that I've also tried with a single thread at a time, but then the failure happens only sporadically. Also, the reserved stack size is 64K! Here is the new schedule function:

static unsigned int __stdcall _beginthreadex_wrapper(void *arg) {
    executeThread(1);
    return 0;
}
const int maxThreadsCount = MAXIMUM_WAIT_OBJECTS;
bool _beginthreadex_factory(int& step) {
    DWORD lastError = 0;

    HANDLE threads[maxThreadsCount];
    int threadsCount = 0;
    while (threadsCount < maxThreadsCount){
        unsigned int id;
        threads[threadsCount] = (HANDLE)_beginthreadex(NULL,
            64 * 1024, _beginthreadex_wrapper, NULL, STACK_SIZE_PARAM_IS_A_RESERVATION, &id);
        if (threads[threadsCount] == NULL) {
            lastError = GetLastError();
            break;
        }
        else threadsCount++;
    }

    if (threadsCount > 0) {
        WaitForMultipleObjects(threadsCount, threads, TRUE, INFINITE);
        for (int i = 0; i < threadsCount; i++) CloseHandle(threads[i]);
    }

    step += threadsCount;
    g_scheduledThreads += threadsCount;

    if (threadsCount < maxThreadsCount) {
        printf("    %03d sec: step:%d, _beginthreadex failed. err(%d); errno(%d). exiting ...\n", getLogTime(), step, lastError, errno);
        return false;
    }
    else return true;
}

Here is what is printed on Console:

000 sec: step:6400, scheduled:6400, completed:6400
003 sec: step:12800, scheduled:12800, completed:12800
007 sec: step:19200, scheduled:19200, completed:19200
014 sec: step:25600, scheduled:25600, completed:25600
022 sec: step:32000, scheduled:32000, completed:32000
023 sec: step:32358, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
028 sec: step:32358, scheduled:32358, completed:32358
try to create 2 more times
028 sec: step:32361, _beginthreadex failed. err(8); errno(12). exiting ...
032 sec: step:32361, scheduled:32361, completed:32361
rerun loop: 1
036 sec: step:3, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
041 sec: step:3, scheduled:32364, completed:32364
try to create 2 more times
041 sec: step:5, _beginthreadex failed. err(8); errno(12). exiting ...
045 sec: step:5, scheduled:32366, completed:32366
rerun loop: 2
056 sec: step:2, _beginthreadex failed. err(8); errno(12). exiting ...
sleep 5 seconds
061 sec: step:2, scheduled:32368, completed:32368
try to create 2 more times
061 sec: step:4, _beginthreadex failed. err(8); errno(12). exiting ...
065 sec: step:4, scheduled:32370, completed:32370

Any suggestion/info is welcome. Thanks.

2
I'm not a Windows expert, but the problem is almost surely in code you have not shown. I see that you have posted a link to your complete test program, but would you mind, first, cutting that program down to the smallest possible program that still exhibits the memory leak, and, second, editing that into the question? (Unless you discover the problem during the cutting-down process, which happens.) - zwol
If the InterlockedAdd function can terminate the thread in any way, then that could cause some memory leaks. - smerlin
It's not a Win32 API. It is a C library function. - user207421
SHR is right. It isn't a leak, you just need to throttle the thread creation to limit the number of threads running simultaneously. (You assert that no more than 100 threads are running at a time, but I don't see any way to confirm that.) - Harry Johnston
@HarryJohnston My assertion was based on what was printed on the console. The scheduled and completed values do not differ too much. - Nicolae Dascalu

2 Answers

1
votes

I think you've got it wrong. Take a look at this code:

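#include <windows.h>

// Thread entry point with the signature CreateThread expects,
// so no function-pointer cast is needed.
DWORD WINAPI thread_func(LPVOID p)
{
     Sleep(1000);
     return 0;
}
int main()
{
    for(int i=0;i<1000000;i++)
    {
        DWORD id;
        HANDLE h = CreateThread(NULL,0, thread_func,NULL,0,&id);
        if (h == NULL) break;            // creation failed, nothing to wait on
        WaitForSingleObject(h,INFINITE); // wait until the thread has finished
        CloseHandle(h);                  // release the handle, or each iteration leaks one
    }
    return 0;
}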

If the thread function itself leaked, it would leak simply by being called, so the wait would not change anything. But when you watch this program in Performance Monitor, you'll see that all the lines stay almost constant.

Now ask yourself, what will happen when I remove the WaitForSingleObject?

Threads are created much faster than they run, so you hit the per-process thread limit or the per-process memory limit. Note that if you compile for x86, the address space is limited to 4 GB, of which only 2 GB is available to user mode by default; the other 2 GB is reserved for kernel mode. If you use the default stack size (1 MB) per thread, and the rest of the program used no memory at all (which never happens, since you have code...), you would be limited to about 2000 threads. Once the 2 GB is used up, you can't create more threads until previous threads have finished.
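
As a rough check of that arithmetic, here is a small sketch that divides the user-mode address space by the per-thread stack reservation. The helper name max_thread_stacks is made up for illustration, and the result is only an upper bound, because it ignores code, heaps and DLLs that also occupy the address space.

```cpp
#include <cstdint>

// Hypothetical helper: upper bound on how many thread stacks fit into a given
// amount of user-mode address space. Everything else in the process is
// ignored, so the real limit is lower.
constexpr std::uint64_t max_thread_stacks(std::uint64_t user_space_bytes,
                                          std::uint64_t stack_reserve_bytes) {
    return stack_reserve_bytes == 0 ? 0 : user_space_bytes / stack_reserve_bytes;
}

constexpr std::uint64_t KiB = 1024;
constexpr std::uint64_t MiB = 1024 * KiB;
constexpr std::uint64_t GiB = 1024 * MiB;

// 2 GiB of user space with the default 1 MiB stack reservation.
static_assert(max_thread_stacks(2 * GiB, 1 * MiB) == 2048, "1 MiB stacks");
// 2 GiB of user space with the 64 KiB reservation used in the question's code.
static_assert(max_thread_stacks(2 * GiB, 64 * KiB) == 32768, "64 KiB stacks");
```

The 64 KiB case gives 32768, which is in the same range as the roughly 32,000 threads after which the runs in the question start to fail. That coincidence alone does not prove the cause, but it suggests that address space for stacks is worth checking.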

So my conclusion is that you are creating threads without waiting for them, and after a while there is no memory left for more threads.
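
One way to throttle creation is to start threads in bounded batches and join each batch before starting the next. The sketch below uses portable std::thread instead of _beginthreadex or CreateThread so it can be tried anywhere; the function name run_in_batches and the trivial task body are assumptions for illustration, not code from the question.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: runs `total` trivial tasks, each on its own freshly
// created thread, while never keeping more than `batch_size` threads alive at
// once. Returns the number of tasks that completed.
// Note: std::thread's constructor throws std::system_error if the system
// cannot create a thread; that is not handled here.
std::size_t run_in_batches(std::size_t total, std::size_t batch_size) {
    if (batch_size == 0) return 0;
    std::atomic<std::size_t> completed{0};
    std::size_t started = 0;
    while (started < total) {
        const std::size_t n = std::min(batch_size, total - started);
        std::vector<std::thread> batch;
        batch.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            batch.emplace_back([&completed] { completed.fetch_add(1); });
        }
        // Joining guarantees every thread in this batch has fully finished
        // before the next batch is created.
        for (auto& t : batch) t.join();
        started += n;
    }
    return completed.load();
}
```

For example, run_in_batches(1000000, 64) would mirror the one million short-lived threads from the question while keeping at most 64 alive at a time.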

You can check whether this is the case with Performance Monitor by looking at the maximum thread count for your process.
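
If you would rather measure this inside the program, a small gauge that records the peak number of threads inside the thread function at the same time is one option. The class name ConcurrencyGauge is made up for this sketch. It only counts threads that are between enter() and leave(), so a thread that has returned from its function but has not yet been torn down by the system is not counted.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper: tracks how many threads are currently inside a region
// and the highest value that count has reached.
class ConcurrencyGauge {
public:
    // Call at the start of the thread function.
    void enter() {
        const std::size_t now = current_.fetch_add(1) + 1;
        std::size_t seen = peak_.load();
        // Raise the recorded peak if this thread pushed the count above it.
        // On failure compare_exchange_weak reloads `seen`, so the loop
        // re-checks against the latest peak.
        while (now > seen && !peak_.compare_exchange_weak(seen, now)) {
        }
    }
    // Call just before the thread function returns.
    void leave() { current_.fetch_sub(1); }

    std::size_t current() const { return current_.load(); }
    std::size_t peak() const { return peak_.load(); }

private:
    std::atomic<std::size_t> current_{0};
    std::atomic<std::size_t> peak_{0};
};
```

Calling enter() at the top of the thread function and leave() at the end, then printing peak() next to the scheduled and completed counters, would show how many threads really overlapped.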

1
votes

After uninstalling the antivirus, the failure could no longer be reproduced (the code even runs as fast as in scenario 3).