3
votes

In order to test some security software, I need to be able to create a large (configurable) number of new processes (not threads!) in Windows, very quickly, have them exist for a (configurable) period of time, then terminate cleanly. The processes shouldn't do anything at all - just exist for the specified duration.

Ultimately, I want to be able to run something like:

C:\> python process_generate.py --processes=150 --duration=2500

which would create 150 new processes very quickly, keep them all alive for 2500ms, then have them all terminate as quickly as possible.
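
Roughly, I picture process_generate.py looking something like the sketch below (just a sketch - the flag handling and the idle worker are placeholders, not code I've settled on):

from multiprocessing import Process
import argparse
import time

def idle(duration_ms):
    # The child does nothing except exist for the requested time
    time.sleep(duration_ms / 1000.0)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--processes', type=int, default=150)
    parser.add_argument('--duration', type=int, default=2500)   # milliseconds
    args = parser.parse_args()

    children = [Process(target=idle, args=(args.duration,)) for _ in xrange(args.processes)]
    for p in children:
        p.start()
    for p in children:
        p.join()    # each child exits on its own once its sleep finishes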

As a starting point, I ran

from multiprocessing import Process
import os

def f():
    pass

if __name__ == '__main__':
    import datetime
    count = 0
    starttime = datetime.datetime.now()
    while True:
        p = Process(target=f)
        p.start()
        p.terminate()
        count += 1
        if count % 1000 == 0:
            now = datetime.datetime.now()
            print "Started & stopped d processes in %s seconds" % (count, str(now-starttime))

and found I could create and terminate about 70 processes/second serially on my laptop, with each created process terminating straight away. That rate of roughly 70 processes/second was sustained for about an hour.

When I changed the code to

from multiprocessing import Process
import os
import time

def f_sleep():
    time.sleep(1)

if __name__ == '__main__':
    import datetime
    starttime = datetime.datetime.now()

    processes = []
    PROCESS_COUNT = 100
    for i in xrange(PROCESS_COUNT):
        p = Process(target=f_sleep)
        processes.append(p)
        p.start()
    for i in xrange(PROCESS_COUNT):
        processes[i].terminate()
    now = datetime.datetime.now()
    print "Started/stopped %d processes in %s seconds" % (len(processes), str(now-starttime))

and tried different values for PROCESS_COUNT, I expected it to scale a lot better than it did. I got the following results for different values of PROCESS_COUNT:

  • 20 processes completed in 0.72 seconds
  • 30 processes completed in 1.45 seconds
  • 50 processes completed in 3.68 seconds
  • 100 processes completed in 14 seconds
  • 200 processes completed in 43 seconds
  • 300 processes completed in 77 seconds
  • 400 processes completed in 111 seconds

This is not what I expected: I expected to be able to scale the parallel process count up reasonably linearly until I hit a bottleneck, but I seem to be hitting a process-creation bottleneck almost straight away. Based on the first piece of code I ran, I definitely expected to get close to 70 processes/second before hitting a process-creation bottleneck.

Without going into the full specs, the laptop runs fully patched Windows XP, has 4 GB of RAM, is otherwise idle, and is reasonably new; I don't think it should be hitting a bottleneck this quickly.

Am I doing anything obviously wrong here with my code, or is XP/Python parallel process creation really that inefficient on a 12-month-old laptop?

5
"new processes" "very quickly" "Windows" ... I think I see the problem... - Ignacio Vazquez-Abrams
Doesn't explain how I was able to create 70 processes/second when they were being created serially though - monch1962

5 Answers

7
votes

Well, Windows process management doesn't really scale well. The more processes there are, the longer it takes to insert a new one into scheduling.

Now compare this with other OS kernels, for example Linux, where process creation is practically O(1) (constant time) since kernel 2.6.8 (when the scheduler capable of this was introduced).

Note that I'm not trying to sell you Linux here. I just suggest you try out your program on a different OS to see for yourself.

3
votes

After profiling and testing a bunch of different scenarios, I found that it's simply far faster to generate and kill single processes under Windows than to generate N processes at once, kill all N, and then restart N again.

My conclusion is that Windows keeps enough resource available to start one process at a time quite quickly, but not enough to start more than one new concurrent process without considerable delay.

As others have said, Windows is slow at starting new processes, but apparently the speed degrades semi-geometrically with the number of concurrent processes already running on the system: starting a single process is quite fast, but when you're kicking off multiple processes you hit problems. This applies regardless of how many CPUs the machine has, how busy it is (typically <5% CPU in my testing), whether Windows is running on a physical server or a virtual one, or how much RAM is available (I tested with up to 32 GB of RAM, with ~24 GB free) - it simply seems to be a limitation of the Windows OS.

When I installed Linux on the same hardware, the limitation went away (as per Xavi's response) and we were able to start many processes concurrently, very quickly.
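
To illustrate the pattern I mean (start and retire one process at a time while keeping a rolling pool alive), here's a rough sketch; the pool size, sleep time, and run length are illustrative only, not the exact code I used:

from collections import deque
from multiprocessing import Process
import time

def idle():
    time.sleep(2.5)

if __name__ == '__main__':
    TARGET = 150          # rough number of processes to keep alive at once
    live = deque()
    start = time.time()
    while time.time() - start < 60:   # churn processes for a minute
        p = Process(target=idle)
        p.start()                     # only ever one new process being created
        live.append(p)
        if len(live) > TARGET:
            oldest = live.popleft()
            oldest.terminate()        # retire one process per new start
            oldest.join()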

2
votes

If I remember correctly, Windows, as opposed to Linux, was never designed to start many processes quickly. It's just not what the designers thought you would be doing, whereas on Linux, with things like inetd, it's a common enough operating model to warrant optimization, so process creation was optimized like hell.

2
votes

I've tested your code on a Dell Precision running Ubuntu 11.04 with 4 GB of RAM, with these results:

Started/stopped 100 processes in 0:00:00.051061 seconds
Started/stopped 150 processes in 0:00:00.094802 seconds
Started/stopped 200 processes in 0:00:00.153671 seconds
Started/stopped 300 processes in 0:00:00.351072 seconds
Started/stopped 400 processes in 0:00:00.651631 seconds
Started/stopped 470 processes in 0:00:01.009148 seconds
Started/stopped 1000 processes in 0:00:02.532036 seconds
Started/stopped 10000 processes in 0:00:29.677061 seconds

There was at least 10% variability between runs with the same number of processes. Hope this is useful; within one second my computer started and stopped nearly 500 processes with your code.

0
votes

I'd argue that on Linux there are difficulties with creating many Python processes too. After about 500 calls to p.start(), it becomes really slow.

Sometimes I need to create thousands of processes that then run for a long time.

In the examples above there are never PROCESS_COUNT live processes at any one moment, because they start finishing after 1 second. So in the case above of creating 1000 processes in about 2 seconds, more than half of the processes had already finished before the creation loop completed.

from multiprocessing import Process
from time import sleep, time

def sample():
    sleep(13)

start = time()
for i in range(1500):
    p = Process(target=sample)
    p.daemon = True
    p.start()
end = time()
print end - start

I tried this on a 140-core server running SUSE Enterprise and on my laptop running Ubuntu; the dynamics are the same (server results below):

500 processes start  - 1.36 s
1000 processes start - 9.7 s
1500 processes start - 18.4 s
2000 processes start - 24.3 s
3000 processes start - 43.3 s

It's because of this call before the fork; it takes longer for every new child process:

def _cleanup():
    # check for processes which have finished
    for p in list(_current_process._children):
        if p._popen.poll() is not None:
            _current_process._children.discard(p)
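
A quick way to see this effect (just a sketch, and timings will obviously vary) is to time each individual p.start() call and watch the per-start latency climb as more children stay alive:

from multiprocessing import Process
from time import sleep, time

def sample():
    sleep(13)

if __name__ == '__main__':
    for i in range(1500):
        t0 = time()
        p = Process(target=sample)
        p.daemon = True
        p.start()                  # _cleanup() scans the whole _children set first
        if i % 100 == 0:
            print i, time() - t0   # per-start latency keeps growing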

As I recall, if the processes hold a manager.Value and are a little heavier, it takes tens of gigabytes of RAM and they start a little more slowly.