3
votes

I need to run some parallel computations in Python. The only compatible approach I can think of is the multiprocess/fork model, which is less than ideal for several reasons:

  • from what I understand, process creation on Windows is expensive (Windows has no native fork)
  • fine-grained process management (signals, e.g. SIGSTOP/SIGCONT) is clunky (i.e. handled outside the language)

These are the task requirements:

  • tasks may spawn new tasks
  • tasks must be registered with the task manager
  • tasks do not require shared state
  • tasks must return a value (python object)

The task manager is responsible for scheduling and limiting the number of concurrent tasks. These are the task manager requirements:

  • when a new task is started, the task manager may suspend other tasks based on a predetermined limit
  • when a task returns, the task manager may continue other suspended tasks
  • when the return value of a task is requested, the task manager may reorganize the task priority (prevent deadlocks)

So you see, the task manager doesn't need to be a parallel/concurrent process. Each task may make synchronous calls to the task manager on starting or stopping. Tasks waiting on other tasks may also make synchronous calls.
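To make the requirements above concrete, here is a minimal bookkeeping sketch of such a synchronous task manager. Everything in it is an assumption for illustration: the `TaskManager` class, its method names, the injected `suspend`/`resume` callbacks, and the policy of suspending the oldest running task first are hypothetical and come from no library. The two `posix_*` helpers show one way the callbacks might be backed by SIGSTOP/SIGCONT, which only works on POSIX systems.

```python
import os
import signal


class TaskManager:
    """Bookkeeping sketch: decides which task ids run and which are suspended."""

    def __init__(self, limit, suspend, resume):
        self.limit = limit        # maximum number of tasks running at once
        self.suspend = suspend    # hypothetical callback: suspend(task_id)
        self.resume = resume      # hypothetical callback: resume(task_id)
        self.running = []         # oldest task first
        self.suspended = []       # the next task to resume is the last item

    def task_started(self, task_id):
        """Called synchronously by a task as it starts."""
        self.running.append(task_id)
        # Assumed policy: when over the limit, suspend the oldest running
        # task, since a parent typically ends up waiting on its children.
        while len(self.running) > self.limit:
            victim = self.running.pop(0)
            self.suspended.append(victim)
            self.suspend(victim)

    def task_finished(self, task_id):
        """Called synchronously by a task just before it returns."""
        self.running.remove(task_id)
        self._fill_free_slots()

    def result_requested(self, task_id):
        """Called when some task waits on the return value of task_id."""
        if task_id in self.suspended:
            # Reprioritize: the awaited task becomes the next one to resume.
            self.suspended.remove(task_id)
            self.suspended.append(task_id)
            self._fill_free_slots()

    def _fill_free_slots(self):
        # Resume suspended tasks while there is room under the limit.
        while self.suspended and len(self.running) < self.limit:
            task_id = self.suspended.pop()
            self.running.append(task_id)
            self.resume(task_id)


def posix_suspend(pid):
    os.kill(pid, signal.SIGSTOP)  # POSIX only; assumes task ids are pids


def posix_resume(pid):
    os.kill(pid, signal.SIGCONT)  # POSIX only; assumes task ids are pids
```

With process ids as task ids, this could be wired up as `TaskManager(4, posix_suspend, posix_resume)`; the manager itself stays an ordinary object that tasks call into, not a concurrent process.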

I can't seem to think of any other approaches:

  • asyncio can start parallel processes within a limited pool, but that approach is better suited to data parallelism than to task pre-emption. Externally pre-empting (suspending) a task isn't compatible with cooperatively scheduled code. Correct me if I'm wrong, but while I could use asyncio, it wouldn't make my life easier (an abstraction without benefit), as I would still need processes, plus signals on task start/stop events?
  • Stackless Python might be suitable, but it isn't really Python?
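To illustrate the asyncio point: a concurrency limit in asyncio is something each task opts into at its own `await` points, for example through a semaphore; nothing outside the task can suspend it in the middle of a computation. The sketch below is only an illustration of that cooperative model. The names `task` and `run_all` and the `stats` dictionary are made up for the example, and the coroutines run on a single thread, so this shows scheduling, not parallelism.

```python
import asyncio


async def task(task_id, limit, stats):
    # The task itself opts into the limit; nothing external can pause it.
    async with limit:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        # A cooperative yield point: other tasks only get to run at awaits.
        await asyncio.sleep(0)
        stats["active"] -= 1
    return task_id * task_id


async def run_all(count, max_concurrent):
    limit = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(task(i, limit, stats) for i in range(count))
    )
    return results, stats["peak"]
```

Calling `asyncio.run(run_all(5, 2))` returns the results in submission order along with the peak number of tasks that were inside the limited section at once, which never exceeds the semaphore's size.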

Any ideas?

P.S. My end goal is to automatically parallelize (decorated) function calls. The task manager limits the number of tasks executing in parallel (e.g. from recursive functions) to avoid thrashing (fork bombs). I need to use Python, even though a lazy (task waiting), pure (no shared state) and stackless (lightweight threads) language might be more suitable...

1
You can try Python threading: pymotw.com/2/threading - Anoop
Python threads aren't parallel, and I don't need shared state. - user19087
Also: "currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted", so not suitable for these reasons as well. - user19087
What about Celery? - Vincent
"when a new task is started, the task manager may suspend other tasks based on a predetermined limit": does "may" mean that it's not mandatory? Are your tasks long-running (e.g. > 5 min), or nearly instant but numerous? Are you CPU bound or IO bound? +1 for @Vincent: what about Celery? - Julien Palard

1 Answer

0
votes

Wow, this question is old and I'm surprised a Stackless Python user hasn't chimed in...

Then again, Stackless Python was/is way ahead of its time and there are very few of us out there putting it to use.

Stackless Python is indeed Python. It is a little more than just Python, but it is Python nonetheless.

Stackless Python Wiki

I think it would suit your needs very well. It is still up-to-date and maintained with a commit as recent as this month. It's rather solid and has worked wonderfully for my needs.