
I am curious to know whether it is possible to run a Python script that calls a function in parallel child processes. I'm not sure I'm using these terms correctly, so here's a concept script, fashioned after a bash script, that does what I'm talking about.

import Zfunctions as Z
reload(Z)

def Parallel():
    statements
    calls to other functions in the general function file Z

#--------------
if __name__ == '__main__':
    # Running this script on a Linux cluster with 8 processing nodes available
    Parallel() &  # 1st process sent to 1st processing node
    Parallel() &  # 2nd process sent to 2nd node
    .
    .
    .
    Parallel() &  # 8th process sent to 8th node
    wait

Now I know the ampersand (&) and "wait" are wrong here, but in bash that is how you send a process to the background and then wait for those processes to finish. My question, hopefully clearer now, is: can this be done in Python, and if so, how?
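From what I have read so far, the closest Python analogue to & and wait seems to be multiprocessing.Process with start() and join(); here is a minimal sketch of that mapping (assuming Parallel takes no arguments):

import multiprocessing

def Parallel():
    pass  # statements and calls to functions in Z would go here

if __name__ == '__main__':
    # rough equivalent of the eight "Parallel() &" lines ...
    procs = [multiprocessing.Process(target=Parallel) for _ in range(8)]
    for p in procs:
        p.start()  # like &: launch a background child process
    # ... and of "wait"
    for p in procs:
        p.join()   # block until every child has finished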

Any help is appreciated.

/M

I have gotten some good help. I tested the modification to my question shown below, which tries to run 60 jobs that process a huge amount of data and write the results to disk. All of this lives in a single Python file that combines two for loops with a series of internal function calls. The script fails, and the error output is shown below it:

import multiprocessing
import numpy as np

def Parallel(m, w, PROCESSES):
    plist = {}
    plist['timespan'] = '2007-2008'
    print 'Creating pool with %d processes\n' % PROCESSES
    pool = multiprocessing.Pool(PROCESSES)
    print 'pool = %s' % pool

    # LRCE, SRCE, ..., CC and calculate are analysis functions
    # defined elsewhere in this same file
    TASKS = [(LRCE,(plist,m,w)),(SRCE,(plist,m,w)),(ALBEDO,(plist,m,w)),
             (SW,(plist,m,w)),(RR,(plist,m,w)),(OLR,(plist,m,w)),(TRMM,(plist,w)),
             (IWP,(plist,m,w)),(RH,(plist,'uth',m,w)),(RH,(plist,200,m,w)),
             (RH,(plist,400,m,w)),(IWC,(plist,200,m,w)),(IWC,(plist,400,m,w)),
             (CC,(plist,200,m,w)),(CC,(plist,400,m,w))]

    results = [pool.apply_async(calculate, t) for t in TASKS]
    print 'Ordered results using pool.apply_async():'
    for r in results:
        print '\t', r.get()

#-----------------------------------------------------------------------------------
if __name__ == '__main__':
    PROCESSES = 8
    for w in np.arange(2):
        for m in np.arange(2):
            Parallel(m, w, PROCESSES)
Here is the error message from the cluster:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/software/apps/python/2.7.2-smhi1/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/software/apps/python/2.7.2-smhi1/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/software/apps/python/2.7.2-smhi1/lib/python2.7/multiprocessing/pool.py", line 313, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
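If it helps with diagnosis: my understanding is that multiprocessing has to pickle every function it ships to a worker, and only functions that can be looked up by name at the top level of an importable module pickle cleanly. Here is a stripped-down, self-contained sketch of the pattern I am aiming for (LRCE and calculate are hypothetical stand-ins for my real functions):

import multiprocessing

# Hypothetical stand-in for one analysis function; it must live at
# module top level so the workers can re-import it by name.
def LRCE(plist, m, w):
    return ('LRCE', plist['timespan'], m, w)

# Unpacks one (function, args) task tuple and calls it.
def calculate(func, args):
    return func(*args)

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    plist = {'timespan': '2007-2008'}
    tasks = [(LRCE, (plist, m, w)) for w in range(2) for m in range(2)]
    results = [pool.apply_async(calculate, t) for t in tasks]
    for r in results:
        print r.get()
    pool.close()
    pool.join()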


1 Answer


You probably want to look into the multiprocessing module; your code could be rewritten as follows:

import multiprocessing

def Parallel(junk):
    #...snip...
    pass

if __name__ == "__main__":
    p = multiprocessing.Pool(8)

    results = p.map(Parallel, range(8))

One warning: don't try this in an interactive interpreter; multiprocessing needs the __main__ module to be importable by the child processes, and functions defined interactively can't be pickled.
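Applied to the nested loops in your edit, the same pattern might look like the sketch below; the worker body is a placeholder, and I'm assuming each (m, w) pair can be processed independently:

import multiprocessing

def Parallel(params):
    m, w = params
    # ...do the real work for this (m, w) pair here...
    return (m, w)

if __name__ == '__main__':
    p = multiprocessing.Pool(8)
    combos = [(m, w) for w in range(2) for m in range(2)]
    results = p.map(Parallel, combos)
    print results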