
I wonder whether I can achieve the following logic: given a set of fold_num jobs and a limit of worker_num worker processes, I want to run worker_num processes in parallel until all fold_num jobs are done. Afterwards, some further processing runs on the results of all these jobs. We can assume fold_num is always several times worker_num.

I haven't got the following snippet working so far. It is based on tips from "How to wait in bash for several subprocesses to finish and return exit code !=0 when any subprocess ends with code !=0?"

#!/bin/bash
worker_num=5
fold_num=10

pids=""
result=0
for fold in $(seq 0 $(( $fold_num-1 ))); do
    pids_idx=$(( $fold % ${worker_num} ))
    echo "pids_idx=${pids_idx}, pids[${pids_idx}]=${pids[${pids_idx}]}"
    wait ${pids[$pids_idx]} || let "result=1"
    if [ "$result" == "1" ]; then
        echo "some job is abnormal, aborting"
        exit
    fi

    cmd="echo fold$fold"    # use echo as an example, real command can be time-consuming to run
    $cmd &
    pids[${pids_idx}]="$!"

    echo "pids=${pids[*]}"
done


# when the for-loop completes, do something else...

The output looks like:

pids_idx=0, pids[0]=
pids=5846
pids_idx=1, pids[1]=
fold0
pids=5846 5847
fold1
pids_idx=2, pids[2]=
pids=5846 5847 5848
fold2
pids_idx=3, pids[3]=
pids=5846 5847 5848 5849
fold3
pids_idx=4, pids[4]=
pids=5846 5847 5848 5849 5850
pids_idx=0, pids[0]=5846
fold4
./test_wait.sh: line 12: wait: pid 5846 is not a child of this shell
some job is abnormal, aborting

Questions:

  1. The pids array seems to have recorded the correct process IDs, yet 'wait' fails on them. Any ideas how to fix this?
  2. Do we need to call wait after the for-loop? If so, what should be done after the for-loop?

Can you use a tool like GNU Parallel? gnu.org/software/parallel - Barmar
GNU Parallel is designed for this... parallel -j $work_num process ::: {1..$fold_num} - Mark Setchell
My real command has more complicated logic, so I'm not sure whether parallel can achieve this: say I need to ensure that fold0 and fold5 are processed by the 1st worker process (whatever that might be), fold1 and fold6 by the 2nd, ... fold4 and fold9 by the 5th. How can I do so using parallel? - galactica

1 Answer


Alright, I think I have a working solution, based on the tips about 'parallel' from the comments.

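# array elements are separated by spaces, not commas
# note: bash does not export arrays to child processes, only scalars such as worker_num
worker_names=("foo" "bar")
export worker_num=${#worker_names[@]}
fold_num=10    # same value as in the question

function some_computation {
    local fold=$1
    cmd="..."     # involves worker_names and fold
    echo "$cmd"; $cmd
}
export -f some_computation # important, to make this function visible to subprocesses

for fold in $(seq 0 $(( fold_num - 1 ))); do
    sem -j "$worker_num" some_computation "$fold"    # runs at most worker_num jobs at a time
done

sem --wait    # wait for all jobs to complete

# do something below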
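
The effect of `export -f` in the snippet above can be checked in isolation. This is only a minimal sketch: the function body here is a placeholder of mine, not the real computation, and a child `bash -c` stands in for the subprocess that sem would start.

```shell
#!/bin/bash
# Sketch: a function is visible to a child bash only after it has been exported.
some_computation() {
    local fold=$1
    echo "processing fold$fold"    # placeholder body, not the real command
}

# Before exporting, the child bash cannot find the function (exit status 127).
before=$(bash -c 'some_computation 3' 2>/dev/null || echo "not found")

export -f some_computation

# After exporting, the child bash inherits the function definition.
after=$(bash -c 'some_computation 3')

echo "before export: $before"
echo "after export:  $after"
```

The same applies to plain variables: a child process sees them only if they were exported.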

A couple of things to note:

  1. I could not get plain parallel working because of the post-processing I need to do after the parallel jobs: the parallel version I tried did not wait for the jobs to complete. So I used GNU sem (short for semaphore), which ships with GNU parallel.
  2. Exporting the variables is crucial so that the computation function can access them in this situation; otherwise those global variables are invisible to the subprocesses. Note that bash can export only scalar variables, not arrays.
  3. Exporting the computation function is necessary for the same reason. Notice the -f option.
  4. sem --wait does exactly what is needed here: it waits for all the parallel jobs to finish.
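
For the fixed fold-to-worker assignment mentioned in the comments on the question (fold0 and fold5 on the first worker, fold1 and fold6 on the second, and so on), a plain-bash variant without sem is also possible. The following is only a sketch under my own assumptions: `process_fold`, `run_worker` and the per-worker log files are hypothetical names, and `echo` stands in for the real command. Each worker is one background job that walks through its own folds in order, and the parent waits on every worker PID before post-processing.

```shell
#!/bin/bash
# Sketch: worker w handles folds w, w+worker_num, w+2*worker_num, ... in order.
worker_num=5
fold_num=10

process_fold() {                # hypothetical stand-in for the real command
    echo "worker$1 fold$2"
}

run_worker() {
    local w=$1 fold
    for (( fold = w; fold < fold_num; fold += worker_num )); do
        process_fold "$w" "$fold" || return 1    # stop this worker on failure
    done
}

outdir=$(mktemp -d)             # hypothetical location for per-worker output
pids=()                         # a real array, one PID per worker
for (( w = 0; w < worker_num; w++ )); do
    run_worker "$w" > "$outdir/worker$w.log" &
    pids+=("$!")
done

# Wait for each worker by PID so that a non-zero exit status is not lost.
result=0
for pid in "${pids[@]}"; do
    wait "$pid" || result=1
done

if (( result != 0 )); then
    echo "some job is abnormal, aborting" >&2
    exit 1
fi

# All folds are done at this point; post-processing can start here.
summary=$(cat "$outdir"/worker*.log | sort)
echo "$summary"
rm -rf "$outdir"
```

Because every `wait` is given an explicit PID, the loop after the workers are started is the only wait that is needed before the post-processing step.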

HTH.