Multithreading inside Multiprocessing in Python

Question

I am using the concurrent.futures module for multiprocessing and multithreading. I am running it on an 8-core machine with 16 GB of RAM and an Intel i7 8th-gen processor. I tried this on Python 3.7.2 and also on Python 3.8.2.

import concurrent.futures
import time

# takes a list and multiplies each element by 2

def double_value(x):
  y = []
  for elem in x:
    y.append(2 * elem)
  return y

# multiplies a single element by 2

def double_single_value(x):
  return 2 * x

# define a

import numpy as np
a = np.arange(100000000).reshape(100, 1000000)

# function that runs multiple threads and multiplies each element by 2

def get_double_value(x):
  with concurrent.futures.ThreadPoolExecutor() as executor:
    results = executor.map(double_single_value, x)
  return list(results)

The code shown below ran in 115 seconds. It uses only multiprocessing. CPU utilization for this piece of code is 100%.

t = time.time()

with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(double_value, a)
print(time.time()-t)

The code below took more than 9 minutes and consumed all of the system's RAM, after which the system killed all the processes. CPU utilization during this piece of code also stayed below 100% (~85%).

t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
  my_results = executor.map(get_double_value, a)

print(time.time()-t)

I really want to understand:

1) Why is the code that first splits the work across multiple processes and then runs multiple threads inside each process not faster than the code that uses only multiprocessing?

(I have gone through many posts that describe multiprocessing and multithreading, and the crux I got is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work.)

2) Is there a better way to do multithreading inside multiprocessing for maximum utilization of the allotted cores (or CPUs)?

3) Why did that last piece of code consume all the RAM? Was it due to multithreading?

@Paul My bad, that was by mistake, I have corrected it. Please check now — learner
What version of python are you using? The ThreadPoolExecutor has changed how many workers it's willing to spawn by default in 3.8. — EnticingCanine
For a CPU-bound task, a process per CPU is as efficient as you'll get, assuming the CPU-bound task is large enough to warrant the overhead of starting the processes and transferring the data to them. And threads don't help at all in CPython for a CPU-bound task due to the GIL. — Mark Tolonen
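
To make the overhead point in the last comment concrete, here is a minimal sketch of task granularity. `executor.map` creates one task (and one `Future`) per item it is given, so mapping `double_single_value` over a row produces as many tasks as the row has elements, while mapping over a few chunks produces only a handful. The helpers `chunked` and `double_chunk`, the 4 workers, and the row shrunk to 10,000 elements are assumptions for illustration, not part of the question.

```python
from concurrent.futures import ThreadPoolExecutor


def double_single_value(x):
    # Same per-element function as in the question.
    return 2 * x


def double_chunk(chunk):
    # Hypothetical helper: doubles a whole slice in a single task.
    return [2 * elem for elem in chunk]


def chunked(seq, size):
    # Hypothetical helper: split seq into consecutive slices of length size.
    return [seq[i:i + size] for i in range(0, len(seq), size)]


# Small stand-in for one row of the question's array (assumed size).
row = list(range(10_000))

# One task per element: 10,000 tasks for this row.
with ThreadPoolExecutor(max_workers=4) as executor:
    per_element = list(executor.map(double_single_value, row))

# One task per chunk: 4 tasks for the same row.
chunks = chunked(row, 2_500)
with ThreadPoolExecutor(max_workers=4) as executor:
    per_chunk = [y for part in executor.map(double_chunk, chunks) for y in part]

print(len(row), "tasks versus", len(chunks), "tasks, same result:", per_element == per_chunk)
```

Both versions return the same list; only the number of tasks the executor has to create and track differs. With rows of 1,000,000 elements, as in the question, the per-element version has that many tasks in flight per row in every worker process.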

lenik · Accepted Answer · 2020-06-19T15:37:28

As you say: "I have gone through many posts that describe multiprocessing and multithreading, and the crux I got is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work."

You need to figure out whether your program is I/O-bound or CPU-bound, then apply the method that fits your problem. Applying various methods at random, or all of them together, usually only makes things worse.
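
As a rough illustration of the I/O-bound case, the sketch below uses `time.sleep` as a stand-in for a blocking call such as a network or disk request. `fake_io_task`, `timed`, the 0.05 s delay, and the 8 workers are assumptions for the example, not taken from the question. Because a sleeping thread releases the GIL, the threads can wait in parallel, which is where a thread pool pays off.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(x):
    # Hypothetical stand-in for an I/O-bound call; time.sleep releases
    # the GIL, much as a blocking network or disk call would.
    time.sleep(0.05)
    return 2 * x


def timed(fn, *args):
    # Return (result, elapsed seconds) for a single call.
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_io_in_threads(values, max_workers=8):
    # Threads overlap the waiting time of the I/O-like tasks.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(fake_io_task, values))


def run_io_sequentially(values):
    # Baseline: each task waits for the previous one to finish.
    return [fake_io_task(v) for v in values]


values = list(range(8))
threaded, t_threads = timed(run_io_in_threads, values)
sequential, t_seq = timed(run_io_sequentially, values)
print(f"I/O-like work: threads {t_threads:.2f}s, sequential {t_seq:.2f}s")
```

The doubling in the question is the opposite case: it is pure arithmetic with no waiting, so threads inside each process have nothing to overlap, and one process-pool task per row (the first timed snippet) is the better fit.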