I am using the concurrent.futures module to do multiprocessing and multithreading. I am running it on an 8-core machine with 16 GB RAM (Intel i7 8th Gen processor). I tried this on Python 3.7.2 and also on Python 3.8.2.
import concurrent.futures
import time
Function that takes a list and multiplies each element by 2:
def double_value(x):
    y = []
    for elem in x:
        y.append(2 * elem)
    return y
Function that multiplies a single element by 2:
def double_single_value(x):
    return 2 * x
Define the array a:
import numpy as np
a = np.arange(100000000).reshape(100, 1000000)
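For reference, this array is fairly large; a quick check of its memory footprint (assuming the default int64 dtype on a 64-bit build) gives roughly 0.8 GB:

# Memory footprint of a (assumption: default dtype is int64)
print(a.dtype, a.shape)       # int64 (100, 1000000)
print(a.nbytes / 1024 ** 2)   # ~762.9 MiB, i.e. roughly 0.8 GB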
Function that uses multiple threads to multiply each element by 2:
def get_double_value(x):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        results = executor.map(double_single_value, x)
    return list(results)
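As a quick sanity check, calling this helper on a small toy list (a hypothetical input, not the array above) works as expected; each scalar element is sent to the thread pool:

# Toy example (hypothetical input)
print(get_double_value([1, 2, 3]))   # [2, 4, 6]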
The code shown below ran in 115 seconds. This uses only multiprocessing; CPU utilization for this piece of code was 100%.
t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
    my_results = executor.map(double_value, a)
print(time.time() - t)
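For completeness, here is a minimal self-contained sketch of this benchmark, assuming it is run as a script (the if __name__ == "__main__" guard is my addition; it is needed on platforms that spawn worker processes, such as Windows):

import concurrent.futures
import time
import numpy as np

def double_value(x):
    # multiply each element of a row by 2 in a plain Python loop
    return [2 * elem for elem in x]

if __name__ == "__main__":
    a = np.arange(100000000).reshape(100, 1000000)
    t = time.time()
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # list() forces all results to be collected before the timer stops
        my_results = list(executor.map(double_value, a))
    print(time.time() - t)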
The code below took more than 9 minutes, consumed all of the system's RAM, and then the system killed all the processes. CPU utilization during this piece of code also did not reach 100% (~85%).
t = time.time()
with concurrent.futures.ProcessPoolExecutor() as executor:
    my_results = executor.map(get_double_value, a)
print(time.time() - t)
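To help narrow down where the time goes, a small extra measurement (not part of the runs above) would be to time a single row with each approach, outside any process pool:

# Time one row with the plain loop vs. the thread-pool helper (no process pool involved)
row = a[0]

t = time.time()
double_value(row)
print("plain loop on one row:", time.time() - t)

t = time.time()
get_double_value(row)
print("thread pool on one row:", time.time() - t)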
I really want to understand:
1) Why is the code that first splits the work across processes and then uses multithreading inside each process not running faster than the code that uses only multiprocessing?
(I have gone through many posts that describe multiprocessing and multithreading, and the main takeaway I got is that multithreading is for I/O-bound work and multiprocessing is for CPU-bound work.)