I am trying to run a particle filter with 3000 independent particles. More specifically, I would like to run 3000 (simple) computations in parallel, so that the computation time remains short.
This task is designed for experimental applications on laboratory equipment, so it has to run on a local laptop. I cannot rely on a remote cluster of computers, and the computers that will be used are unlikely to have fancy Nvidia graphics cards. For instance, the computer I'm currently working with has an Intel Core i7-8650U CPU and an Intel UHD Graphics 620 GPU.
Calling `mp.cpu_count()` from the `multiprocessing` Python library tells me that I have 8 processors, which is too few for my problem (I need to run several thousand processes in parallel). I thus looked towards GPU-based solutions, and especially at PyOpenCL. The Intel UHD Graphics 620 GPU is supposed to have only 24 processors; does that mean I can only use it to run 24 processes in parallel at the same time?
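For reference, here is a minimal sketch of what the CPU route would look like, assuming each particle's computation can be written as a picklable top-level function. The number of tasks is not capped by the core count: a pool of `mp.cpu_count()` worker processes can work through 3000 tasks, handing each worker a chunk at a time. The names `particle_task` and `chunk_size_for` are hypothetical placeholders, not part of my actual code.

```python
import multiprocessing as mp


def particle_task(j):
    # Hypothetical stand-in for the computation done for particle j
    return j * j


def chunk_size_for(n_tasks, n_workers):
    # Ceiling division: how many tasks each worker receives per chunk
    return -(-n_tasks // n_workers)


if __name__ == "__main__":
    n_workers = mp.cpu_count()  # number of logical CPUs (8 on my laptop)
    n_tasks = 3000              # one task per particle

    # The pool keeps n_workers processes alive and feeds them all tasks
    with mp.Pool(processes=n_workers) as pool:
        results = pool.map(
            particle_task,
            range(n_tasks),
            chunksize=chunk_size_for(n_tasks, n_workers),
        )

    print(len(results), "results computed by", n_workers, "worker processes")
```

Whether this is actually faster than a plain loop depends on how expensive each task is compared with the cost of sending its inputs and outputs between processes.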
More generally, is my problem (running 3000 processes in parallel on a simple laptop using Python) realistic, and if so, which software solution would you recommend?
EDIT
Here is my pseudocode. At each time step `i`, I call the function `posterior_update`. This function calls `approx_likelihood` 3000 times, independently (once for each particle), and that function seems hard to vectorize. Ideally, I would like these 3000 calls to take place independently and in parallel.
import numpy as np
import scipy.stats
from collections import Counter
import random
import matplotlib.pyplot as plt
import os
import time
# User's inputs ##############################################################
# Numbers of particles
M_out = 3000
# Defines a bunch of functions ###############################################
def approx_likelihood(i,j,theta_bar,N_range,q_range,sigma_range,e,xi,M_in):
    # Gaussian density of e[i] centred at q*kk, weighted by xi[nn,kk]/M_in, summed over all (kk,nn)
    return sum(scipy.stats.norm.pdf(e[i],loc=q_range[theta_bar[j,2]]*kk,scale=sigma_range[theta_bar[j,3]])*
               xi[nn,kk]/M_in for kk in range(int(N_range[theta_bar[j,0]]+1)) for nn in range(int(N_range[theta_bar[j,0]]+1)))
def posterior_update(i,T,e,M_out,M_in,theta,N_range,p_range,q_range,sigma_range,tau_range,X,delta_t,ML):
    theta_bar = np.zeros([M_out,5], dtype=int)
    x_bar = np.zeros([M_out,M_in,2], dtype=int)
    u = np.zeros(M_out)
    x_tilde = np.zeros([M_out,M_in,2], dtype=int)
    w = np.zeros(M_out)
    # Loop over the outer particles
    for j in range(M_out):
        # Computes the approximate likelihood u
        u[j] = approx_likelihood(i,j,theta_bar,N_range,q_range,sigma_range,e,xi,M_in)
    ML[i,:] = theta_bar[np.argmax(u),:]
    # Compute the normalized weights w
    w = u/sum(u)
    # Resample
    X[i,:,:,:],theta[i,:,:] = resample(M_out,w,x_tilde,theta_bar)
    return X, theta, ML
# Loop over time #############################################################
for i in range(T):
    print('Progress {0}%'.format(round((i/T)*100,1)))
    X, theta, ML = posterior_update(i,T,e,M_out,M_in,theta,N_range,p_range,q_range,sigma_range,tau_range,X,delta_t,ML)
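As a possible starting point for vectorizing, note that in `approx_likelihood` the Gaussian density depends only on `kk`, so the sum over `nn` can be taken on `xi` alone before multiplying. Below is a rough NumPy sketch of that idea for a single particle, assuming `xi` is a 2-D array with at least `n+1` rows and columns, and that `q` and `sigma` are scalars. The names `gaussian_pdf`, `approx_likelihood_vec` and `approx_likelihood_loop` are hypothetical, and the density is written out by hand rather than through `scipy.stats.norm.pdf` to keep the snippet self-contained.

```python
import numpy as np


def gaussian_pdf(x, loc, scale):
    # Normal density, written out explicitly (stands in for scipy.stats.norm.pdf)
    z = (x - loc) / scale
    return np.exp(-0.5 * z * z) / (scale * np.sqrt(2.0 * np.pi))


def approx_likelihood_loop(e_i, n, q, sigma, xi, M_in):
    # Reference version: same double sum as in the pseudocode above
    return sum(gaussian_pdf(e_i, q * kk, sigma) * xi[nn, kk] / M_in
               for kk in range(n + 1) for nn in range(n + 1))


def approx_likelihood_vec(e_i, n, q, sigma, xi, M_in):
    # Vectorized version: one density per kk, times the column sums of xi
    kk = np.arange(n + 1)
    pdf_k = gaussian_pdf(e_i, q * kk, sigma)      # shape (n+1,)
    col_sums = xi[:n + 1, :n + 1].sum(axis=0)     # sum over nn for each kk
    return float(pdf_k @ col_sums) / M_in


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xi = rng.random((6, 6))                       # made-up test data
    args = (0.7, 5, 0.3, 0.5, xi, 100)
    print(np.isclose(approx_likelihood_loop(*args), approx_likelihood_vec(*args)))
```

In terms of the pseudocode, `n` would be `int(N_range[theta_bar[j,0]])`, `q` would be `q_range[theta_bar[j,2]]` and `sigma` would be `sigma_range[theta_bar[j,3]]`. Extending this across all particles `j` at once would need extra care, since `n` can differ between particles.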
… `multiprocessing` module? Have you tried using `pyopencl` with your CPU to take advantage of vectorization? Can you post a sample of the code you need to run? 3000 processes doesn't sound too much as to require the GPU. – chapelo
… `multiprocessing` toolbox I'm limited to 8 processes in parallel (that is to say, my number of CPU cores), is that correct? Or can I increase that number? Same goes for my GPU: is there a way to find the maximum number of parallel processes I can run? – Camille Gontier