I am trying to port some of my R code to Julia. Basically, I have rewritten the following R code in Julia:
library(parallel)
eps_1<-rnorm(1000000)
eps_2<-rnorm(1000000)
large_matrix<-ifelse(cbind(eps_1,eps_2)>0,1,0)
matrix_to_compare = expand.grid(c(0,1),c(0,1))
indices<-seq(1,1000000,4)
large_matrix<-lapply(indices,function(i)(large_matrix[i:(i+3),]))
function_compare<-function(x){
which((rowSums(x==matrix_to_compare)==2) %in% TRUE)
}
> system.time(lapply(large_matrix,function_compare))
user system elapsed
38.812 0.024 38.828
> system.time(mclapply(large_matrix,function_compare,mc.cores=11))
user system elapsed
63.128 1.648 6.108
As one can see, I get a significant speed-up when going from one core to 11. Now I am trying to do the same in Julia:
#Define cluster:
addprocs(11);
using Distributions;
@everywhere using Iterators;
d = Normal();
eps_1 = rand(d,1000000);
eps_2 = rand(d,1000000);
#Create a large matrix:
large_matrix = hcat(eps_1,eps_2).>=0;
indices = collect(1:4:1000000)
#Split large matrix:
large_matrix = [large_matrix[i:(i+3),:] for i in indices];
#Define the function to apply:
@everywhere function function_split(x)
matrix_to_compare = transpose(reinterpret(Int,collect(product([0,1],[0,1])),(2,4)));
matrix_to_compare = matrix_to_compare.>0;
find(sum(x.==matrix_to_compare,2).==2)
end
@time map(function_split,large_matrix )
@time pmap(function_split,large_matrix )
5.167820 seconds (22.00 M allocations: 2.899 GB, 12.83% gc time)
18.569198 seconds (40.34 M allocations: 2.082 GB, 5.71% gc time)
As one can see, I am not getting any speed-up with pmap; it is in fact slower than map (18.6 s versus 5.2 s). Maybe somebody can suggest alternatives.
large_matrix is 250000-element Array{Any,1}: Might this be the problem? – daycaster
addprocs(3): 4.173674 seconds (22.97 M allocations: 2.943 GB, 14.57% gc time) and 0.795733 seconds (292.07 k allocations: 12.377 MB, 0.83% gc time). Also, the type of large_matrix is Array{BitArray{2},1}. – tim
addprocs(3): 5.860692 seconds (22.90 M allocations: 2.938 GB, 13.20% gc time); @time pmap(function_split,large_matrix ): 27.411076 seconds (40.60 M allocations: 2.094 GB, 3.17% gc time) – Vitalijs
pmap is only useful if each function call takes a considerable amount of time. Depending on what you want to do with the resulting array, you may be interested in @parallel. – tim
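The point about per-call cost can be illustrated with a minimal sketch. It assumes Julia 1.x, so the syntax differs from the code in the question (`findall` instead of `find`, `dims=2` instead of a positional `2`, and `@parallel` has since been renamed `@distributed`). The names `PATTERNS` and `match_rows`, the worker count, and the batch size are illustrative choices, not taken from the post. The idea is to build the comparison matrix once per process and to let `pmap` ship many blocks per remote call through its `batch_size` keyword, so that scheduling overhead is spread over more work. No timings are claimed here; whether this beats a plain `map` depends on the machine and on data-transfer cost.

```julia
using Distributed
addprocs(2)   # assumption: any worker count; the question used 11

@everywhere begin
    # Reference rows (same order as expand.grid in the R code),
    # built once per process instead of on every call.
    const PATTERNS = Bool[0 0; 1 0; 0 1; 1 1]

    # Indices of the rows of a 4x2 block that equal the same row of PATTERNS.
    match_rows(block) = findall(vec(all(block .== PATTERNS, dims=2)))
end

# Same setup as in the question: a 1,000,000 x 2 Boolean matrix cut into 4-row blocks.
m = randn(1_000_000, 2) .>= 0
blocks = [m[i:i+3, :] for i in 1:4:size(m, 1)]

serial  = map(match_rows, blocks)
# Each remote call now handles 5,000 blocks instead of a single one.
batched = pmap(match_rows, blocks; batch_size=5_000)
```

Wrapping both calls in `@time` (after one warm-up run to exclude compilation) would show whether batching pays off for a given worker count.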