I am very new to the Julia programming language and I am testing out some Euclidean distance operations I typically perform in other languages. The functions work if calling it serially, but the pmap calls are not returning the desired results. Could someone take a look and let me know if I am going about this the right way? is pmap even the best way to approach this?
using Distributed
#Example data
d1 = randn(50000,3)
d2 = randn(50000,3)
First Function: Euclidean Distance Matrix
function EDM(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,n2)
for i in 1:n1
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
Dist[i,j] = sqrt(dtemp)
end
end
return Dist
end
#pmap call
function pmap_EDM(m1,m2)
return pmap(EDM, m1, m2)
end
Second Function: Minimum Euclidean Distances Unidirectional
function MED(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,1)
for i in 1:n1
dsum = Inf
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
dtemp = sqrt(dtemp)
if dtemp < dsum
dsum = copy(dtemp)
end
end
Dist[i,1] = dsum
end
return Dist
end
#pmap call
function pmap_MED(m1,m2)
return pmap(MED, m1, m2)
end
Third Function: Minimum Euclidean Distances and Corresponding Indices Unidirectional
function MEDI(m1, m2)
n1 = size(m1, 1)
n2 = size(m2,1)
k = size(m1, 2)
Dist = zeros(n1,2)
for i in 1:n1
dsum = Inf
dsum_ind = 0
for j in 1:n2
dtemp = 0
for a in 1:k
dtemp += (m1[i,a] - m2[j,a]) ^ 2
end
dtemp = sqrt(dtemp)
if dtemp < dsum
dsum = copy(dtemp)
dsum_ind = copy(j)
end
end
Dist[i,1] = dsum
Dist[i,2] = dsum_ind
end
return Dist
end
#pmap call
function pmap_MEDI(m1,m2)
return pmap(MEDI, m1, m2)
end
Calling functions
r1 = EDM(d1,d2) #serial
r2 = pmap_EDM(d1,d2)
r3 = MED(d1,d2) #serial
r4 = pmap_MED(d1,d2)
r5 = MEDI(d1,d2) #serial
r6 = pmap_MEDI(d1,d2)
Edited:
The first function should return a simple Euclidean distance matrix with the distances between each row in one array to every row in the second array. The second and third functions are deviations of this to return a subset of those distances based on the minimum distance for each row in one array to every other row in another array (with the third function returning the index position of the minimum distance). The distances do not appear to be calculated correctly and the latter two functions using pmap are returning an nx3 matrix rather than nx1 and nx2 respectively.
Edited 2: example using smaller data set to show results
d1 = randn(5,3)
d2 = randn(5,3)
julia> EDM(d1,d2)
5×5 Array{Float64,2}:
2.60637 3.18867 1.0745 2.60328 1.58608
1.2763 2.31037 3.04379 2.74113 2.00452
1.70024 2.07731 3.12397 2.60893 2.05932
2.44581 1.57345 0.910323 1.08718 0.407675
3.42936 1.13001 2.18345 1.08764 1.70883
julia> pmap_EDM(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928] [2.39283] [0.953501]
[1.06776] [0.815057] [1.87973]
[0.151963] [3.05161] [0.650967]
[0.571021] [0.275554] [0.883151]
[0.109293] [0.635398] [1.58254]
julia> MED(d1,d2)
5×1 Array{Float64,2}:
1.0744953977891307
1.2762979313081781
1.7002448697495505
0.40767454400155695
1.0876399289364607
julia> pmap_MED(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928] [2.39283] [0.953501]
[1.06776] [0.815057] [1.87973]
[0.151963] [3.05161] [0.650967]
[0.571021] [0.275554] [0.883151]
[0.109293] [0.635398] [1.58254]
julia> MEDI(d1,d2)
5×2 Array{Float64,2}:
1.0745 3.0
1.2763 1.0
1.70024 1.0
0.407675 5.0
1.08764 4.0
julia> pmap_MEDI(d1,d2)
5×3 Array{Array{Float64,2},2}:
[0.397928 1.0] [2.39283 1.0] [0.953501 1.0]
[1.06776 1.0] [0.815057 1.0] [1.87973 1.0]
[0.151963 1.0] [3.05161 1.0] [0.650967 1.0]
[0.571021 1.0] [0.275554 1.0] [0.883151 1.0]
[0.109293 1.0] [0.635398 1.0] [1.58254 1.0]
Edited 3: @distributed version of function two
using Distributed
using SharedArrays
#Minimum Euclidean Distances Unidirectional
@everywhere function MD(v1, m2)
n = size(m2, 1)
dsum = Inf
for j in 1:n
dtemp = sqrt((v1[1] - m2[j,1]) ^ 2 + (v1[2] - m2[j,2]) ^ 2 + (v1[3] - m2[j,3]) ^ 2)
if dtemp < dsum
dsum = dtemp
end
end
return dsum
end
function MED(m1, m2)
n1 = size(m1,1)
Dist = SharedArray{Float64}(n1)
m3 = SharedArray{Float64}(m2)
@sync @distributed for k in 1:n1
Dist[k] = MD(m1[k,:], m3)
end
return Dist
end
pmap
, but two other comments about your routines: 1) Julia uses column-major ordering (unlike C), so iterating across columns like you are on the innermost loop is significantly slower than if you rephrased your problem to be iterating across rows. 2) The Distances package is a really well-written Julia package, so you're looking for optimal speed, you should check it out. – Colin T BowersNumber
, eg floats and ints, are immutable in Julia, so an operation likecopy(1)
is allowed, but not particularly meaningful. You can dox = 0 ; y = x ; y = 1
, and you'll note thatx
is still0
.copy
is typically used for mutable types, like arrays. – Colin T Bowers