So first of all, I removed the last two lines of `plot_probs` - plotting isn't really a great thing to benchmark, I think, and it's largely beyond my (or your) control - you could try PyPlot if it's a real factor. I also timed `plot_probs`
a few times to see how much time is spent compiling it the first time:
```
elapsed time: 1.071184218 seconds (473218720 bytes allocated, 26.36% gc time)
elapsed time: 0.658809962 seconds (452017744 bytes allocated, 40.29% gc time)
elapsed time: 0.660609145 seconds (452017680 bytes allocated, 40.45% gc time)
```
So there is a ~0.3s penalty paid once. Moving on to the actual algorithm, I used the built-in profiler (e.g. `@profile plot_probs(norm_array, 0.25, [1:3], 10, 3)`), which revealed that essentially all of the time is spent here:
- ~80%:

```julia
[ push!(dist, euclidean(set_array[i,:][:], input_array)) for i in 1:size(set_array, 1) ]
```

- ~20%:

```julia
[ d[i] = get(d, i, 0) + 1 for i in labels[sortperm(dist)][1:k] ]
```
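To see why the 80% line is so expensive: each `set_array[i,:]` slice allocates a fresh row vector, and the trailing `[:]` copies it again. A minimal sketch of the difference, using `@view` to index into the original memory instead (the data here is made up for illustration):

```julia
using Distances  # provides euclidean()

set_array   = rand(1000, 3)  # hypothetical data: 1000 points, 3 features
input_array = rand(3)        # one query point

# Row slices copy: every set_array[i, :] allocates a new vector.
dist_copy = [euclidean(set_array[i, :], input_array) for i in 1:size(set_array, 1)]

# @view avoids the per-row copy by referencing the original array.
dist_view = [euclidean(@view(set_array[i, :]), input_array) for i in 1:size(set_array, 1)]

@assert dist_copy ≈ dist_view
```

Both give identical distances; only the allocation behaviour differs.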
Using array comprehensions for their side effects like that is not idiomatic Julia (or Python, for that matter). The first is also slow because all that slicing makes many copies of the data. I'm not an expert with Distances.jl, but I think you can replace it with
```julia
dist = Distances.colwise(Euclidean(), set_array', input_array)
d = Dict{Int,Int}()
for i in labels[sortperm(dist)][1:k]
    d[i] = get(d, i, 0) + 1
end
```
which gave me
```
elapsed time: 0.731732444 seconds (234734112 bytes allocated, 20.90% gc time)
elapsed time: 0.30319397 seconds (214057552 bytes allocated, 37.84% gc time)
```
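Putting both pieces together, here is a self-contained sketch of the distance-plus-vote step in isolation - the data, `labels`, and `k` are made-up stand-ins for whatever `mass_kNN` actually passes in:

```julia
using Distances

# Hypothetical stand-ins for the real inputs:
set_array   = rand(100, 3)    # 100 training points, 3 features
input_array = rand(3)         # one query point
labels      = rand(1:3, 100)  # a class label per training point
k           = 10

# colwise computes one distance per column, hence the transpose.
dist = Distances.colwise(Euclidean(), set_array', input_array)

# Count the labels of the k nearest neighbours.
d = Dict{Int,Int}()
for i in labels[sortperm(dist)][1:k]
    d[i] = get(d, i, 0) + 1
end

# The vote counts must sum to exactly k.
@assert sum(values(d)) == k
```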
More performance could be extracted by doing the transpose once in `mass_kNN`, but that would require touching a few too many places, and this post is long enough. Trying to micro-optimize it led me to
```julia
dist = zeros(size(set_array, 1))  # note: the original was missing the closing paren
@inbounds for i in 1:size(set_array, 1)
    d = 0.0
    for j in 1:length(input_array)
        z = set_array[i, j] - input_array[j]
        d += z * z
    end
    dist[i] = sqrt(d)
end
```
which gets it to
```
elapsed time: 0.646256408 seconds (158869776 bytes allocated, 15.21% gc time)
elapsed time: 0.245293449 seconds (138817648 bytes allocated, 35.40% gc time)
```
so it took about half the time off the original - but that's not really worth it, and it's less flexible (e.g. what if I wanted the L1 distance?). Other code review points (unsolicited, I know):
- I find `Vector{Float64}` and `Matrix{Float64}` much easier on the eye than `Array{Float64,1}` and `Array{Float64,2}`, and less likely to be confused.
- `Float64[]` is more usual than `Array(Float64, 0)`.
- `Int64` can just be written as `Int`, since it doesn't need to be specifically a 64-bit integer.
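For reference, these shorter spellings are just aliases, not different types - a quick check (the last line assumes a 64-bit machine, where `Int` aliases `Int64`):

```julia
# Vector and Matrix are type aliases for 1- and 2-dimensional Array.
@assert Vector{Float64} === Array{Float64,1}
@assert Matrix{Float64} === Array{Float64,2}

# Float64[] builds the same empty vector as the longer spelling.
v = Float64[]
@assert isa(v, Vector{Float64}) && isempty(v)

# On 64-bit systems, Int is an alias for Int64.
@assert Int === Int64
```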