Julia Base has the unique function, which returns a vector containing only the unique elements of an array (or any iterable). I was looking for a nonunique function that returns an array containing all the elements that appear at least twice in its input. As far as I can tell, Julia does not have such a function, which I found a bit surprising.
My first attempt was as follows:
function nonunique(x::AbstractArray)
    uniqueindexes = indexin(unique(x), x)
    nonuniqueindexes = setdiff(1:length(x), uniqueindexes)
    unique(x[nonuniqueindexes])
end
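To make the logic of this first attempt easier to follow, here is a trace of the intermediate values on a small made-up input. This is only a sketch: the input and the variable names first_idx and repeat_idx are mine, and it assumes Julia 1.x, where indexin returns the first matching index for each element.

```julia
x = [1, 2, 2, 3, 3, 3]

# Index in x of the first occurrence of each distinct value.
first_idx = indexin(unique(x), x)              # [1, 2, 4]

# Every other index holds a repeat of a value already seen.
repeat_idx = setdiff(1:length(x), first_idx)   # [3, 5, 6]

# The values at those positions, still with repeats.
repeats = x[repeat_idx]                        # [2, 3, 3]

# Collapse the repeats so each duplicated value is reported once.
result = unique(repeats)                       # [2, 3]
```

Note that indexin scans x for each distinct value, which is probably part of why this version is slow on large inputs.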
But, inspired by Bogumił Kamiński's answer to "indices of unique elements of vector in Julia", I wrote a second version:
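function nonunique(x::AbstractArray{T}) where T
    uniqueset = Set{T}()            # every value seen so far
    duplicatedset = Set{T}()        # values already reported as duplicates
    duplicatedvector = Vector{T}()  # duplicates, in order of first repeat
    for i in x
        if i in uniqueset
            # Seen before: record it only the first time it repeats.
            if !(i in duplicatedset)
                push!(duplicatedset, i)
                push!(duplicatedvector, i)
            end
        else
            push!(uniqueset, i)
        end
    end
    duplicatedvector
end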
In my tests, this version is about 4 times faster. It also has the nice property that the result is ordered by where the second occurrence (the first repeat) of each duplicated element appears in the input. I believe that in is faster when checking membership in a Set (a hash lookup) than in an Array (a linear scan), which is why I keep both duplicatedset and duplicatedvector: the set for fast membership tests and the vector to preserve the order.
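To illustrate the ordering property and the Set versus Array membership point, here is a small self-contained sketch. The names nonunique_ordered and count_hits and the sample inputs are made up for this example; the function is just a renamed copy of the second version so that it does not clash with the definitions above. The timings depend on the machine, so I do not quote numbers.

```julia
# Renamed copy of the second version (hypothetical name).
function nonunique_ordered(x::AbstractArray{T}) where T
    seen = Set{T}()
    dups = Set{T}()
    out = Vector{T}()
    for v in x
        if v in seen
            if !(v in dups)
                push!(dups, v)
                push!(out, v)
            end
        else
            push!(seen, v)
        end
    end
    return out
end

# 1 repeats at position 4 and 3 repeats at position 5, so 1 comes first.
dups_in_order = nonunique_ordered([3, 1, 2, 1, 3, 1])   # [1, 3]

# Rough membership comparison: count how many keys are found in a collection.
count_hits(collection, keys) = count(k -> k in collection, keys)

v = collect(1:10_000)
s = Set(v)
count_hits(s, v); count_hits(v, v)   # warm-up calls to exclude compilation time

t_set = @elapsed count_hits(s, v)    # hash lookups
t_vec = @elapsed count_hits(v, v)    # linear scans
```

On my understanding, t_vec should come out much larger than t_set for inputs of this size, which is consistent with keeping a separate set for the membership checks.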
Is it really necessary for me to "roll my own" nonunique function, and can the second version be improved?