When I try to compute the element-wise maximum of the differences between columns of a DataFrame, I get an error. What is wrong?

using DataFrames

a = [2,4,10,4,8,8]
b = [5,9,7,2,8,7]
c = [2,9,7,6,8,1]

df = DataFrame(A = a, B = b, C = c)
df[2,:A] = NA
df[3,:C] = NA

ab=df[:A] - df[:B]
bc=df[:B] - df[:C]
ac=df[:A] - df[:C]

df[:max] = max(ab, bc, ac)

println(df)

=> LoadError: MethodError: no method matching isless(::DataArrays.DataArray{Int64,1}, ::Array{Any,1})

Computing either df[:max] = max(ab, bc) or df[:max] = max(a, b, c) works as expected.

Can anybody clarify what's going on? Thank you!

1 Answer

Pay attention to the return types:

julia> typeof(ab)
DataArrays.DataArray{Int64,1}

julia> typeof(bc)
DataArrays.DataArray{Int64,1}

julia> typeof(ac)
DataArrays.DataArray{Int64,1}

julia> typeof(max(ab, bc))
Array{Any,1}

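That last one is the issue. A three-argument call max(ab, bc, ac) is reduced pairwise, as max(max(ab, bc), ac), so the intermediate Array{Any,1} ends up being compared against ac, which is a DataArray{Int64,1}. Julia is complaining that it has no isless method for that pair of types. This does not happen with the original Int arrays because they contain no NA. As noted in the DataFrames docs, NA poisons array operations.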

Observe that the following code works fine because it has no NA, so the return type of max is fully specified:

df2 = DataFrame(A = a, B = b, C = c)
df2[:max] = max(a, b, c)
typeof(df2[:max]) ### DataArrays.DataArray{Int64,1}

Your best option is to impute or purge the NA values from your DataFrame before computing maxima. An easy way to drop every row that contains an NA is:

df3 = DataFrames.na_omit(df)[1]
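
For readers on current Julia and DataFrames releases, where `missing` has replaced `NA`, here is a minimal sketch of the same computation. It is not the code from the question: `allowmissing!`, `dropmissing`, the `df.A` column syntax, and the broadcast call `max.(...)` are the modern equivalents I am assuming you have available, and the data simply mirrors the question.

```julia
using DataFrames

# Same data as in the question.
df = DataFrame(A = [2, 4, 10, 4, 8, 8],
               B = [5, 9, 7, 2, 8, 7],
               C = [2, 9, 7, 6, 8, 1])

# Columns must accept `missing` before one can be assigned (assumes a recent DataFrames).
allowmissing!(df)
df[2, :A] = missing
df[3, :C] = missing

# Element-wise differences; any operation involving `missing` yields `missing`.
ab = df.A .- df.B
bc = df.B .- df.C
ac = df.A .- df.C

# Broadcasting `max` compares scalars, so no array-to-array `isless` is needed.
df.max = max.(ab, bc, ac)

# Counterpart of purging NA by row: keep only rows without any `missing`.
df_clean = dropmissing(df)

println(df)
println(df_clean)
```

Rows that contain a `missing` still end up with `missing` in the `max` column, so the advice above about imputing or dropping those rows first applies unchanged.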