1
votes

How should I impute missing values in DataFrames.jl? E.g., for a given DataFrame, how do I turn all missing values into 0? Thanks in advance!

1
Just walk over to Bogumil's office and ask him :) - Nils Gudat
If I am to answer the question anyway it is better to have the answer recorded on SO for the future :). - Bogumił Kamiński

1 Answer

4
votes

Use coalesce with broadcasting. Assuming your data frame is stored in the variable df, do:

df .= coalesce.(df, 0)
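
As a minimal sketch of the line above, the following builds a small data frame and applies the replacement. The column names a and b and their values are made up for illustration; the example assumes every column can hold the replacement value 0.

```julia
using DataFrames

# Hypothetical sample data: two integer columns containing missing values
df = DataFrame(a = [1, missing, 3], b = [missing, 2, missing])

# coalesce returns its first non-missing argument, so each missing
# cell becomes 0 and every other cell keeps its value
df .= coalesce.(df, 0)

println(df)
```

If a column cannot store 0 (a column of strings, for example), this assignment is expected to fail, in which case restricting the operation to selected columns is the safer route.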

To perform this substitution only in selected columns, do:

@. df[!, cols] = coalesce(df[!, cols], 0)

where cols is a column selector.
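
For illustration, here is a small sketch of the selected-columns form. The data and the selector cols = [:a, :b] are assumptions chosen for the example; the string column c is deliberately left out of the selector, so its missing value stays.

```julia
using DataFrames

# Hypothetical sample data: two integer columns and one string column
df = DataFrame(a = [1, missing, 3],
               b = [missing, 2, missing],
               c = [missing, "x", "y"])

# Hypothetical column selector: only the numeric columns
cols = [:a, :b]

# @. dots both the assignment and the coalesce call
@. df[!, cols] = coalesce(df[!, cols], 0)

println(df)
```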

An alternative way to achieve this is to use transform!:

transform!(df, cols .=> ByRow(x -> coalesce(x, 0)), renamecols=false)

where cols is your column selector. Use names(df) for cols if you want to do the imputation in all columns of the DataFrame.

This approach is a bit more verbose in this case, but it is more flexible in general.
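
One way this flexibility might be used, sketched under assumptions: each column gets its own replacement value within a single transform! call. The data and the fill value "unknown" are hypothetical and chosen only to illustrate the idea.

```julia
using DataFrames

# Hypothetical sample data: one integer column and one string column
df = DataFrame(a = [1, missing, 3], c = [missing, "x", "y"])

# A different fill value per column; renamecols=false keeps the
# original column names instead of appending the function name
transform!(df,
           :a => ByRow(x -> coalesce(x, 0)),
           :c => ByRow(x -> coalesce(x, "unknown")),
           renamecols=false)

println(df)
```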