1
votes

I've read all the threads related to my question (I'm pretty sure I did).

I have written a function that checks the class of each column and, if it is numeric, replaces the NAs with the mean of that column.

Here's the code:

delna<-function(x){
    for (e in 1:ncol(x)){
        if (class(x[,e])=="numeric"){
            for (e in 1:ncol(x)) {
                x[is.na(x[,e]),e]<-mean(x[,e],na.rm = TRUE)
            }}
    }
}

I get no result when running the function, and when using it on a data frame I get warnings saying:

"In mean.default(x[, e], na.rm = TRUE) : argument is not numeric or logical: returning NA"

Thank you for your help, everyone!

3 Answers

0
votes

With data.table, assuming the columns you want to treat are "a", "b" and "c":

library(data.table)
setDT(df)
lapply(c("a","b","c"), function(colname){
  df[is.na(get(colname)), c(colname) := mean(df[[colname]], na.rm = TRUE)]
})

No need for reassignment: your initial data frame is modified in place.

0
votes

This solution is more involved to write, but simple to use.
I create a generic delna with the following methods:

  1. A default method, to replace the NA's in one numeric vector;
  2. A method for objects of class "matrix";
  3. A method for objects of class "data.frame";
  4. A method for objects of class "list".

Then all that needs to be done is to call delna(object) and everything is automatic.

delna <- function(x, ...) UseMethod("delna")
delna.default <- function(x, ...){
  stopifnot(is.numeric(x))
  mu <- mean(x, na.rm = TRUE)
  x[is.na(x)] <- mu
  x
}
delna.matrix <- function(x, ...){
  x[] <- apply(x, 2, delna)
  x
}
delna.data.frame <- function(x, ...){
  is_num <- sapply(x, is.numeric)
  x[is_num] <- lapply(x[is_num], delna)
  x
}
delna.list <- function(x, ...){
  is_num <- sapply(x, is.numeric)
  x[is_num] <- lapply(x[is_num], delna)
  x
}


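delna(letters)       # error: is.numeric(x) is not TRUE (character input)
delna(x)             # numeric vector, uses delna.default
delna(mat)           # matrix, filled column by column
delna(dat)           # data frame, numeric columns only
delna(as.list(dat))  # list, numeric elements only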

Test data creation code.

set.seed(1234)
x <- sample(10)
is.na(x) <- sample(10, 4)
mat <- replicate(5, {
  x <- sample(10)
  is.na(x) <- sample(10, 3)
  x
})
dat <- as.data.frame(mat)
0
votes

It turned out I had forgotten to add return(x). Now the function looks like this:

delna<-function(x){
  for (e in 1:ncol(x)){
    if (is.numeric(x[,e])){
      x[is.na(x[,e]),e]<-mean(x[,e],na.rm = TRUE)
    }
  }
  return(x)
}

Then I was able to make the modification I wanted by writing data<-delna(data), or by assigning the result to a new data frame.
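
For anyone hitting the same two symptoms, here is a minimal sketch of what I believe was going on. The data frame and the names no_return and delna2 are made up for illustration. A function whose last expression is a for loop returns NULL, and mean() warns and returns NA on a non-numeric column, which the original inner loop reached because it ran over every column. The last part is a loop-free variant of the same idea, using vapply() where sapply() would also work.

```r
# Hypothetical example data: one numeric column with an NA, one character column.
dat <- data.frame(a = c(1, NA, 3), b = c("u", NA, "w"), stringsAsFactors = FALSE)

# A function whose last expression is a for loop returns NULL (invisibly),
# so the changes made to the local copy of x are lost.
no_return <- function(x) {
  for (e in seq_len(ncol(x))) {
    if (is.numeric(x[[e]])) x[is.na(x[[e]]), e] <- mean(x[[e]], na.rm = TRUE)
  }
}
is.null(no_return(dat))  # TRUE

# mean() on a character column warns and returns NA; warnings are
# suppressed here only to keep the example quiet.
res <- suppressWarnings(mean(dat$b, na.rm = TRUE))
is.na(res)  # TRUE

# Loop-free variant (illustrative name): fill NAs in numeric columns only
# and return the modified copy explicitly.
delna2 <- function(x) {
  is_num <- vapply(x, is.numeric, logical(1))
  x[is_num] <- lapply(x[is_num], function(col) {
    col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  x
}

filled <- delna2(dat)
filled$a      # 1 2 3
is.na(dat$a)  # FALSE TRUE FALSE: dat itself is untouched until reassigned
```

Since R passes data frames by value here, the caller only sees the filled values after something like dat <- delna2(dat).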