I have a number of CSV files, and my goal is to find the number of complete cases for a file or set of files given by the id argument. My function should return a data frame with a column id specifying the file and a column nobs giving the number of complete cases for that id. However, my function overwrites the previous value of nobs on each pass through the loop, so the resulting data frame contains only the last value. Do you have any idea how to get the value of nobs for each value of id?

  myfunction <- function(id = 1:20) {
    files <- list.files(pattern = "*.csv")
    myfiles <- do.call(rbind, lapply(files, function(x) read.csv(x, stringsAsFactors = FALSE)))

    for (i in id) {
      good <- complete.cases(myfiles)
      newframe <- myfiles[good, ]
      cases <- newframe[newframe$ID %in% i, ]
      nobs <- nrow(cases)
    }
    clean <- data.frame(id, nobs)
    clean
  }

Thanks.

What is the expected output? Also, read about the XY problem. - zx8754
I tried to explain better in my post. - Simona Aleksandrova Atanasova
I don't know R, but you need to add nobs to a list or something similar, to be able to use that list as an argument to your data.frame function. - zoom

1 Answer

We can do everything inside lapply(), so that each file contributes one row to the result. Something like the following (not tested):

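myfunction <- function(id = 1:20) {
  # list.files() expects a regular expression, not a glob.
  # Assumes the id-th file (in sorted order) belongs to that id.
  files <- list.files(pattern = "\\.csv$")[id]

  # Build a one-row data frame per file, then stack them
  do.call(rbind,
          lapply(seq_along(files), function(i) {
            df <- read.csv(files[i], stringsAsFactors = FALSE)
            # Count the rows that have no missing values
            data.frame(id = id[i], nobs = sum(complete.cases(df)))
          }))
}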
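
If you would rather keep the loop from the question, the overwrite goes away once nobs is a vector with one slot per id instead of a single value. Below is a minimal sketch of that idea. It assumes the files have already been combined into one data frame with an ID column, as in the question. The name count_complete and the toy data are made up for illustration only.

```r
# Hypothetical helper (not from the question): counts complete rows per ID
# in an already-combined data frame that has an ID column.
count_complete <- function(myfiles, id = 1:20) {
  # Filter the complete rows once, outside the loop
  good <- myfiles[complete.cases(myfiles), ]
  # Preallocate one slot per requested id
  nobs <- integer(length(id))
  for (k in seq_along(id)) {
    # Store the count at position k instead of overwriting a single value
    nobs[k] <- sum(good$ID == id[k])
  }
  data.frame(id = id, nobs = nobs)
}

# Toy data standing in for the combined CSV files (made up for this sketch)
toy <- data.frame(ID = c(1, 1, 2, 2, 3),
                  x  = c(0.5, NA, 1.2, 3.4, NA))
result <- count_complete(toy, id = 1:3)
print(result)
```

Indexing by position k rather than by the id value itself means the function also works when id is something like c(5, 12), where the ids do not start at 1.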