
I have a directory called "specdata" that contains CSV files (001.csv, 002.csv, ..., 332.csv). I want my function to read all the files in this directory and return a data.frame where the first column is the file's id and the second column is the number of complete cases.

For example:

id nobs
1  108
2  345
...

I wrote the function below, which reads the files in the "specdata" directory and counts the complete cases in each file. But I do not know how to collect each value of "nobs" computed in the loop into a new data.frame in this format:

id  nobs
1   108
2   345
...
...
332 16

My function:

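complete <- function(directory, id = 1:332) {
  # Loop over the ids actually passed in, not a fixed 1:332
  for (i in seq_along(id)) {
    # Zero-pad the file id to three digits: 1 -> "001", 42 -> "042"
    if (id[i] < 10) {
      path <- paste(directory, "/00", id[i], ".csv", sep = "")
    } else if (id[i] < 100) {
      path <- paste(directory, "/0", id[i], ".csv", sep = "")
    } else {
      path <- paste(directory, "/", id[i], ".csv", sep = "")
    }

    mydata <- read.csv(path)
    # nobs <- nrow(na.omit(mydata))
    # Count the rows that have no missing values
    nobs <- sum(complete.cases(mydata))
  }
}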
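As a quick check that the two counting lines in the function agree (the active complete.cases() line and the commented-out na.omit() line), here is a minimal sketch on a small made-up data frame; the column names a and b and their values are illustrative assumptions, not data from "specdata".

```r
# Made-up data frame: rows 2 and 3 each contain one NA
mydata <- data.frame(a = c(1.2, NA, 3.4, 5.6),
                     b = c(0.5, 0.7, NA, 0.9))

# complete.cases() gives TRUE for rows with no missing values;
# summing the logical vector counts those rows
nobs <- sum(complete.cases(mydata))

# na.omit() drops incomplete rows, so its row count is the same number
nobs_alt <- nrow(na.omit(mydata))

print(c(nobs, nobs_alt))  # [1] 2 2
```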

The problem is that "nobs" is recomputed, and overwritten, on every pass through the for loop, and I want to put the values of "nobs" for all the files into a data.frame. I have tried a lot of ways but cannot get the entire list of "nobs" values into the data.frame along with the "id" numbers.
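One common way around the overwriting is to preallocate a vector with one slot per id and store each count at position i, then build the data.frame once the loop ends. A minimal sketch, assuming a hypothetical in-memory list named datasets that stands in for the read.csv() results so it runs without the "specdata" folder:

```r
# Hypothetical stand-ins for the CSV files: each element plays the role
# of one read.csv() result
datasets <- list(
  data.frame(x = c(1, 2, NA)),
  data.frame(x = c(NA, NA, 5)),
  data.frame(x = c(7, 8, 9))
)
id <- 1:3

# Preallocate one slot per id, then store each count at position i
# instead of overwriting a single variable
nobs <- numeric(length(id))
for (i in seq_along(id)) {
  mydata <- datasets[[id[i]]]
  nobs[i] <- sum(complete.cases(mydata))
}

# Pair the ids with their counts in the requested two-column layout
result <- data.frame(id = id, nobs = nobs)
print(result)
```

Preallocating avoids copying the vector on every iteration, which matters more as the number of files grows.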

Can someone please suggest a way to return the data.frame in the requested format?

Is this homework? It looks suspiciously like the homework tasks in Peng's "Data Analysis" course on Coursera. - IRTFM
@DWin Ha! You're right. I knew this was familiar. I helped a co-worker with it yesterday. - Anthony Damico
@DWin Yes, but I am stuck on something and just wanted to know how to fill a data.frame dynamically from a variable, so I asked. I thought the question and answer would be helpful to other R users as well. Also, the homework due date has already passed, so I am not using this to get grades. - Pranav Pandya
To form strings built from numeric sequences that are padded with zeros, use sprintf("%03d", 1:332). This could have been found with a search of SO. I think it was answered yet again just this last week. - IRTFM
Yes, @DWin. I think you're right: we do not need three separate cases, since sprintf does the padding easily. Thanks for your suggestion. - Pranav Pandya
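The sprintf suggestion above replaces the three if branches with a single vectorised call. A minimal sketch; the ids chosen and the use of file.path() to join the directory and file name are illustrative choices, not part of the original comment:

```r
# Zero-pad each id to three digits and build all the paths in one call
ids <- c(1L, 42L, 332L)
files <- file.path("specdata", sprintf("%03d.csv", ids))
print(files)
# [1] "specdata/001.csv" "specdata/042.csv" "specdata/332.csv"
```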

1 Answer


The simplest way to build up a vector of all the nobs values goes something like this:

complete <- function(directory, id = 1:332) {
  # Create an empty vector outside the for loop
  nobs_vector <- c()
  for (i in seq_along(id)) {
    if (id[i] < 10) {
      path <- paste(directory, "/00", id[i], ".csv", sep = "")
    } else if (id[i] < 100) {
      path <- paste(directory, "/0", id[i], ".csv", sep = "")
    } else {
      path <- paste(directory, "/", id[i], ".csv", sep = "")
    }

    mydata <- read.csv(path)
    # nobs <- nrow(na.omit(mydata))
    nobs <- sum(complete.cases(mydata))
    # Add the value to the end of the vector
    nobs_vector <- c(nobs_vector, nobs)
  }
  # Take a look at the final vector you end up with
  # (print() also returns the vector invisibly)
  print(nobs_vector)
}

It's not necessarily elegant or efficient (the vector is copied each time it grows), but it does get you those values in a form that persists after the for loop is done. If you want to build up a data frame in a similar way, have a look at ?rbind.
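For the rbind route mentioned above, here is a minimal sketch that appends one (id, nobs) row per file and returns the data frame in the format the question asks for. It also borrows the sprintf padding from the comments. The demo directory name specdata_demo and its two small CSV files are made up so the example runs anywhere; with the real data you would call complete("specdata").

```r
complete <- function(directory, id = 1:332) {
  # Start from an empty data frame with the two requested columns
  result <- data.frame(id = integer(0), nobs = integer(0))
  for (i in seq_along(id)) {
    path <- file.path(directory, sprintf("%03d.csv", id[i]))
    mydata <- read.csv(path)
    # Append one row per file with rbind()
    result <- rbind(result,
                    data.frame(id = id[i], nobs = sum(complete.cases(mydata))))
  }
  result
}

# Demo on throwaway files in a temporary directory (made-up data)
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3)), file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(NA, NA, 6)), file.path(demo_dir, "002.csv"), row.names = FALSE)

out <- complete(demo_dir, 1:2)
print(out)
```

Like the growing vector, rbind() copies the data frame on each iteration, so it is simple rather than fast.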