0
votes

I'm working on the Coursera Data Science course. I evaluated my code step by step on the way to the correct answer, and for some reason my second-to-last step doesn't produce the output I expect, even though my final answer is correct.

Here's my final code. It's a function that reads a directory full of CSV files, counts the complete cases in each file, and prints that count along with the associated id number.

complete<-function(dir, id=1:332){
    comp_cases<-numeric()
    # full paths of every file in the directory
    files<-list.files(dir, full.names=TRUE)
    for (i in id){
        data<-read.csv(files[i])
        # number of rows with no missing values
        vals<-sum(complete.cases(data))
        comp_cases<-c(comp_cases,vals)
    }

    data.frame(id, comp_cases)
}

However, when I omit data.frame(id, comp_cases) and just call comp_cases, I get the number of complete cases in the first file instead of the vector of complete-case counts for all of the files. Why isn't my loop working without data.frame(id, comp_cases), which is outside the loop itself? What exactly is data.frame() doing here? I'm using R 3.4.2 on Windows.

1
I'dhighlyrecommendusingspacesinyourcode.Itwillmakeitmucheasiertoread. - Gregor Thomas

1 Answer

1
votes

The value returned when a function is called in R is the value of the last expression evaluated in its body. You can explicitly return a value using return(something), of course, but when you don't need to exit the function early, many R programmers omit the call to return() and simply end the function with a function call (in this case data.frame(...)) that produces the value. If the value has already been computed and stored in a variable, the programmer may just evaluate that variable as the last line, as in:

my_fun <- function(x){
    out <- NULL # initialize the return value.

    # ... do things ...

    out # implicitly return this value
}
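
To make the implicit return concrete, here is a minimal sketch. The function names add_explicit and add_implicit are made up for illustration; the point is only that both forms hand back the same value.

```r
# Hypothetical helper names, for illustration only.
add_explicit <- function(x, y) {
    return(x + y)   # explicit return
}

add_implicit <- function(x, y) {
    x + y           # the last evaluated expression becomes the return value
}

# Both calls produce the same result.
identical(add_explicit(1, 2), add_implicit(1, 2))
```

Which form to use is mostly a matter of style; return() is still needed when a function has to exit before reaching its last line.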

As an aside, everything in R has a return value. for and while loops return NULL, and assignments (e.g. x = 3) return the value assigned, both invisibly. R programmers may do weird things with this last bit, such as:

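complete<-function(dir, id=1:332){
    comp_cases<-numeric()
    # full paths of every file in the directory
    files<-list.files(dir, full.names=TRUE)
    for (i in id){
        data<-read.csv(files[i])
        # number of rows with no missing values
        vals<-sum(complete.cases(data))
        comp_cases<-c(comp_cases,vals)
    }

    # ending on an assignment: its value is what the function returns
    answer <- data.frame(id, comp_cases)
}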

which is technically valid code, but weird from a code-as-documentation perspective.
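
The two return-value rules above can be checked with a small sketch. The function names ends_with_loop and ends_with_assignment are hypothetical toys, not part of the course code, and the example assumes nothing beyond base R.

```r
# Hypothetical toy functions, for illustration only.
ends_with_loop <- function(n) {
    total <- 0
    for (i in seq_len(n)) {
        total <- total + i
    }
    # nothing follows the loop, so the function returns the loop's value: NULL
}

ends_with_assignment <- function(n) {
    answer <- sum(seq_len(n))   # the assigned value is returned, but invisibly
}

is.null(ends_with_loop(3))      # the loop's NULL is what comes back
x <- ends_with_assignment(3)    # the value can still be captured
x
(ends_with_assignment(3))       # wrapping the call in parentheses forces printing
```

This suggests why dropping the final data.frame(id, comp_cases) line changes what you see at the console: the function then ends on the for loop, so its result is NULL rather than anything built inside the loop.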