2
votes

Within a for loop, I am trying to run a function on two columns of my data frame, moving to a different data set on every iteration of the loop. I would like to collect the output of every iteration into a single vector of answers.

I can't get past the following errors (listed below my code). Which one I get depends on whether I add or remove row.names = NULL in the data <- read.csv... call (line 4 of the for loop):

** Edited to include the directory references, which is where the error ultimately was:

corr <- function(directory, threshold = 0) {
  source("complete.R")

The code above, together with my directory organization (not shown), was where my error was.

  lookup <- complete("specdata")
  setwd(paste0(getwd(),"/",directory,sep=""))
  files <-list.files(full.names="TRUE") #read file names
  len <- length(files)   
  answer2 <- vector("numeric") 
  answer <- vector("numeric")
  dataN <- data.frame()
  for (i in 1:len) {
    if (lookup[i, "nobs"] > threshold) {
      # TRUE -> read that file, remove the NA data and add to the overall data frame
      data <- read.csv(file = files[i], header = TRUE, sep = ",")
      # remove incomplete
      dataN <- data[complete.cases(data), ]
      # If yes, compute the correlation and assign its results to an intermediate vector.
      answer <- cor(dataN[, "sulfate"], dataN[, "nitrate"])
      answer2 <- c(answer2, answer)
    }
  }

  setwd("../")
  return(answer2)
}

1) Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

vs.

2) Error in [.data.frame(data, , 2:3) : undefined columns selected

What I've tried

  1. referring to the column names directly, e.g. "colA"
  2. initializing data and dataN to empty data.frames before the for loop
  3. initializing answer2 to an empty vector
  4. getting a better understanding of how vectors, matrices and data.frames work with each other

**Thank you!**

2
Your code is not very effective and 'R-ish', but I think it should work. Did you make sure that you're reading a proper .csv file? - Marat Talipov
Thank you Marat. Yes, it is a proper csv file (comma-delimited text). What are some efficiency changes that you would recommend? - Kara_F

2 Answers

1
votes

My problem was that the function's .R file, which I was referencing in the code above, was in the same directory as the data files I was looping through and analyzing. My "files" vector had the wrong length because it also picked up another .R function file I had written and referenced earlier in the function. I believe this .R file is what caused the 'undefined columns' error.
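A minimal sketch of how a stray script can trigger that error, assuming a hypothetical file named helper.R sits next to the data (the file names and contents here are made up for illustration). read.csv parses the script as a one-column table, so selecting columns 2:3 fails:

```r
# Hypothetical setup: a temporary directory with one data file and one stray script.
tmp <- tempfile("specdata_demo")
dir.create(tmp)
write.csv(
  data.frame(sulfate = c(1.2, 2.3, 3.1), nitrate = c(0.4, 0.9, 1.5)),
  file.path(tmp, "001.csv"),
  row.names = FALSE
)
writeLines(c("helper <- function(x) {", "  x + 1", "}"), file.path(tmp, "helper.R"))

# Without a pattern, list.files() returns the script as well as the data file.
files <- list.files(tmp, full.names = TRUE)
length(files)

# The script contains no commas, so read.csv() yields a single column ...
bad <- read.csv(files[basename(files) == "helper.R"], header = TRUE, sep = ",")
ncol(bad)

# ... and asking for columns 2:3 raises an error, captured here as a message.
result <- tryCatch(bad[, 2:3], error = function(e) conditionMessage(e))
print(result)
```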

I apologize; I ended up not even posting the part of the code where the problem lay.

Key Takeaway: You can always move between directories within a function! In fact, it may well be necessary if you want to apply a function to all the contents of a directory of interest.
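One way to apply that takeaway while keeping stray files out of the loop, sketched under the assumption that the data files all end in .csv. The function name list_csv_files is hypothetical. on.exit() restores the original working directory even if an error occurs, and the pattern argument (a regular expression) skips anything that is not a CSV file:

```r
# Hypothetical helper: list only the CSV files in a directory,
# switching into it temporarily and always switching back.
list_csv_files <- function(directory) {
  old <- setwd(directory)          # setwd() returns the previous working directory
  on.exit(setwd(old), add = TRUE)  # restore it on exit, even after an error
  # pattern is a regular expression: keep only names ending in ".csv".
  # normalizePath() makes the paths absolute so they stay valid after leaving.
  normalizePath(list.files(pattern = "\\.csv$", full.names = TRUE))
}

# Usage on a throwaway directory containing two data files and one script.
tmp <- tempfile("dir_demo")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("001.csv", "002.csv", "complete.R"))))

before <- getwd()
files <- list_csv_files(tmp)
print(basename(files))
print(identical(getwd(), before))
```

Calling list.files(directory, pattern = "\\.csv$", full.names = TRUE) directly would avoid changing the working directory at all, which may be simpler when nothing else depends on it.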

0
votes

One approach:

# get the list of file names (pattern is a regular expression, not a glob)
files <- list.files(path = '~', pattern = '\\.csv$', full.names = TRUE)

# load all files
list.data <- lapply(files,read.csv, header = TRUE, sep = ",", row.names = NULL)

# remove rows with NAs
complete.data <- lapply(list.data,function(d) d[complete.cases(d),])

# compute correlation of the 2nd and 3rd columns in every data set
answer <- sapply(complete.data,function(d) cor(d[,2],d[,3]))

The same idea, but with a slightly different implementation:

cr <- function(fname) {
    d <- read.csv(fname, header = TRUE, sep = ",", row.names = NULL)
    dc <- d[complete.cases(d),]
    cor(dc[,2],dc[,3])
}
answer2 <- sapply(files,cr)

Example CSV files:

# ==> a.csv <==
# a,b,c,d
# 1,2,3,4
# 11,12,13,14
# 11,NA,13,14
# 11,12,13,14
#
# ==> b.csv <==
# A,B,C,D
# 101,102,103,104
# 101,102,103,104
# 11,12,13,14
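
Tying this back to the question, here is a sketch of the second variant with a threshold on the number of complete rows. It assumes each file has columns named sulfate and nitrate, as in the question's code. The function name corr_files is hypothetical, and counting complete rows inside the function is an assumption standing in for the question's complete() helper, which is not shown:

```r
# Hypothetical function: one correlation per file that has more than
# `threshold` complete rows; files below the threshold are skipped.
corr_files <- function(files, threshold = 0) {
  one <- function(fname) {
    d <- read.csv(fname, header = TRUE)
    dc <- d[complete.cases(d), ]
    # Return NULL for files that do not meet the threshold
    if (nrow(dc) > threshold) cor(dc[, "sulfate"], dc[, "nitrate"]) else NULL
  }
  # unlist() drops the NULLs, leaving one numeric value per qualifying file
  out <- unlist(lapply(files, one), use.names = FALSE)
  if (is.null(out)) numeric(0) else out
}

# Usage with two made-up files: a.csv has 3 complete rows, b.csv has 1.
tmp <- tempfile("corr_demo")
dir.create(tmp)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 1)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA), nitrate = c(5, 3)),
          file.path(tmp, "b.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
answer <- corr_files(files, threshold = 2)  # only a.csv qualifies
print(answer)
```

Returning NULL for skipped files and flattening with unlist() avoids growing a vector with c() inside a loop, and the result is always a numeric vector, even when no file meets the threshold.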