read.csv is modifying input file name on its own

Question

I am trying to pass a directory as input to a function and use it to build the file paths given to read.csv. However, during the process read.csv appears to modify the file name string it receives at runtime.

Directory: "C:/SAT/Self Courses/R/data/specdata". Inside this directory there are a number of CSV files that I want to read and process with the following functions:

complete<-function(directory,id=1:332)
{

  gFull<-c()
  ids<-str_pad(id,3,pad="0")
  idExt<-paste(ids,".csv",sep="")
  dir<-paste(directory,idExt,sep="/")

  for(i in dir)
  {

    tableTemp<- read.csv(i,header=T)
    tableTemp<- na.omit(tableTemp)
    gFull<-c(gFull,nrow(tableTemp))
  }
  output<-data.frame(id,gFull,stringsAsFactors = F)
  return(output)
}  

cor_sub<-function(data,directory)
{
  #print(directory)
  id<-data[1]
  id<-str_pad(id,3,pad="0")
  id<-paste(id,".csv",sep="")
  #print(id)
  dir_temp<-paste(directory,id,sep="/")
  print(dir_temp)
  #read table
  input<-read.csv(dir_temp,header=T)
  input<-na.omit(input)
  #correlation
  return (cor(input$sulfate,input$nitrate))
}


cor<-function(directory,threshold=0)
{
  #find the thresholds of each file
  qorum<-complete(directory,1:12)
  print(threshold)
  qorum$gFull[qorum$gFull<threshold]<-NA
  qorum<-na.omit(qorum)
  v_cor<-apply(qorum,1,cor_sub,directory)
  #(v_cor)

 }

I execute this code with the call:

cor("C:/SAT/Self Courses/R/data/specdata",0)

The error output I get is:

> cor("C:/SAT/Self Courses/R/data/specdata",0)
[1] 0
[1] "C:/SAT/Self Courses/R/data/specdata/001.csv"
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file '7.21/001.csv': No such file or directory

The problem is dir_temp: it holds "C:/SAT/Self Courses/R/data/specdata/001.csv" (as printed above), yet on the next line read.csv tries to open '7.21/001.csv'.

Please bear with me if the question seems trivial; I am still a novice :)

You should definitely avoid giving your own functions the names of existing R functions (e.g. cor(), which you seem to be trying to use both ways here). But I'm a little confused about what you are trying to do. Are you just trying to read every .csv in a directory and calculate the correlation between two fields that are in every file? - devmacrile
Yes, that's the goal. An extra requirement is that the number of rows/records in each <name>.csv must be greater than a threshold value. - satyajeet anand tripathy
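
The name clash mentioned in the comment above would also explain the odd path in the error. A minimal sketch, assuming a simplified stand-in for the question's cor() and made-up sulfate/nitrate values: once a global cor(directory, threshold) exists, a call meant as a correlation runs the user function instead, and the numeric vector gets pasted into a file path.

```r
# Hypothetical stand-in for the question's cor(): like the original, it treats
# its first argument as a directory and pastes a file name onto it.
cor <- function(directory, threshold = 0) {
  paste(directory, "001.csv", sep = "/")
}

# Made-up values, chosen only to mirror the path in the error message.
sulfate <- c(7.21, 5.99)
nitrate <- c(0.651, 0.428)

# Intended as a correlation, but the global cor() defined above is found
# first, so the numeric vector is used as the "directory".
shadowed <- cor(sulfate, nitrate)
print(shadowed[1])   # a path built from a sulfate value, not a correlation

# Qualifying the call with its namespace reaches the real correlation function.
real <- stats::cor(sulfate, nitrate)
print(real)
```

In the question's code the same thing appears to happen inside cor_sub(): its cor(input$sulfate, input$nitrate) call re-enters the user-defined cor(), which passes the sulfate column on to complete() as the directory.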

devmacrile · Accepted Answer · 2015-10-02T17:21:37

See if this works for you (I'm ignoring most of the code that you have tried thus far because it seems unnecessarily complicated and not runnable anyways):

results <- list()
threshold <- 0  # a file must have more than this many rows to be processed
filepath <- "C:/SAT/Self Courses/R/data/specdata"
filenames <- list.files(filepath)  # assumes you want all files in directory
suppressWarnings(
for(filename in filenames) {

    # construct the path for this particular file, and read it
    fpath <- paste(filepath, filename, sep="/")
    input <- read.csv(fpath, header=TRUE)

    # check if threshold is met, skip if not
    if(nrow(input) <= threshold) next

    input <- na.omit(input)  # do you want this before the threshold check?

    # store our correlation in our results list
    # stats::cor() to avoid confusion with your defined function
    results[[filename]] <- stats::cor(input$sulfate, input$nitrate)
})

print(results)

Let me know if you have any questions about how this works below (I haven't actually run it, tbh). You should be able to take it from here and generalize it to your needs.
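
One way to generalize the loop above is to wrap it in a function. This is only a sketch under stated assumptions: cor_by_file() is a hypothetical name, the threshold is applied to complete rows (after na.omit(), as the question's complete() does), and the demo CSVs are made-up data written to a temporary directory so the example runs on its own.

```r
# cor_by_file() is a hypothetical helper name; the sulfate and nitrate
# column names come from the question.
cor_by_file <- function(directory, threshold = 0) {
  # full.names = TRUE returns complete paths, so no paste() is needed
  paths <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  results <- list()
  for (path in paths) {
    input <- na.omit(read.csv(path, header = TRUE))
    # skip files whose number of complete rows does not exceed the threshold
    if (nrow(input) <= threshold) next
    # stats::cor() is used explicitly so a user-defined cor() cannot shadow it
    results[[basename(path)]] <- stats::cor(input$sulfate, input$nitrate)
  }
  results
}

# Made-up demo data in a temporary directory.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, NA), nitrate = c(2, 4, 6, 1)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, NA), nitrate = c(NA, 3)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

demo <- cor_by_file(demo_dir, threshold = 2)
print(demo)
```

With the real data, the call would be cor_by_file("C:/SAT/Self Courses/R/data/specdata", threshold) instead of the temporary directory.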
