1
votes

I've seen multiple answers to a similar question, where people get the error duplicate 'row.names' are not allowed when importing one csv file into R, but I haven't seen a question about importing multiple csv files into one data frame. Essentially, I'm trying to import 104 files from the same directory and I get the duplicate 'row.names' are not allowed error. I would be able to solve the problem if I were only importing one file, as the code is extremely simple, but I struggle when it comes to multiple files. I've tried a number of different ways of importing the data; here are a couple of them:

setwd("path")
loaddata <- function(file ="directory") { 
files <- dir("directory", pattern = '\\.csv', full.names = TRUE)
tables <- lapply(files, read.csv)
dplyr::bind_rows
}
data <- loaddata("PhaseReports")

Error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed

Another attempt:

path <- "path"
files <- list.files(path=path, pattern="*.csv")
for(file in files)
{
perpos <- which(strsplit(file, "")[[1]]==".")
assign(
gsub(" ","",substr(file, 1, perpos-1)), 
read.csv(paste(path,file,sep="")))
} 

Error:

Error in read.table(file = file, header = header, sep = sep, quote = quote,  : 
  duplicate 'row.names' are not allowed

EDIT: For the second method, when I try read.csv(paste(path,file,sep=""), row.names=NULL), it changes the title of my first column to row.names and shifts the data one column to the right. I tried putting

colnames(rec) <- c(colnames(rec)[-1],"x")
rec$x <- NULL

under the last line and I get this error:

Error in `colnames<-`(`*tmp*`, value = "x") : 
attempt to set 'colnames' on an object with less than two dimensions

If there is a much easier way to import multiple csv files into R and I'm overcomplicating things, don't be afraid to let me know.

I know this is a combination of two questions that have been answered plenty of times on Stack Overflow, but I didn't see anyone asking this specific question. Thanks in advance!

EDIT 2:

All of the individual files contain data like this:

Half,Play,Type,Time
1,1,Start,00:00:0
1,2,,0:23:5
1,3,pass,00:03:76
2,4,start,00:04:76
2,5,pass,00:06:92
2,6,end,00:08:00 
Forgot to say that I've already tried that. See edits. - useR
The error seems to come from read.csv, which means it doesn't matter that you are trying to read multiple files. One of those files has bad row names. Do you think your input files have row names? R assumes they do when your header row has one fewer value than your data rows. Without some sample data to make a reproducible example, it's nearly impossible to offer specific help. Also, calling dplyr::bind_rows without any arguments is pretty weird. - MrFlick
I will make some example data up and add it to the question. Thanks! - useR
I have edited the question accordingly. - useR
I would guess that one of your files is malformed. Try reading subsets of the complete file list until you find the culprit. - MrFlick
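
To illustrate the row-name rule mentioned in the comments, here is a minimal sketch using made-up inline data (not the asker's real files). It assumes the problem is a trailing comma on the data rows: the header then has four names while each data row has five fields, so read.csv takes the first column as row names. The names bad_csv, msg and shifted are only for illustration.

```r
# Made-up data: 4 header names, but a trailing comma gives each data row 5 fields.
bad_csv <- "Half,Play,Type,Time\n1,1,start,00:00:00,\n1,2,pass,00:03:76,\n"

# read.csv uses the first column (1, 1) as row names, and those are duplicated.
msg <- tryCatch(
  { read.csv(text = bad_csv); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# row.names = NULL avoids the error but shifts the column names one to the right.
shifted <- read.csv(text = bad_csv, row.names = NULL)
print(names(shifted))

# Move the names back and drop the empty column created by the trailing comma.
names(shifted) <- c(names(shifted)[-1], "x")
shifted$x <- NULL
print(shifted)
```

If a real file behaves differently from this sketch, the cause is probably something other than a trailing comma, for example a single row with an extra field.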

1 Answer

1
votes

Although this may not solve your problem, you could try skipping the headers while reading the files and adding them back afterwards. Something like this (in some of your approaches):

read.csv("Your files/file/paste", header = FALSE, skip = 1)

This skips the header line and will hopefully help with the duplicate row names. The full code could be:

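# List every csv file in the folder, with full paths
my_files <- dir("Your path/folder etc", pattern = '\\.csv', full.names = TRUE)
# Read each file without its header line, then stack the rows
result <- do.call(rbind, lapply(my_files, read.csv, header = FALSE, skip = 1))
# Restore the column names
names(result) <- c("Half","Play","Type","Time")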
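
A slightly more defensive variant of the code above, offered as a sketch rather than a definitive fix: it assumes some files may carry an extra empty column (for instance from trailing commas), so each file is trimmed to the four expected columns before stacking. The temporary demo files, the helper read_one and the source_file column are hypothetical and only there to make the example runnable.

```r
# Hypothetical demo files written to a temporary folder; swap in your own path.
demo_dir <- file.path(tempdir(), "phase_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Half,Play,Type,Time", "1,1,start,00:00:00,", "1,2,pass,00:03:76,"),
           file.path(demo_dir, "a.csv"))
writeLines(c("Half,Play,Type,Time", "2,4,start,00:04:76", "2,5,pass,00:06:92"),
           file.path(demo_dir, "b.csv"))

col_names <- c("Half", "Play", "Type", "Time")
my_files <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)

read_one <- function(f) {
  # Skip the header line and read every column as text.
  x <- read.csv(f, header = FALSE, skip = 1, colClasses = "character")
  # Keep only the expected columns, dropping extras caused by trailing commas.
  x <- x[, seq_along(col_names), drop = FALSE]
  names(x) <- col_names
  # Record where each row came from, which helps when tracking down bad files.
  x$source_file <- basename(f)
  x
}

result <- do.call(rbind, lapply(my_files, read_one))
print(result)
```

Reading everything as character avoids type clashes between files; the columns can be converted afterwards once the data looks right.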

You can add the header back afterwards (the names(result) line does that). If you still have problems, I would suggest a loop like this:

for (i in my_files) {
  print(i)     # show which file is about to be read
  read.csv(i)  # stops with an error on the problematic file
}

Then see which file name was printed last before the error; that is the file to investigate. You could check whether any row has more than 3 commas, because I think that is the problem. Hope it helps!
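
The loop above stops at the first bad file. As a rough sketch of how to list every problematic file in one pass, the code below wraps read.csv in tryCatch and uses count.fields to count the fields on each line. The demo files and the names errors and field_counts are made up for illustration; point my_files at your own folder instead.

```r
# Hypothetical demo files in a temporary folder; replace with your own file list.
demo_dir <- file.path(tempdir(), "culprit_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("Half,Play,Type,Time", "1,1,start,00:00:00", "1,2,pass,00:03:76"),
           file.path(demo_dir, "good.csv"))
writeLines(c("Half,Play,Type,Time", "1,1,start,00:00:00,", "1,2,pass,00:03:76,"),
           file.path(demo_dir, "bad.csv"))
my_files <- dir(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Try every file and record the error message (NA when the file reads cleanly).
errors <- vapply(my_files, function(f) {
  tryCatch({ read.csv(f); NA_character_ },
           error = function(e) conditionMessage(e))
}, character(1))
print(errors[!is.na(errors)])

# Distinct field counts per file, using the same quote and comment settings
# as read.csv. More than one value means some rows differ from the header.
field_counts <- lapply(my_files, function(f) {
  unique(count.fields(f, sep = ",", quote = "\"", comment.char = ""))
})
names(field_counts) <- basename(my_files)
print(field_counts)
```

A file whose field counts are not all equal to 4 (that is, 3 commas per line) is a likely culprit.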