
I am trying to read 1500 CSV files, but I am getting the error below.

Error in read.table(file = file, header = header, sep = sep, quote = quote, : duplicate 'row.names' are not allowed

Code:

fi<-list.files("C:/Users/Desktop/DL/odi_csv_male",full.names=T)
dat<-lapply(fi,read.csv)

When I open and re-save a file individually, I am able to read it. But with 1500 files, doing that manually is not practical. Any help would be much appreciated.

The file contains:

    version 1.3.0
    info    team    Ireland                     
    info    team    England                     
    info    gender  male                        
    info    season  2006                        
    info    date    6/13/2006                       
    info    venue   Civil Service Cricket Club, Stormont                        
    info    city    Belfast                     
    info    toss_winner England                     
    info    toss_decision   bat                     
    info    player_of_match ME Trescothick                      
    info    umpire  R Dill                      
    info    umpire  DB Hair                     
    info    match_referee   CH Lloyd                        
    info    winner  England                     
    info    winner_runs 38                      
    ball    1   0.1 England ME Trescothick  EC Joyce    DT Johnston 0   0
    ball    1   0.2 England ME Trescothick  EC Joyce    DT Johnston 0   0
    ball    1   0.3 England ME Trescothick  EC Joyce    DT Johnston 0   4
One of your files is probably malformed. lapply(fi, function(f){print(f);read.csv(f)}) will print each file name as it reads it in. The last file printed is the problem file. - Richard Telford
How do I resolve this? - Praveen Chougale
Try lapply(fi, read.csv, row.names = NULL). - Rui Barradas
It gives an error: Error in file(file, "rt") : cannot open the connection. In addition: Warning message: In file(file, "rt") : cannot open file '1000887.csv': No such file or directory - Praveen Chougale
If I open each file and save it as CSV, then I am able to read it. It's time-consuming. - Praveen Chougale

1 Answer


fread from data.table is more robust IMO.

Try

library(data.table)
dat<-lapply(fi,fread)

It might also happen that some of your files are not in .csv format. Try adding:

fi<-fi[grepl("\\.csv$",fi)]

Or, as suggested in the comments, the option row.names=NULL could help:

dat<-lapply(fi,function(x) read.csv(x, row.names=NULL))
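
A likely reason for the error, sketched on toy data: when the header line has one field fewer than the data lines, read.csv uses the first column as row names, and a repeated value such as "info" then triggers the duplicate 'row.names' error. The comma-separated text below is an assumption about what the raw file looks like (the question only shows an aligned view of it).

```r
# Toy input mimicking the layout in the question (assumed comma-separated):
# the first line has two fields, the following lines have three,
# and the first field repeats.
txt <- "version,1.3.0\ninfo,team,Ireland\ninfo,team,England"

# Default call: the first column is taken as row names, and "info" repeats,
# so read.csv stops with an error. Capture the message instead of failing.
msg <- tryCatch(
  { read.csv(text = txt); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)

# With row.names = NULL the first column is kept as ordinary data.
ok <- read.csv(text = txt, row.names = NULL)
print(dim(ok))
```

Note that with row.names = NULL the column names may end up shifted by one, so check names() on the result before relying on them.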

If the rows have differing numbers of fields, try fill=TRUE:

dat<-lapply(fi,function(x) fread(x, fill=TRUE))
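
To see which of the files actually fail, instead of stopping at the first error, each read can be wrapped in tryCatch. This is only a sketch: read_safely is a hypothetical helper name, the two temporary files are made up for the demonstration, and base read.csv is used as the default reader (fread could be passed in its place).

```r
# Hypothetical helper: try to read one file; on failure, report the path
# and the error message, and return NULL instead of stopping.
read_safely <- function(path, reader = read.csv, ...) {
  tryCatch(
    reader(path, ...),
    error = function(e) {
      message("Failed to read ", path, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Demonstration on two made-up files in a temporary directory.
tmp <- tempfile("csv_demo")
dir.create(tmp)
writeLines(c("a,b", "1,2", "3,4"), file.path(tmp, "good.csv"))
writeLines(c("a,b", "info,team,Ireland", "info,team,England"),
           file.path(tmp, "bad.csv"))

fi  <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
dat <- lapply(fi, read_safely)
names(dat) <- basename(fi)

# Names of the files that could not be read
failed <- names(dat)[vapply(dat, is.null, logical(1))]
print(failed)
```

The files listed in failed are the ones worth opening by hand; the rest of dat is already usable.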

EDIT: Please note that it is normal (and advised) for dat to be a list in this case, because dat consists of many data.frames. Try indexing your list appropriately using [[]]. In case you really don't want lists, you could use:

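for(i in seq_along(fi)) {
 # build the variable name dat1, dat2, ...
 name <- paste0("dat",i)
 myvar <- data.frame(fread(fi[i], fill=TRUE))
 # create the variable in the global environment
 assign(name,myvar, envir=.GlobalEnv)
}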

After that you'd have many data frames called dat1, dat2...
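
If you keep the list instead, naming its elements after the files makes indexing with [[]] easier. A minimal sketch, where the two small data frames and the file names are made-up stand-ins for the result of lapply(fi, fread):

```r
# Stand-in for the list returned by lapply(fi, fread); names and values are made up
fi  <- c("match_a.csv", "match_b.csv")
dat <- list(
  data.frame(over = c(0.1, 0.2), runs = c(0, 4)),
  data.frame(over = c(0.1, 0.2), runs = c(1, 0))
)
names(dat) <- basename(fi)

first <- dat[[1]]               # index by position
same  <- dat[["match_a.csv"]]   # index by file name

# Apply the same summary to every file without creating dat1, dat2, ...
total_runs <- vapply(dat, function(d) sum(d$runs), numeric(1))
print(total_runs)
```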

EDIT: After discussion in chat, the issue turned out to be related to plotting and aggregating the files, not reading them, and the problem is now solved.