I am trying to merge several csv files with the same columns in R. I tried the solution proposed in "Merging of multiple excel files in R" and it works perfectly. However, I would like to add an index column (row name) to identify which file each row corresponds to. I tried:

files <- list.files(pattern="*.csv")

require(purrr)

mainDF <- files %>% map_dfr(read.csv, row.names=files) 

But I get the error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : invalid 'row.names' length

I would like to get a column similar to this, or ideally just the numbers, e.g. 1, 2, etc.

Any ideas?

When you use quotes around files here (row.names="files"), you are giving the string "files", not the object files. Try removing the quotes. - cory
Thank you for your comment. I tried this but I still get an error. I will edit my question accordingly :) - Anna
What happens when you set row.names inside read.csv, like this: mainDF <- files %>% map_dfr(read.csv(row.names=T))? - AlvaroMartinez
I get a different error: Error in read.table(file = file, header = header, sep = sep, quote = quote, : argument "file" is missing, with no default - Anna
Maybe mainDF <- files %>% map_dfr(read.csv(row.names = 1, header = TRUE)). In theory, row.names = 1 will set the first .csv column as row names. - AlvaroMartinez

1 Answer

One way to deal with this is the .id argument of map_dfr(). If the list passed to map_dfr() is named, you can include a column in the output with the name of each list element. If the list is unnamed, the index will be included in the column instead. That way, the rows corresponding to each .csv will be associated with that index.

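So you could do the following. Note that the names(files) line is optional. If you omit the naming, you will get the index (1, 2, ...) instead.

# pattern is a regular expression, not a glob, so match a literal ".csv" at the end of the name
files <- list.files(pattern = "\\.csv$")

# optional: name each element so the ID column reads file_1, file_2, ...
names(files) <- paste('file', seq_along(files), sep = '_')

library(purrr)

# read every file and row-bind the results; .id adds a column holding each element's name
mainDF <- files %>% map_dfr(read.csv, .id = 'file_ID')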

The resulting data.frame will have a column named file_ID.
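
The same idea, tagging each row with the file it came from, can also be sketched in base R if purrr is not available. This is only an illustration under assumptions: the directory name csv_demo, the two files first.csv and second.csv, and their contents are made up here so that the example runs on its own.

```r
# Create two small CSV files in a temporary directory (made-up data for illustration).
tmp_dir <- file.path(tempdir(), "csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(tmp_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:5, y = c("c", "d", "e")),
          file.path(tmp_dir, "second.csv"), row.names = FALSE)

# "\\.csv$" is a regular expression matching file names that end in ".csv".
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and prepend a file_ID column holding its position in `files`.
pieces <- lapply(seq_along(files), function(i) {
  cbind(file_ID = i, read.csv(files[i]))
})

# Stack the per-file data frames into one.
mainDF <- do.call(rbind, pieces)
print(mainDF)
```

To store the file name rather than the number, replace `file_ID = i` with `file_ID = basename(files[i])`.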