Merging multiple csvs in R with index based on the file name

Question

I am trying to merge several csv files with the same columns in R. I tried the solution proposed in "Merging of multiple excel files in R" and it works perfectly. However, I would like to add an index column (row name) to identify which file each row corresponds to. I tried:

files <- list.files(pattern="*.csv")

require(purrr)

mainDF <- files %>% map_dfr(read.csv, row.names=files)

But I get the error:

Error in read.table(file = file, header = header, sep = sep, quote = quote, : invalid 'row.names' length

I would like to get a column similar to this, or ideally just the numbers, e.g. 1, 2, etc.

Any ideas?

When you use quotes around files here... row.names="files", you are giving the string "files", not the object files. Try removing the quotes. — cory
Thank you for your comment, I tried this but I still get an error. But I will edit my question accordingly :) — Anna
What happens when setting row.names inside read.csv, as this: mainDF <- files %>% map_dfr(read.csv(row.names=T)) ? — AlvaroMartinez
I get a different error: Error in read.table(file = file, header = header, sep = sep, quote = quote, : argument "file" is missing, with no default — Anna
Maybe mainDF <- files %>% map_dfr(read.csv(row.names = 1, header = TRUE)); in theory row.names = 1 will set the first .csv column as the row names. — AlvaroMartinez

qdread · Accepted Answer · 2021-02-08T15:08:34

One way to deal with this is the .id argument of map_dfr(). If the list passed to map_dfr() is named, you can include a column in the output with the name of each list element. If the list is unnamed, the index will be included in the column instead. That way, the rows corresponding to each .csv will be associated with that index.
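As a small sketch of that behaviour, the two in-memory data frames below are hypothetical stand-ins for the contents of two files. It assumes purrr and dplyr are both installed, since map_dfr() relies on dplyr to bind the rows. The function identity() is used only so that each data frame is passed through unchanged.

```r
library(purrr)

# Two small hypothetical data frames standing in for the contents of two files
dfs <- list(data.frame(x = 1:2), data.frame(x = 3:4))

# Unnamed list: .id records the position of each element
unnamed <- map_dfr(dfs, identity, .id = "file_ID")

# Named list: .id records the element names instead
names(dfs) <- c("file_1", "file_2")
named <- map_dfr(dfs, identity, .id = "file_ID")

print(unnamed$file_ID)
print(named$file_ID)
```

Each row keeps the identifier of the list element it came from, so the source of every row can be recovered after the merge.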

So you could do the following. Note that the second line is optional. If you omit the naming, you will get the index (1,2,...) instead.

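files <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob

names(files) <- paste('file', seq_along(files), sep = '_')  # optional: label each file

library(purrr)

mainDF <- files %>% map_dfr(read.csv, .id = 'file_ID')  # .id adds the name (or index) column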

The resulting data.frame will have a column named file_ID.
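Since the question asks for an index based on the file name, one possible variation is to use the file names themselves as the labels. The sketch below is self-contained and not taken from the original thread: the directory csv_demo and the files first.csv and second.csv are hypothetical and are created in a temporary directory only so the example can run anywhere. It again assumes purrr and dplyr are installed.

```r
library(purrr)

# Hypothetical setup: write two small CSV files into a temporary directory
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:2, y = c("a", "b")),
          file.path(csv_dir, "first.csv"), row.names = FALSE)
write.csv(data.frame(x = 3:4, y = c("c", "d")),
          file.path(csv_dir, "second.csv"), row.names = FALSE)

# full.names = TRUE returns paths that read.csv() can open from any working directory
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Use each file's base name as its label
names(files) <- basename(files)

# Read every file and stack the rows; file_ID holds the originating file name
mainDF <- map_dfr(files, read.csv, .id = "file_ID")
print(mainDF)
```

In purrr 1.0.0 and later, map_dfr() is marked as superseded; the documented replacement is map() followed by list_rbind(names_to = "file_ID"), which should give an equivalent result for named input.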
