2
votes

I have multiple excel files and they have unique sheet names (date of file creation in my case). I read them in bulk and need to assign the sheet name to each file in new column "id". I know how to make numeric id, or id = file name, but cannot find a way to get sheet name as id.

library(readxl)
library(data.table)

file.list <- list.files("C:/Users/.../Studies/",pattern='*.xlsx')
df.list <- lapply(file.list, read_excel)

#id = numeric
df <- rbindlist(df.list, idcol = "id")

#Or by file name:
attr(df.list, "names") <- file.list
df2 = rbindlist(df.list,idcol="id")

#How to get sheet names?
1
attr(df.list, "names") <- sapply(file.list, excel_sheets)? Should work if each file has only one sheet. - Abdou
Thanks. Unfortunately, number of sheets in each file is greater than one. Typically 2, but could be more - Alexey Ferapontov
So, are you working with only the first sheet from each file? If not, then df.list does not contains the right data. If so, you can do: attr(df.list, "names") <- sapply(file.list, function(x) excel_sheets(x)[1]). - Abdou
I'll need both sheets in my studies - separate df's for that. This your solution worked for my applications, thank you. Furnish an answer to upvote? - Alexey Ferapontov
Please see answer. - Abdou

1 Answers

3
votes

If you happen to be working with only the first sheets of your files, then the following should help you grab the first sheets' names as the id for your dataframes:

attr(df.list, "names") <- sapply(file.list, function(x) excel_sheets(x)[1])

However, if you are considering importing the data from all the available sheets you will need to do a bit more work, starting with how you create your list of dataframes:

df.list <- lapply(file.list,function(x) {
  sheets <- excel_sheets(x)
  dfs <- lapply(sheets, function(y) {
    read_excel(x, sheet = y)
  })
  names(dfs) <- sheets
  dfs
})

This should create a list of lists, which should contain all the available data in your files. The lists inside the main list are appropriately named after the sheet names. So, you will not need to change any attributes afterwards. But to bind the dataframes together, you need to do:

rbindlist(lapply(df.list, rbindlist, id = "id"))

I hope this proves useful.