I have about 50-60 Excel workbooks that are sitting in a directory. These Excel files are by and large all the same. There are two work sheets, one with instructions, another with the same tidy data across all sheets. Columns/vars A through J are the data I actually want to extract but I'm willing to read it all into a data.frame. The data go out as far as A through N cols.
I'm writing a script to extract all of the raw data and so far so good. Part of my script uses the standard approach of list.files to build a 'df' of file names. Then, I use 'lapply' to read all of the excel files into a list. Now here comes the rub.
I want to use dplyr::bind_rows (but am open to other suggestions) to then bind all of the rows together since they are the same and share the same headers. This works fantastic in my proof of concept. When I use .id argument of dplyr::bind_rows on it, I get 1 through j as the id var, which corresponds to the data.frame position within the list.
files.list <- list.files(pattern='*.xlsx') # list file names in directory
df.list <- lapply(files.list, read_excel) # read excel into a list of dfs
df <- bind_rows(df.list, .id = "id") # bind the rows of the dfs together
Is it possible to get the actual file name as opposed to the position in the list from where the data.frame was read? If so, how would I do that?