0
votes

I am supposed to load the data for my master-thesis in an R-dataframe, which is stored in 74 excel workbooks. Every workbook has 4 worksheets each, called: animals, features, r_words, verbs. All of the worksheets have the same 12 variables(starttime, word, endtime, ID,... etc.). I want to concatenate every worksheet under the one before, so the resulting dataframe should have 12 columns and the number of rows is depending on how many answers the 74 subjects produced. I want to use the readxl-package of the tidyverse and followed this article: https://readxl.tidyverse.org/articles/articles/readxl-workflows.html#csv-caching-and-iterating-over-sheets. The first problem I face is how to read all 4 worksheets with read_excel(path, sheet = "animals", "features", "r_words", "verbs"). This only works with the first worksheet, so I tried to make a list with all the sheet-names (object sheet). This is also not working. And when I try to use the following code with just one worksheet, the next line throws an error: Error in basename(.) : a character vector argument expected So, here is a part of my code, hopefully fulfilling the requirements:

filenames <- list.files("data", pattern = '\\.xlsm',full.names = TRUE)

# indices
subfile_nos <- 1:length(filenames)

# function to read all the sheets in at once and cache to csv
read_then_csv <- function(sheet, path) {
  for (i in 1:length(filenames)){
    sheet <- excel_sheets(filenames[i])
     len.sheet <- 1:length(sheet)
    path <- read_excel(filenames[i], sheet = sheet[i]) #only reading in the first sheet
  pathbase <- path %>%
    basename() %>% #Error in basename(.) : a character vector argument expected
    tools::file_path_sans_ext()
  path %>%
    read_excel(sheet = sheet) %>% 
    write_csv(paste0(pathbase, "-", sheet, ".csv"))
  }
}
2
It would be easier if you could share a reproducible example. However, I believe that the problem is that you're not pasting the result. Try to replace read_excel(filenames[i], sheet = sheet[i]) by read_excel(paste(filenames[i]), sheet = paste(sheet[i])). I also didn't understand this section pathbase <- path %>% basename() %>% #Error in basename(.) : a character vector argument expected tools::file_path_sans_ext(). Can you elaborate why do you have this?FilipeTeixeira

2 Answers

1
votes

You should do a double loop or a nested map, like so:

library(dplyr)
library(purrr)
library(readxl)

# I suggest looking at 
?purrr::map_df

# Function to read all the sheets in at once and save as csv
read_then_csv <- function(input_filenames, output_file) {
  # Iterate over files and concatenate results
  map_df(input_filenames, function(f){
    # Iterate over sheets and concatenate results
    excel_sheets(f) %>%
      map_df(function(sh){
        read_excel(f, sh)
      })
  }) %>%
  # Write csv
  write_csv(output_file)
}

# Test function
filenames <- list.files("data", pattern = '\\.xlsm',full.names = TRUE)
read_then_csv(filenames, 'my_output.csv')
0
votes

You say...'I want to concatenate every worksheet under the one before'... The script below will combine all sheets from all files. Test it on a COPY of your data, in case it doesn't do what you want/need it to do.

# load names of excel files 
files = list.files(path = "C:\\your_path_here\\", full.names = TRUE, pattern = ".xlsx")

# create function to read multiple sheets per excel file
read_excel_allsheets <- function(filename, tibble = FALSE) {
  sheets <- readxl::excel_sheets(filename)
  sapply(sheets, function(f) as.data.frame(readxl::read_excel(filename, sheet = f)), 
         simplify = FALSE)
}

# execute function for all excel files in "files"
all_data <- lapply(files, read_excel_allsheets)