5
votes

My Excel document my.xlsx has two Sheets named Sheet1 and Sheet2. I want to read all worksheets in an Excel workbook using fread function from data.table R package. The following code just read the active worksheet. Wonder how to read all worksheets without knowing their names. Thanks

df3 <- data.table::fread("in2csv my.xlsx")
> names(df3)
[1] "A" "B"
> df3
   A  B
1: 1  2
2: 2  4
3: 3  6
4: 4  8
5: 5 10
1
Can't you use readxl or openxlsx (stackoverflow.com/questions/27713310/…)akrun
Yes, I can use readxl, rio or openxlsx. But the size of my actual file is more than 300MB that's why requires the use of data.table::fread. Any thoughts, please.MYaseen208
It does not have this functionality. I'm not sure if the fread algorithm would be helpful if adapted to do it, but you could look into it by following the instructions here github.com/Rdatatable/data.table/wiki/SupportFrank
So, how were these files created? And what genius decided that this was a good file format for the task?Roland
Then you may have also seen this? --write-sheets WRITE_SHEETS The names of the Excel sheets to write to files, or "-" to write allismirsehregal

1 Answers

4
votes

I used openxlsx::read.xlsx the last time I needed to read many sheets from an XLSX.

#install.packages("openxlsx")
library(openxlsx)
#?openxlsx::read.xlsx

#using file chooser:
filename <- file.choose()
#or hard coded file name:
#filename <- "filename.xlsx"

#get all the sheet names from the workbook
SheetNames<-getSheetNames(filename)

# loop through each sheet in the workbook
for (i in SheetNames){

  #Read the i'th sheet
  tmp_sheet<-openxlsx::read.xlsx(filename, i)

  #if the input file exists, append the new data;; else use the first sheet to initialize the input file
  ifelse(exists("input"),
         input<-rbind(input, tmp_sheet),
         input<-tmp_sheet)
}

Note: This assumes each worksheet has identical column structure and data types. You may need to standardize\normalize the data (ex. tmp_sheet <- as.data.frame(sapply(tmp_sheet,as.character), stringsAsFactors=FALSE)), or load each sheet into it's own dataframe and pre-process further before merging.