0
votes

I am trying to import multiple excel workbook files, each of which is comprised of 6 sheets, and concatenate them into a single R data.frame. I have multiple workbook files (wb1.xlsx), each of which is made up of 6 sheets (1, 2, 3, 4, 5 and 6). Each sheet has the same name in each workbook (e.g. in wb1.xlsx and wb2.xlsx the first sheet in both files is called sheet 1). In addition each sheet has the same set of headers headers in every other sheet and in every other workbook (e.g. Sheet 1 in wb1 and sheet 2 in wb2 have the same headers).

I have tried using the import_list function from the rio package:

library(rio)

DF <- import_list(dir(pattern = ".xlsx"), rbind = TRUE)

However, this only imports the first sheet of each excel workbook. I want to import every sheet from the excel workbook files.

In addition I would like to create a column in the data frame which contains the name of the excel workbook file from which elements were imported (e.g. a column for all elements imported from wb1.xlsx called wb1.xlsx).

1
does my answer address your problem?Vitali Avagyan

1 Answers

0
votes

Here is a solution where you need to specify only number of workbooks and number of sheets per wb.

library(XLConnect)
library(purrr)

number_of_sheets <- 6 
number_of_wbs <- 100
DF <- map_dfc(1:number_of_wbs, function(x) {
  wb <- loadWorkbook(paste0("wb",x,".xlsx"))
  tables <- map_df(1:number_of_sheets,function(y) readWorksheet(wb,
                                          sheet = paste0("Sheet",y)))
  colnames(tables) <- paste0("wb",x,".xlsx")
  return(tables)
} 
)

In my mock example I had 2 wbs:

First workbook's Sheet1 contains:

header1
0.706206019
0.929864738
0.820188853
0.822880906
0.986846159
0.201870111
0.295824333
0.682271283
0.918636814
0.308713375
0.620041348
0.396961295
0.592918544
0.813379311
0.679918546
0.149930842
0.344765206
0.454569802
0.439798642
0.79660039

Second workbook's Sheet1 contains:

header1
0.359145063
0.97889235
0.561966441
0.560005392
0.37039449
0.827651329
0.428239882
0.684086633
0.376406956
0.67482297
0.088475246
0.843315296
0.478772685
0.431786377
0.213058341
0.570395357
0.418874778
0.270617596
0.0852233
0.894443579

And my final output is a data frame like this:

> DF
     wb1.xlsx    wb2.xlsx
1  0.74550551 0.324916411
2  0.76288674 0.503682245
3  0.98223947 0.426830680
4  0.90170329 0.813966804
5  0.56008503 0.680455280
6  0.45745945 0.001716535
7  0.14974342 0.089871648
8  0.23033300 0.362305799
9  0.57763394 0.426019894
10 0.04268935 0.055268297
11 0.26225496 0.903003228
12 0.58454780 0.836464347
13 0.43491035 0.393890841
14 0.18667029 0.147720392
15 0.81738619 0.653991709
16 0.45968982 0.209427777
17 0.13967637 0.274640492
18 0.25768938 0.522425832
19 0.66174385 0.722621747
20 0.49010581 0.689067647