I am using readxl library to read many excel worksheets in the same excel workbook (called data.xlsx) with the following format:
Data starts in row 3.
row1
row2
companyName 1980 1981 1982 ... 2016
company1 5 6 7 8
company2 10 20 30 40
company3 20 40 60 80
....
The data range is different in length by each row and column. However, they have the companyName as common key. The year range varies from starting from 1980 or 1990 until 2016. The worksheet name is the data name.
I want to create a single excel where all data are extracted from all worksheets.
companyName Year dataname values
company1 1980 sheetname1 5
company1 1981 sheetname1 6
company1 1982 sheetname1 7
company1 ... sheetname1 ...
company1 2016 sheetname1 8
company2 1980 sheetname1 10
company2 1981 sheetname1 20
company2 1982 sheetname1 30
company2 ... sheetname1 ...
company2 2016 sheetname1 40
.... .... ... ...
company1 2000 sheetname2 xxx
company1 2001 sheetname2 yyy
etc
etc
etc
This is how far I managed to get too:
library(tidyverse)
library(readxl)
library(data.table)
#read excel file (from [here][1])
file.list<-"data.xlsx"
**#read all sheets (and **skip** first two rows)**
df.list <- lapply(file.list,function(x) {
sheets <- excel_sheets(x)
dfs <- lapply(sheets, function(y) {
read_excel(x, sheet = y,skip=2)
})
names(dfs) <- sheets
dfs
})
I have following issues:
- the first two rows are not been skipped
- how I create one dataframe with only select sheets only (ie. sheet 5, sheet 10 and sheet 15).
Thank you for your help.
Source: R: reading multiple excel files, extract first sheet names, and create new column
readxlpackage? I have no issues with skipping rows. Unless not all sheets in your file start with the same number of rows before the headers. - hpesoj626skips vary? - hpesoj626