0
votes

I would like to compile an Excel file with multiple tabs labeled by year (2016, 2015, 2014, etc). Each tab has identical data, but column names may be spelled differently from year-to-year.

I would like to standardize columns in each sheet before combining.

This is the generic way of combining using purrr and readxl for such tasks:

combined.df <- excel_sheets(my.file) %>% 
  set_names() %>%                                 
  map_dfr(read_excel, path = my.file, .id = "sheet") 

...however as noted, this creates separate columns for "COLUMN ONE", and "Column One", which have the same data.

Inserting make.names into the pipeline would probably be the best solution.

Keeping it all together would be ideal...something like:

   combined.df <- excel_sheets(my.file) %>% 
    set_names() %>% 
    map(read_excel, path = my.file) %>% 
    map(~(names(.) %>%  #<---WRONG
            make.names() %>% 
            str_to_upper() %>% 
            str_trim() %>% 
            set_names()) ) 

..but the syntax is all wrong.

2
You could extract the column names of the first sheet and then set them for all remaining sheets by the col_names argument of read_excelkath
@kath: each sheet's column names may be spelled differently. Also, some sheets may include additional fields.Nettle

2 Answers

1
votes

Rather than defining your own function, the clean_names function from the janitor package may be able to help you. It takes a dataframe/tibble as an input and returns a dataframe/tibble with clean names as an output.

Here's an example:

library(tidyverse)

tibble(" a col name" = 1,
       "another-col-NAME" = 2,
       "yet another name  " = 3) %>% 
    janitor::clean_names()
#> # A tibble: 1 x 3
#>   a_col_name another_col_name yet_another_name
#>        <dbl>            <dbl>            <dbl>
#> 1          1                2                3

You can then plop it right into the code you gave:

combined.df <- excel_sheets(my.file) %>% 
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(janitor::clean_names) %>%        #<janitor::clean_names
    bind_rows(.id = "sheet")
0
votes

Creating a new function is doable but is verbose and uses two maps:

  # User defined function: col_rename
  col_rename <- function(df){
    names(df) <- names(df) %>% 
     str_to_upper() %>% 
     make.names() %>% 
     str_trim()
   return(df)
  }

   combined.df <- excel_sheets(my.file) %>% 
    set_names() %>%
    map(read_excel, path = my.file) %>%  #<Import as list, not dfr
    map(col_rename) %>%                  #<Fix colnames (user defined function)
    bind_rows(.id = "sheet")