0
votes

I have data with multiple observations per row that I would like to tidy.

The data is the Relative Spectral Response (RSR) for a collection of satellite sensors. I made a toy dataset that is representative of the actual data here. Each sensor/band has two columns, one with the range of wavelengths on which it was tested ("Wvln(nm)"), and one with the response of that sensor ("RSR"). In my toy data band 1 has a strong response to 500 nm, and band 2 has a strong response to 600 nm.

library(tidyverse)

rsr_toy <- tibble::tribble(
  ~`Band 1`, ~`...2`, ~`Band 2`, ~`...4`,
  "Wvln(nm)", "RSR", "Wvln(nm)", "RSR",
  "500", "0.9", "500", "0.01",
  "600", "0.12", "600", "0.8"
)

# remove the first row containing metadata 
rsr1 <- rsr_toy %>% 
  slice(-1) %>% 
  janitor::clean_names("small_camel") %>%
rsr1
# # A tibble: 2 x 4
#   band1 x2    band2 x4   
#   <chr> <chr> <chr> <chr>
# 1 500   0.9   500   0.01 
# 2 600   0.12  600   0.8  

I want to tidy the data, so each observation gets its own row, like so:

# desired outcome:
tibble::tribble(
  ~sensor, ~wavelength, ~rsr,
  "band1", 500, 0.9,
  "band1", 600, 0.12,
  "band2", 500, 0.01,
  "band2", 600, 0.8
)
# # A tibble: 4 x 3
#   sensor wavelength   rsr
#   <chr>       <dbl> <dbl>
# 1 band1         500  0.9 
# 2 band1         600  0.12
# 3 band2         500  0.01
# 4 band2         600  0.8 

How to simply accomplish this using tidyr?

2
If the names were consistently in the pattern of band1 x1 band2 x2 you could pivot it like rsr1 %>% pivot_longer(everything(), names_to=c(".value","sensor"), names_pattern="(.+)(\\d+)") - is there any chance a little cleaning can be done on the names ahead of time?thelatemail

2 Answers

1
votes

Get the data in long format, rename the columns and create sensor column.

library(tidyverse)

rsr1 %>%
  pivot_longer(cols = everything(), 
               names_to = '.value', 
               names_pattern = '(.*?)\\d') %>%
  rename(wavelength = band, rsr = x) %>%
  mutate(sensor = rep(str_subset(names(rsr1), 'band'), length.out = n()),
         .before = 1)

#  sensor wavelength rsr  
#  <chr>  <chr>      <chr>
#1 band1  500        0.9  
#2 band2  500        0.01 
#3 band1  600        0.12 
#4 band2  600        0.8  
0
votes

I split the tbl into one sensor per list, and then did some basic dplyr commands on each sensor, then rejoined the lists.

  1. Split by band so each band gets it's own table (following this solution).
splits <- seq(2,ncol(rsr1),2) %>% 
  map(~ select(rsr1, (.-1):all_of(.)))
splits
# [[1]]
# # A tibble: 2 x 2
#   band1 x2   
#   <chr> <chr>
# 1 500   0.9  
# 2 600   0.12 
# 
# [[2]]
# # A tibble: 2 x 2
#   band2 x4   
#   <chr> <chr>
# 1 500   0.01 
# 2 600   0.8  
  1. Then apply a custom function to each tibble in the list:
my_rename_tbl <- function(tbl){
  tbl %>% 
    # add a column with the band name
    add_column(sensor = colnames(tbl)[1], .before = 1) %>% 
    # rename the other two columns "wavelength" and "rsr" respectively
    rename("wvln" = 2, "rsr" = 3)
}

splits %>% 
  map(my_rename_tbl) %>% 
  bind_rows()
# # A tibble: 4 x 3
#   sensor wvln  rsr  
#   <chr>  <chr> <chr>
# 1 band1  500   0.9  
# 2 band1  600   0.12 
# 3 band2  500   0.01 
# 4 band2  600   0.8