1
votes

I have a dataset of non-numeric sequences structured similarily to below (there are many more rows/col in the real dataset):

  X1 X2 X3 X4 X5 X6 X7
1  A  A  C  C  B  A  A
2  A  A NA NA NA  B  A
3  A  C  C NA NA  B  B

Each observation (ie. rows 1, 2, 3) are independent of each other. I would like to fill in the NA's with a combination of the values from the nearest non-NA's (from the same row). This would result in assigning, for example, a transition value 'A-B' for NA's between A and B (the values on either side of the NA). My ideal result would be:

  X1 X2  X3  X4  X5 X6 X7
1  A  A   C   C   B  A  A
2  A  A A-B A-B A-B  B  A
3  A  C   C C-B C-B  B  B

I am new to R and therefore unsure what approach to take. I have searched for solutions and have found some helpful ones eg. using na.locf to fill NA's with the last observation, but I can't figure out how to fill the data considering non-na values on the left and right sides in the same row. Any suggestions would be appreciated.

1
What should be the result for c(NA, NA,"B", "A", NA, "A", NA, NA)?RiskyMaor
I would have to review the data to see if this type of sequence exists. For "A", NA, "A" I would like the result to be "A", "A-A", "A". If there is an NA at the beginning or at the end of the sequence I may have to fill it with a value that indicates beginning or end respectively then fill in the rest of the NA's eg. "A", "A-end", "end". Great comment though! Thank-you.el88

1 Answers

2
votes

That's a complicated way, I bet, there are better options. Using your data

> df
# A tibble: 3 x 7
  X1    X2    X3    X4    X5    X6    X7   
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A     A     C     C     B     A     A    
2 A     A     NA    NA    NA    B     A    
3 A     C     C     NA    NA    B     B  

and some dplyr, tidyr and zoo functions:

df %>%
  mutate(id = row_number()) %>%
  pivot_longer(cols = starts_with("X")) %>%
  mutate(value = ifelse(is.na(value), 
                        paste0(na.locf(value), "-", na.locf(value, fromLast=TRUE)), 
                        value)) %>%
  pivot_wider() %>%
  select(-id)

returns

# A tibble: 3 x 7
  X1    X2    X3    X4    X5    X6    X7   
  <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 A     A     C     C     B     A     A    
2 A     A     A-B   A-B   A-B   B     A    
3 A     C     C     C-B   C-B   B     B