
I would like to replace repeated values in a vector with NA, so that only the first occurrence of each new value is kept, in its original position.

I can find lots of posts on how to solve the removal of duplicate rows, but no posts that solve this issue.

Can you help me convert the column "problem" into the values in the column "desire"?

dplyr solutions are preferred.

library(tidyverse)

df <- tribble(
  ~frame, ~problem, ~desire,
  1,  NA, NA, 
  2, "A", "A",
  3, NA, NA,
  4, "B", "B", 
  5, "B", NA, 
  6, NA, NA, 
  7, "C", "C",
  8, "C", NA, 
  9, NA, NA,
  10, "E", "E")

df
# A tibble: 10 x 3
   frame problem desire
   <dbl> <chr>   <chr> 
 1     1 NA      NA    
 2     2 A       A     
 3     3 NA      NA    
 4     4 B       B     
 5     5 B       NA    
 6     6 NA      NA    
 7     7 C       C     
 8     8 C       NA    
 9     9 NA      NA    
10    10 E       E 

EDIT with base R / dplyr solution:
Ronak Shah's solution works. Here it is within a dplyr workflow in case anyone is interested:

df %>% 
  mutate(
    solved = replace(problem, duplicated(problem), NA))

# A tibble: 10 x 4
   frame problem desire solved
   <dbl> <chr>   <chr>  <chr> 
 1     1 NA      NA     NA    
 2     2 A       A      A     
 3     3 NA      NA     NA    
 4     4 B       B      B     
 5     5 B       NA     NA    
 6     6 NA      NA     NA    
 7     7 C       C      C     
 8     8 C       NA     NA    
 9     9 NA      NA     NA    
10    10 E       E      E 
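
If a value could come back later in the column and should be kept again at the start of each new run, a run-based variant may fit better. The following is only a sketch, assuming dplyr 1.1.0 or later, where `consecutive_id()` is available; the tibble is rebuilt here without the `desire` column so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, minus the "desire" column
df <- tibble(
  frame   = 1:10,
  problem = c(NA, "A", NA, "B", "B", NA, "C", "C", NA, "E")
)

out <- df %>%
  mutate(
    # consecutive_id() gives each run of identical neighbours one id;
    # any row whose id was already seen is a repeat inside that run
    solved = replace(problem, duplicated(consecutive_id(problem)), NA)
  )

out
```

On the example data this gives the same `solved` column as the `duplicated(problem)` version, because no value recurs after a gap.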

2 Answers

1
votes

Using rleid from data.table, we can replace the repeated values within each run with NA.

library(data.table)
df$answer <- replace(df$problem, duplicated(rleid(df$problem)), NA)

#   frame problem desire answer
#   <dbl> <chr>   <chr>  <chr> 
# 1     1 NA      NA     NA    
# 2     2 A       A      A     
# 3     3 NA      NA     NA    
# 4     4 B       B      B     
# 5     5 B       NA     NA    
# 6     6 NA      NA     NA    
# 7     7 C       C      C     
# 8     8 C       NA     NA    
# 9     9 NA      NA     NA    
#10    10 E       E      E     

For a pure base R option, we can use rle instead of rleid to build the run ids:

df$answer <- replace(df$problem, duplicated(with(rle(df$problem), 
                     rep(seq_along(values), lengths))), NA)
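
To make the `rle` expression easier to follow, here is the same idea unpacked step by step on a plain character vector. This is a minimal sketch; the names `x`, `r`, `run_id` and `result` are illustrative and not part of the original answer.

```r
x <- c(NA, "A", NA, "B", "B", NA, "C", "C", NA, "E")

# rle() splits x into runs of identical neighbours.
# Note that rle() treats every NA as a run of its own.
r <- rle(x)

# Expand the run index back to the length of x: one id per element
run_id <- rep(seq_along(r$values), r$lengths)
run_id

# An element whose run id was already seen is a repeat inside its run
result <- replace(x, duplicated(run_id), NA)
result
```

Only positions 5 and 8 (the second "B" and the second "C") share a run id with the element before them, so only those two are set to NA.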

If, as in the example shown, identical values always occur together, duplicated alone is enough (note that this overwrites the problem column):

df$problem <- replace(df$problem, duplicated(df$problem), NA)
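
The run-based and the plain `duplicated` approaches only differ when a value comes back after a gap. A small sketch with a made-up vector `x` (not from the question), in which "A" appears in two separate runs:

```r
x <- c("A", "A", NA, "B", "A", "A")

# duplicated() alone: only the very first "A" overall survives
global_first <- replace(x, duplicated(x), NA)
global_first

# Run-based: the first "A" of each consecutive run survives
r <- rle(x)
run_id <- rep(seq_along(r$values), r$lengths)
run_first <- replace(x, duplicated(run_id), NA)
run_first
```

Here `global_first` keeps "A" only at position 1, while `run_first` also keeps the "A" at position 5. Which of the two is wanted depends on how "first occurrence of each new value" is meant.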
0
votes

We can use data.table, which updates the column by reference:

library(data.table)
setDT(df)[duplicated(rleid(problem)), problem := NA][]
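
Because `setDT()` converts `df` in place and `:=` modifies the column by reference, the original `problem` values are lost after this call. If they are still needed, one option is to work on a copy. A minimal sketch, assuming a plain data frame rebuilt from the question's data (the name `dt` is illustrative):

```r
library(data.table)

df <- data.frame(
  frame   = 1:10,
  problem = c(NA, "A", NA, "B", "B", NA, "C", "C", NA, "E"),
  stringsAsFactors = FALSE
)

# as.data.table() returns a copy, so df itself is left untouched
dt <- as.data.table(df)

# Blank every row that repeats the run id of an earlier row
dt[duplicated(rleid(problem)), problem := NA]
dt
```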