First, some data:
library(data.table)
# 1. Input table
df_input <- data.table(
x = c("x1", "x1", "x1", "x2", "x2"),
y = c("y1", "y1", "y2", "y1", "y1"),
z = c(1:5))
In each column, I want to keep only the first value in each run of consecutive identical values. For example, look at the y column, which has three different runs: (1) two y1, (2) one y2, and (3) a second run of y1. Within each such run, duplicated values should be replaced with "".
#     x  y z
# 1: x1 y1 1  # 1st value in run of y1: keep
# 2: x1 y1 2  # 2nd value in run: replace
# 3: x1 y2 3  # 1st value in run: keep
# 4: x2 y1 4  # 1st value in 2nd run of y1: keep
# 5: x2 y1 5  # 2nd value: replace
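To make the notion of a "run" concrete, here is a minimal sketch that labels the runs in the y column. It assumes data.table's `rleid()` is an acceptable way to identify runs; the names `run_id` and `first_in_run` are only illustrative and not part of the data above.

```r
library(data.table)

# Same values as the y column of df_input
y <- c("y1", "y1", "y2", "y1", "y1")

# rleid() gives each run of consecutive identical values its own id,
# so the second run of "y1" gets a different id than the first
run_id <- rleid(y)
run_id
# [1] 1 1 2 3 3

# A row is the first of its run when its run id has not appeared before
first_in_run <- !duplicated(run_id)
first_in_run
# [1]  TRUE FALSE  TRUE  TRUE FALSE
```

The rows flagged `FALSE` are the ones marked "replace" in the annotated table.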
Thus, the desired output table:
df_output <- data.table(
x = c("x1", "", "", "x2", ""),
y = c("y1", "", "y2", "y1", ""),
z = c(1:5))
#     x  y z
# 1: x1 y1 1
# 2:       2
# 3:    y2 3
# 4: x2 y1 4
# 5:       5
How can I get the output table using the dplyr or data.table packages?
Thanks