I have factor variable that occurs in two columns and and now I want first lag, no matter what column factor last appeared in.
Consider following data.table.
require(data.table)
set.seed(21)
dt <- data.table(item1 = c(rep(sample(letters[1:5]), 2), sample(letters[6:10])),
item2 = c(rep(sample(letters[6:10]), 2), sample(letters[1:5])),
value1 = rnorm(15, 5, 2),
value2 = rnorm(15, 5, 2),
iteration = rep(1:3, each = 5))
> dt
item1 item2 value1 value2 iteration
1: d i 0.4464375 6.491179 1
2: b j 6.5148245 5.665638 1
3: c f 3.9031889 2.751919 1
4: a g 5.3450990 3.587738 1
5: e h 6.1257061 3.544912 1
6: d i 8.0236359 1.331371 2
7: b j 6.3180503 4.184624 2
8: c f 7.2440561 5.053722 2
9: a g 3.4307173 6.823257 2
10: e h 4.1486154 8.268693 2
11: j a 5.7859952 5.121371 3
12: f c 5.0735143 8.695145 3
13: i e 2.9358327 5.160250 3
14: g d 2.4702771 7.837112 3
15: h b 4.5460694 7.917232 3
I have tried to solve this with data.table package.
dt[, lag1 := c(NA, value1), by = item1]
dt[, lag2 := c(NA, value2), by = item2]
dt
item1 item2 value1 value2 iteration lag1 lag2
1: d i 0.4464375 6.491179 1 NA NA
2: b j 6.5148245 5.665638 1 NA NA
3: c f 3.9031889 2.751919 1 NA NA
4: a g 5.3450990 3.587738 1 NA NA
5: e h 6.1257061 3.544912 1 NA NA
6: d i 8.0236359 1.331371 2 0.4464375 6.491179
7: b j 6.3180503 4.184624 2 6.5148245 5.665638
8: c f 7.2440561 5.053722 2 3.9031889 2.751919
9: a g 3.4307173 6.823257 2 5.3450990 3.587738
10: e h 4.1486154 8.268693 2 6.1257061 3.544912
11: j a 5.7859952 5.121371 3 NA NA
12: f c 5.0735143 8.695145 3 NA NA
13: i e 2.9358327 5.160250 3 NA NA
14: g d 2.4702771 7.837112 3 NA NA
15: h b 4.5460694 7.917232 3 NA NA
I could probably solve this by creating one column for item and one for value, but is there a better solution?
And just to be clear, my expected value on lag1 on row 11 is 4.184624.
I will also need lag for item2 and did this dplyr.
dt %>%
mutate(nr = 1:nrow(dt)) %>%
gather(key, value, -nr, -iteration) %>%
mutate(key = ifelse(key == "item1" | key == "item2", "item", "value"),
variabel = rep(c(1, 2), 2, each = nrow(dt))) %>%
spread(key, value) %>%
group_by(item) %>%
arrange(nr) %>%
mutate(lag = lag(value)) %>%
gather(key, value, -iteration, -nr, -variabel) %>%
unite(key, c("key", "variabel"), sep = "") %>%
spread(key, value)
iteration nr item1 item2 lag1 lag2 value1 value2
* <int> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 1 1 e f <NA> <NA> 4.48327811883486 5.98823833422944
2 1 2 b i <NA> <NA> 6.21252978898878 3.6803830789734
3 1 3 d g <NA> <NA> 5.62689643314086 7.00228385274896
4 1 4 c h <NA> <NA> 5.10720616395708 7.14416894881173
5 1 5 a j <NA> <NA> 7.25650757535391 6.51153141154262
6 2 6 e f 4.48327811883486 5.98823833422944 3.88373308164829 2.08907058913021
7 2 7 b i 6.21252978898878 3.6803830789734 8.07191789162847 6.88574195362948
8 2 8 d g 5.62689643314086 7.00228385274896 4.87510729533042 1.25944984673148
9 2 9 c h 5.10720616395708 7.14416894881173 5.0431504307243 4.4934555124612
10 2 10 a j 7.25650757535391 6.51153141154262 0.820345123625779 4.41487625686153
11 3 11 g d 1.25944984673148 4.87510729533042 3.37822264689098 5.43753611910662
12 3 12 j a 4.41487625686153 0.820345123625779 -0.88757977661203 2.28986114731552
13 3 13 i e 6.88574195362948 3.88373308164829 4.96240860503556 4.75454561215201
14 3 14 h b 4.4934555124612 8.07191789162847 4.29063975464589 4.09626986248512
15 3 15 f c 2.08907058913021 5.0431504307243 5.07114037497055 5.19449624162733