My data
I have a data.table DT
with the current (F0YR
) and the next (F1YR
) fiscal year-end (FYE) encoded as integers. Since every next FYE will eventually become
a current FYE, the integer will be both in the column F1YR
and F0YR
. Also, my data contains monthly observations so the same FYE will be in the data set
multiple times:
library(data.table)
DT <- data.table(ID = rep(c("A", "B"), each=9),
MONTH = rep(100L:108L, times=2),
F0YR = rep(c(1L, 4L, 7L), each=3, times=2),
F1YR = rep(c(4L, 7L, 9L), each=3, times=2),
value = c(rep(1:5, each=3), 6, 6, 7),
key = "ID,F0YR")
DT
ID MONTH F0YR F1YR value
[1,] A 100 1 4 1
[2,] A 101 1 4 1
[3,] A 102 1 4 1
[4,] A 103 4 7 2
[5,] A 104 4 7 2
[6,] A 105 4 7 2
[7,] A 106 7 9 3
[8,] A 107 7 9 3
[9,] A 108 7 9 3
[10,] B 100 1 4 4
[11,] B 101 1 4 4
...
What I want to do
For every ID
and F1YR
combination, I want to get the value for the ID
and F0YR
combination. As an example: Company A had a value of 2
for FOYR==4
. Now,
I want an additional column for all combinations with ID=="A"
and F1YR==4
which is set to 2, next to the already existent value of 1.
What I tried
intDT <- DT[CJ(unique(ID), unique(F0YR)), list(ID, F0YR, valueNew = value), mult="last"]
setkey(intDT, ID, F0YR)
setkey(DT, ID, F1YR)
DT <- intDT[DT]
setnames(DT, c("F0YR.1", "F0YR"), c("F0YR", "F1YR"))
DT
ID F1YR valueNew MONTH F0YR value
[1,] A 4 2 100 1 1
[2,] A 4 2 101 1 1
[3,] A 4 2 102 1 1
[4,] A 7 3 103 4 2
[5,] A 7 3 104 4 2
[6,] A 7 3 105 4 2
[7,] A 9 NA 106 7 3
[8,] A 9 NA 107 7 3
[9,] A 9 NA 108 7 3
[10,] B 4 5 100 1 4
[11,] B 4 5 101 1 4
...
(Note that I use mult="last"
here because, although the values should only change with F0YR or F1YR changes, sometimes they don't and this is just my
tie breaker).
What I want
This looks improvable. First of all, I have to make a copy of my DT. Second, since I join basically the same data.table
, all the column names have the same name
and I have to rename them. I thought that a self join
would be the way forward, but I tried and tried and couldn't get a nice solution. I have the hope
that there is something easy out there which I just don't see...Does anyone have a clue? Or is my data set up in such a way that it is actually hard
(maybe because I have monthly observations, but want to join only quarterly or yearly changing values).
:=
by group is working now in 1.8.1, and could be used for this perhaps. R-Forge is building ok and the binary installs ok (in R 2.15.0 since R-Forge only buids for latest) usinginstall.packages("data.table", repos="http://R-Forge.R-project.org")
. Check thattest.data.table()
returns "653 tests completed ok" to be sure it's the latest version. See latest NEWS to see if any new features are useful for this one. – Matt DowleNA
for you. If neither works please file a bug report or new question. Make suretest.data.table()
returns653 tests ok
to rule out somehow using an older revision of v1.8.1. – Matt Dowle:=
by group can help here. Tried it for an hour and couldn't figure anything out. So I guess my solution is OK. Joining one data.table by two columns of that data.table is a very specific use case anyway. – Christoph_J:=
by group, grouping byi
notby
, and using join inherited scope (V1
comes fromi
scope) to boot. Didn't exactly plan to go that far in the first demo of:=
by group, but that's just the way it worked it out! – Matt Dowle