I have a dataset of portfolio holdings:
# Input test data
portolios <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "C", "C", "A"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Partial", "Full"), shares = c(100L, 100L, 130L, 50L, 75L, 80L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares"), row.names = c(NA, -6L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares
1: 1 A 2010-03-31 Full 100
2: 1 B 2010-03-31 Full 100
3: 1 A 2010-06-30 Full 130
4: 1 C 2010-06-30 Full 50
5: 1 C 2010-07-15 Partial 75
6: 1 A 2010-08-31 Full 80
I need to impute the following missing records:
7: 1 B 2010-06-30 Full 0
8: 1 C 2010-08-31 Full 0
The business issue is that sales of positions (shares = 0) are sometimes not reported for Full report_type, so the missing SecID have to be imputed based on the prior report.
Ultimately I am seeking to calculate the change in shares for each SecID from the prior report for each portfolioID such that my dataset looks like this:
changes <- structure(list(portfolioid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), secid = c("A", "B", "A", "B", "C", "C", "A", "C"), reportdate = c("2010-03-31", "2010-03-31", "2010-06-30", "2010-06-30", "2010-06-30", "2010-07-15", "2010-08-31", "2010-08-31"), report_type = c("Full", "Full", "Full", "Full", "Full", "Partial", "Full", "Full"), shares = c(100L, 100L, 130L, 0L, 50L, 75L, 80L, 0L), change = c(100L, 100L, 30L, -100L, 50L, 25L, -50L, -75L)), .Names = c("portfolioid", "secid", "reportdate", "report_type", "shares", "change"), row.names = c(NA, -8L), class = c("data.table", "data.frame"))
portfolioid secid reportdate report_type shares change
1: 1 A 2010-03-31 Full 100 100
2: 1 B 2010-03-31 Full 100 100
3: 1 A 2010-06-30 Full 130 30
4: 1 B 2010-06-30 Full 0 -100
5: 1 C 2010-06-30 Full 50 50
6: 1 C 2010-07-15 Partial 75 25
7: 1 A 2010-08-31 Full 80 -50
8: 1 C 2010-08-31 Full 0 -75
I am stuck with how to create the i for the outer join portfolios[i]. My issue is that I don't want to use i <- CJ(reportdate, secid)
because it will produce too many records that are unnecessary since not every secid exists at every ReportDate and not correctly represent the data that needs filling.
I think I need a rolling cross-join between reportdate,reportdate[-1,secid]
I want to roll forward the secid and set shares := 0 when secid is missing in a Full report but it existed in prior report (either Partial or Full). I believe I would do this with option roll=1 but I am not sure where or how to implement.
I think my problem is similar to
How to Calculate a rolling statistic in R using data.table on unevenly spaced data
I am sure I am missing something basic in my understanding or a trick with CJ() that can create the requisite i
portolios
to? To your older report? Where is it? You didn't provide it. Or you want to roll join to these specific two observations? It is very unclear (at least to me) what's going on. It would be easier if you'd just provide the data sets you have and then your desired output from the join – David Arenburgshares := 0
but how do you know thesecid
and the date missing? Why1 B 2010-08-31 Full 0
is not missing too, for example? – David Arenburgsecid
existed in a priorreportdate
but is missing in the followingFull reportdate
then thissecid
should be carried forward andshares:=0
. The trick is how to calculate whichsecid
should be carried forward from the priorreportdate
. This is why i believeroll=1
may be part of the solution. – superquant1 B 2010-08-31 Full 0
is not missing is because B is not in the prior report of2010-06-30
in the input dataset. You could also look at it as only wanting to carry forward to the next periodsecid
whereshares <> 0
. You do not want to carry forward filled missing values forshares = 0
– superquant