I want to check a data.table for all dates that have a given category, and set all those dates to TRUE
in a separate data.table. I have a data.table that looks something like this:
library(data.table)
DT1 = fread(
'Date Category
2010-01-01 A
2010-01-01 B
2010-01-02 A
2010-01-02 C
2010-01-02 D
2010-01-04 B
2010-01-04 "B OR D"')
DT1[, Date := as.IDate(Date) ]
and I'm trying to extract the values to a data frame that looks like this:
DT2 = fread('
Date A B C D
2010-01-01 FALSE FALSE FALSE FALSE
2010-01-02 FALSE FALSE FALSE FALSE
2010-01-03 FALSE FALSE FALSE FALSE
2010-01-04 FALSE FALSE FALSE FALSE
2010-01-05 FALSE FALSE FALSE FALSE')
DT2[, Date := as.IDate(Date) ]
If a particular category exists for a given date, I would like to set the cell with the row's date and the category's column to true. I know this will probably involve something like:
DT2 <- DT2[Date %in% DT1$Date, A := grep(something)]
I want this call to modify DT2 to look like this:
Date A B C D
2010-01-01 TRUE FALSE FALSE FALSE
2010-01-02 TRUE FALSE FALSE FALSE
2010-01-03 FALSE FALSE FALSE FALSE
2010-01-04 FALSE FALSE FALSE FALSE
2010-01-05 FALSE FALSE FALSE FALSE
Right now, my code uses any(grep())
, which is a problem because it changes DT2$A[4]
to TRUE
, which I don't want. I would like to use a regular expression, since some of my category cells refer to multiple values. Because of the last cell, both my checks on the value B and the value D should set their dummy variables in 2010-01-04
to true. As the last 2 rows show, there may be one date which contains multiple references to a single category.
Is there a way to do this in data.table? I can use other packages if absolutely necessary.
Note that I can't just use dcast
to transform D1 into the proper data.table, because it doesn't contain all the dates I need.
max(Date)
tomax(Date) + 1L
. – Frank