Since ifelse() strips attributes (including the Date class), I use multiple steps in place of ifelse() when working with dates.
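
A minimal illustration of the attribute stripping, using made-up values (the names x, cutoff and y are for illustration only):

```r
# Two example dates and a cutoff date (illustrative values)
x <- as.Date(c("2012-05-05", "2014-05-02"))
cutoff <- as.Date("2013-01-01")

# ifelse() takes its attributes from the logical test, not from the
# yes/no arguments, so the Date class is lost in the result
y <- ifelse(x > cutoff, x, cutoff)
class(y)  # "numeric"

# The class has to be restored by hand
y <- as.Date(y, origin = "1970-01-01")
class(y)  # "Date"
```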

For example,

library(data.table)

df <- data.table(a = 1:4,
                 b = as.Date(c("2012-05-05","2014-05-02","2016-01-02","2011-01-02")),
                 c = as.Date(c("2014-02-05","2010-01-02","2015-02-02","2012-03-02")))
year <- 2013
# Step 1: year from c, month and day from b
df[, d := as.Date(paste0(format(c, "%Y"), "-", format(b, "%m-%d")))]
# Step 2: where that date is not after c, use the fixed year instead
df[d <= c, d := as.Date(paste0(year, "-", format(b, "%m-%d")))]

The example above is quite simple. In real life, I have a more complex situation that requires comparing 3 date columns, with 5 different scenarios in total. Does that mean I need 5 steps to cover all the "if else" scenarios? If so, I guess the advantages of data.table are not being fully used.

Is there some way to avoid using multiple steps?


The code above is quite confusing, sorry for that. The purpose is this: if the month and day of column b are earlier than those of column c, create a date with the year 2013 and the month and day from column b; otherwise, create a date with the year of column c and the month and day from column b.

Thanks to @docendo discimus, I changed the code.


One more example

year1 <- 2020
year2 <- 2025
year3 <- 2030
df <- data.table(a = 1:4,
                 b = as.Date(c("2012-05-05","2014-01-02","2016-10-02","2011-01-02")),
                 c = as.Date(c("2014-09-05","2010-07-02","2015-02-02","2012-03-02")),
                 d = as.Date(c("2008-02-06","2009-08-07","2011-04-04","2010-07-10")))
df[, e := as.Date(paste0(format(c, "%Y"), "-", format(b, "%m-%d")))]
df[e <= c & e > d, e := as.Date(paste0(year1, "-", format(b, "%m-%d")))]
df[as.Date(paste0(format(c, "%Y"), "-", format(b, "%m-%d"))) <= c &
     as.Date(paste0(format(d, "%Y"), "-", format(b, "%m-%d"))) <= d,
   e := as.Date(paste0(year2, "-", format(b, "%m-%d")))]
df[as.Date(paste0(format(c, "%Y"), "-", format(b, "%m-%d"))) > c,
   e := as.Date(paste0(year3, "-", format(b, "%m-%d")))]

The purpose of the above example is to compare 3 date columns. By "compare" I mean using the month and day only, regardless of the year.

If b<=c and b>d, change the year to 2020,
if b<=c and b<=d, change the year to 2025,
if b>c, change the year to 2030.

I need 4 steps to accomplish this. Steps 3 and 4 become ugly: because I changed the year of column e in step 2, I can no longer use column e to compare with c and d. Is there some way to simplify the above example?

Can you explain in words what your code is doing? - David Arenburg
What the **** are you doing there with as.Date and format? It makes your example unreadable. Anyway, please provide a representative, minimal reproducible example. For your example here (i.e., a simple if/else situation), such a two-step approach is exactly what I use. - Roland
sprintf would make this a whole lot more readable: as.Date(sprintf('%s-%s-%s', format(c, "%Y"), format(b, "%m"), format(b, "%d"))) - rawr
fyi, you could also rewrite it as df[,d:=as.Date(paste0(format(c,"%Y"),"-",format(b,"%m-%d")))];df[d<=c,d:=as.Date(paste0(year,"-",format(b,"%m-%d")))] - talat
Please provide an example that reflects "the real question". - Roland

1 Answer

The memisc package provides the cases function, which often serves in place of multiple calls to ifelse:

library(memisc)

d <- data.frame(x = 1:8)
d$y <- cases(
    d$x == 5 -> "Five",
    d$x < 3  -> "Less than three",
    d$x > 5  -> "More than five",
    rep(TRUE, 8) -> "Otherwise"
)
d

Which yields:

  x               y
1 1 Less than three
2 2 Less than three
3 3       Otherwise
4 4       Otherwise
5 5            Five
6 6  More than five
7 7  More than five
8 8  More than five

This is just a toy example to show off cases, but you may find the function useful in your situation. Note that you can replace the conditions d$x == 5 and the like with any series of logical vectors as long as each is the same length, and cases will just catch the first one that evaluates to TRUE.
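
For the date columns in the question, the same first-match idea is also available without leaving data.table. The following is a minimal sketch, assuming a data.table version that provides fcase() (1.13.0 or later, to my knowledge). The helper md() and the temporary column yr are names introduced here for illustration only. It picks the target year in one step by comparing month and day as integers, then builds the date once, so no Date values pass through ifelse() at all.

```r
library(data.table)

df <- data.table(
  a = 1:4,
  b = as.Date(c("2012-05-05", "2014-01-02", "2016-10-02", "2011-01-02")),
  c = as.Date(c("2014-09-05", "2010-07-02", "2015-02-02", "2012-03-02")),
  d = as.Date(c("2008-02-06", "2009-08-07", "2011-04-04", "2010-07-10"))
)
year1 <- 2020
year2 <- 2025
year3 <- 2030

# Hypothetical helper: month and day as one integer (e.g. May 5 -> 505),
# so comparisons ignore the year
md <- function(x) as.integer(format(x, "%m%d"))

# fcase() returns the value of the first condition that is TRUE per row:
#   b > c            -> year3
#   b <= c, b <= d   -> year2
#   b <= c, b > d    -> year1 (the default)
df[, yr := fcase(
  md(b) >  md(c), year3,
  md(b) <= md(d), year2,
  default = year1
)]

# Build the date once from the chosen year and b's month and day,
# then drop the temporary column
df[, e := as.Date(paste0(yr, "-", format(b, "%m-%d")))][, yr := NULL]
df
```

Because the conditions are checked in order, each later branch can assume the earlier ones were FALSE, which is why the second condition does not need to repeat b <= c. Note that a b value of February 29 would need separate handling if the chosen year is not a leap year.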