How can I assign the value from one of two POSIXct columns in a data.frame to a new POSIXct column?

Question

I have a data.frame with two columns of type POSIXct, though for every row, only one column will have a value, e.g.,

library(lubridate)  # for now()
library(dplyr)      # for mutate(), used further down
dd <- data.frame(date1 = c(now(), NA), date2 = c(as.POSIXct(NA), now()))
> dd
                date1               date2
1 2016-05-06 11:30:04                <NA>
2                <NA> 2016-05-06 11:30:04

I would now like to create a third column that contains the value of whichever column is non-NA, i.e., the result should look like:

> dd
                date1               date2               date3
1 2016-05-06 11:26:36                <NA> 2016-05-06 11:26:36
2                <NA> 2016-05-06 11:26:36 2016-05-06 11:26:36

I've tried using ifelse(), but the new column comes out numeric instead of POSIXct:

> mutate(dd, date3 = ifelse(!is.na(date1), date1, date2))
                date1               date2      date3
1 2016-05-06 11:30:04                <NA> 1462559405
2                <NA> 2016-05-06 11:30:04 1462559405
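To see the class being dropped in isolation, here is a minimal base-R sketch. Sys.time() stands in for lubridate's now() (both return POSIXct), and the names x, y and res are illustrative only.

```r
# Sys.time() stands in for lubridate::now(); both return POSIXct
x <- c(Sys.time(), NA)
y <- c(as.POSIXct(NA), Sys.time())

# ifelse() builds its result from the logical test vector, so the
# selected values keep their numbers but lose the POSIXct class
res <- ifelse(!is.na(x), x, y)

class(x)    # "POSIXct" "POSIXt"
class(res)  # "numeric"
```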

Neither does assignment with logical indexing:

> dd[!is.na(dd$date1), "date3"] <- dd[!is.na(dd$date1), "date1"]
> dd[!is.na(dd$date2), "date3"] <- dd[!is.na(dd$date2), "date2"]
> dd
                date1               date2      date3
1 2016-05-06 11:30:04                <NA> 1462559405
2                <NA> 2016-05-06 11:30:04 1462559405
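The same loss can be reproduced without a data.frame. This sketch assumes, as the explanation in the comments suggests, that the column being filled starts out as a plain NA vector with no class; v is a hypothetical stand-in for that column.

```r
# Hypothetical stand-in for a column that has no class yet:
# a plain logical vector of NAs
v <- rep(NA, 2)

# The default [<- method dispatches on v (which has no class), so it
# only coerces v to double; the POSIXct class of the value is not copied
v[1] <- Sys.time()

class(v)  # "numeric"
```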

Can anyone explain this behavior?

Am I stuck with creating a new data.frame with an empty column of class POSIXct and then assigning into it? That would not be ideal, because I expect to be able to just assign into a data.frame and have it work.

Or should I do the assignment and then change the column class afterwards (as suggested in this solution)? This would not be ideal because the conversion to numeric in the course of the assignment drops the timezone, which I would then have to supply again when calling as.POSIXct().

Thanks in advance!

POSIXct is really just a number (seconds since 1970-01-01 UTC) with a class attribute. Convert back to date form with as.POSIXct like so: dd$date3 <- as.POSIXct(ifelse(is.na(dd$date1), dd$date2, dd$date1), origin = origin), where origin is lubridate's 1970-01-01 UTC constant. Also nice: dd[!is.na(dd)]...but that's column-wise, so t(dd)[!is.na(t(dd))], maybe. — alistaire
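A sketch of that conversion that also addresses the time-zone concern from the question: it reuses the tzone attribute of date1 instead of retyping it. It assumes base R only (Sys.time() in place of now()) and passes the origin as a string, which works whether or not lubridate is loaded.

```r
dd <- data.frame(date1 = c(Sys.time(), NA),
                 date2 = c(as.POSIXct(NA), Sys.time()))

# ifelse() returns bare seconds since 1970-01-01 UTC
secs <- ifelse(is.na(dd$date1), dd$date2, dd$date1)

# Reuse the time zone stored on the source column; a missing
# tzone attribute means "current time zone", which tz = "" also means
tz <- attr(dd$date1, "tzone")
if (is.null(tz)) tz <- ""

# Turn the seconds back into POSIXct in that time zone
# (as.POSIXct() parses the origin string in UTC)
dd$date3 <- as.POSIXct(secs, origin = "1970-01-01", tz = tz)
```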
Thanks! But could you explain a bit about why this happens, or point me to something on it? My naive understanding is that POSIXct is a class that is distinct and different from the numeric class. Why does the coercion to numeric happen if I assign into a data.frame? — matmat
ifelse strips attributes, including classes; see ?ifelse, which has an example very much like yours. The [] case is more complicated: you're assigning to part (not the whole) of a column that doesn't exist yet, so coercion takes place to fill the column. There's some info at ?`[.data.frame`, but not much. If you assign something with the appropriate class to the whole column first (e.g. dd$date3 <- as.POSIXct(NA)) it will work fine. — alistaire
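Following that suggestion, a minimal sketch: the new column is created as an all-NA POSIXct column first, after which the same partial assignments from the question keep the class. Sys.time() again stands in for now().

```r
dd <- data.frame(date1 = c(Sys.time(), NA),
                 date2 = c(as.POSIXct(NA), Sys.time()))

# Give the whole column the POSIXct class first; the single NA is
# recycled to every row
dd$date3 <- as.POSIXct(NA)

# date3 now exists and is POSIXct, so filling part of it dispatches to
# the POSIXct [<- method and the class survives
dd[!is.na(dd$date1), "date3"] <- dd[!is.na(dd$date1), "date1"]
dd[!is.na(dd$date2), "date3"] <- dd[!is.na(dd$date2), "date2"]

class(dd$date3)  # "POSIXct" "POSIXt"
```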
OK, that seems to put the pieces together enough. I tried searching through the R documentation, but couldn't find anything. Thank you! — matmat

Hanjo Odendaal · Accepted Answer · 2016-05-07T09:11:38

The following solution worked for me, although it's not very clean code:

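# read the raw data; empty cells become NA
dd<-read.csv("dd.csv",stringsAsFactors = FALSE,na.strings = c("", " "))

# parse both columns as POSIXct (format named explicitly)
dd[,1]<-as.POSIXct(dd[,1], format = "%m/%d/%Y %H:%M", tz = "GMT")
dd[,2]<-as.POSIXct(dd[,2], format = "%m/%d/%Y %H:%M", tz = "GMT")
# seed Date3 from Date1 so the new column is POSIXct from the start
dd[,'Date3']<-dd[,1]

# copy over whichever column is non-NA in each row
dd[which(!is.na(dd$Date1)),'Date3']<-dd$Date1[!is.na(dd$Date1)]
dd[which(!is.na(dd$Date2)),'Date3']<-dd$Date2[!is.na(dd$Date2)]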

str(dd)
'data.frame':   6 obs. of  3 variables:
 $ Date1: POSIXct, format: "2016-05-20 11:30:00" ...
 $ Date2: POSIXct, format: NA ...
 $ Date3: POSIXct, format: "2016-05-20 11:30:00" ...

sum(is.na(dd$Date3))
[1] 0

The trick I used was to create Date3 as a copy of Date1, so the new column is already POSIXct and the later partial assignments keep that class.
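A condensed sketch of the same trick on the in-memory data from the question instead of the CSV file. It assumes, as the question states, that exactly one of the two columns is non-NA in each row, and uses Sys.time() in place of now(); missing1 is an illustrative name.

```r
dd <- data.frame(Date1 = c(Sys.time(), NA),
                 Date2 = c(as.POSIXct(NA), Sys.time()))

# Seed Date3 from Date1: the new column is POSIXct (and carries
# Date1's tzone attribute) from the start
dd$Date3 <- dd$Date1

# Fill the remaining gaps from Date2; this partial assignment goes
# through the POSIXct [<- method, so the class is kept
missing1 <- is.na(dd$Date3)
dd$Date3[missing1] <- dd$Date2[missing1]

sum(is.na(dd$Date3))  # 0
```

Because Date3 starts as a copy of Date1, it also inherits Date1's time zone, so nothing has to be supplied again through as.POSIXct().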
