12
votes

I use readr to read in data which consists a date column in time format. I can read it in correctly using the col_types option of readr.

library(dplyr)
library(readr)

sample <- "time,id
2015-03-05 02:28:11,1674
2015-03-03 13:10:59,36749
2015-03-05 07:55:48,NA
2015-03-05 06:13:19,NA
"

mydf <- read_csv(sample, col_types="Ti")
mydf
                 time    id
1 2015-03-05 02:28:11  1674
2 2015-03-03 13:10:59 36749
3 2015-03-05 07:55:48    NA
4 2015-03-05 06:13:19    NA

This is nice. However, if I want to manipulate this column with dplyr, the time column loses its format.

mydf %>% mutate(time = ifelse(is.na(id), NA, time))
        time    id
1 1425522491  1674
2 1425388259 36749
3         NA    NA
4         NA    NA

Why is this happening?

I know I can work around this problem by transforming it to character before, but it would be more convenient without transforming back and forth.

mydf %>% mutate(time = as.character(time)) %>% 
    mutate(time = ifelse(is.na(id), NA, time))
2

2 Answers

22
votes

It's actually ifelse() that is causing this issue, not dplyr::mutate(). An example of the problem of attribute stripping is shown in help(ifelse) -

## ifelse() strips attributes
## This is important when working with Dates and factors
x <- seq(as.Date("2000-02-29"), as.Date("2004-10-04"), by = "1 month")
## has many "yyyy-mm-29", but a few "yyyy-03-01" in the non-leap years
y <- ifelse(as.POSIXlt(x)$mday == 29, x, NA)
head(y) # not what you expected ... ==> need restore the class attribute:
class(y) <- class(x)

So there you have it. It's a bit of extra work if you want to use ifelse(). Here are two possible methods that will get you to your desired result without ifelse(). The first is really simple and uses is.na<-.

## mark 'time' as NA if 'id' is NA
is.na(mydf$time) <- is.na(mydf$id)

## resulting in
mydf
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

If you don't wish to choose that route, and want to continue with the dplyr method, you can use replace() instead of ifelse().

mydf %>% mutate(time = replace(time, is.na(id), NA))
#                  time    id
# 1 2015-03-05 02:28:11  1674
# 2 2015-03-03 13:10:59 36749
# 3                <NA>    NA
# 4                <NA>    NA

Data:

mydf <- structure(list(time = structure(c(1425551291, 1425417059, 1425570948, 
1425564799), class = c("POSIXct", "POSIXt"), tzone = ""), id = c(1674L, 
36749L, NA, NA)), .Names = c("time", "id"), class = "data.frame", row.names = c(NA, 
-4L))
4
votes

There is another version of if_else by @hadley in dplyr. It correctly manage time variables. Look at this github issue as well.