0
votes

I have imported data in R from an Excel sheet with package readxl.

The sheet contains a column with dates. These dates behave like dates in Excel (I can change the date formatting in Excel).

Directly after importing in R with readxl the format is this:

# A tibble: 1 x 1
  `datum`         
  <dttm>             
1 2010-01-20 21:00:00

My goal is to use the lubridate function days_in_month on the imported dates.

lubridate::days_in_month(df[2,1])

Although using this function gives this error:

Error in as.POSIXlt.default(x, tz = tz(x)) : 
  do not know how to convert 'x' to class “POSIXlt”

I did serveral test to identify the format:

is.Date(df[2,1])
is.POSIXt(df[2,1])
is.instant(df[2,1])

All give result FALSE.

If I print one date I receive this result:

# A tibble: 1 x 1
  `datum`         
  <dttm>             
1 2010-01-20 21:00:00

I have tried several conversions:

df$datum <- as.Date(df$datum, origin = "1899-12-30")
df$datum <- as.Date(as.POSIXct(df$datum, 'GMT'))
df$datum <- as.Date(df$datum, format='%Y-%m-%d')

Although the results of the tests above after conversion are all FALSE.

If I do the first conversion as.Date(df$datum, origin = "1899-12-30"). After this the outcome of print is:

# A tibble: 1 x 1
  `datum`
  <date>    
1 2010-01-20


df$datum + 60 gives:
1 2010-03-21

So it seems it is behaving as a date since I can add 60.

Although all the test give FALSE and days_in_month from lubridate still gives the error above.

How can I convert the date into a correct format which lubridate can process?

Thanks a lot!

1
Unless I'm missing something, "2010-01-20 21:00:00" is not a date, it's a date-time. If you want it to be a Date, I suggest you either (1) convert to time then date, with as.Date(as.POSIXct(...)); or (2) truncate the string and convert, something like as.Date(gsub(" .*", "", df$datum)) or as.Date(substr(df$datum, 1, 10)).r2evans
@ r2evans Thanks, I will try this, although I did several conversions with as.Date as I mentioned above.user2165379
More thoughts (as I caffeinate): your sample of df$datum above is already POSIXt, so as.Date just works. And it seems like the problem you're having is with code and data that we do not have. Can you provide the output from dput(head(df$datum)) and the code you're using that is generating the error? I see no calls to lubridate:: in your code.r2evans
@ r2evans. Thanks! I have added the code in the question above. The output from dput(head(df$datum)) is : structure(list(datum = structure(14629, class = "Date")), row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))user2165379
as.Date(substr(df$datum, 1, 10)) gives : Error in charToDate(x) : character string is not in a standard unambiguous formatuser2165379

1 Answers

0
votes

You are being bitten by the differences in [ between data.frame and tbl_df. Reading your file (present in the comments), I ultimately see:

df <- readxl::read_excel("example dates.xlsx")
df
# # A tibble: 3 x 2
#   datum               datum2             
#   <dttm>              <dttm>             
# 1 2010-01-01 13:25:00 2010-12-22 23:53:40
# 2 2010-01-23 13:30:00 2011-01-07 23:09:10
# 3 2010-02-16 21:45:00 2011-03-19 01:00:52

# for everybody else
df <- structure(list(datum = structure(c(1262352300, 1264253400, 1266356700), class = c("POSIXct", "POSIXt"), tzone = "UTC"), datum2 = structure(c(1293062020.704, 1294441750.08, 1300496452.128), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))

Can we agree that it does not make sense to try to convert a whole frame at once?

as.Date(df)
# Error in as.Date.default(df) : 
#   do not know how to convert 'df' to class "Date"

Dumb question. Well, let's see what happens with other variants.

df$datum
# [1] "2010-01-01 13:25:00 UTC" "2010-01-23 13:30:00 UTC" "2010-02-16 21:45:00 UTC"
as.Date(df$datum)
# [1] "2010-01-01" "2010-01-23" "2010-02-16"
df[2,1]
# # A tibble: 1 x 1
#   datum              
#   <dttm>             
# 1 2010-01-23 13:30:00
as.Date(df[2,1])
# Error in as.Date.default(df[2, 1]) : 
#   do not know how to convert 'df[2, 1]' to class "Date"

With a simple data.frame, [2,1] will return a scalar, not a frame, so that makes sense in base R:

as.data.frame(df)[2,1]
# [1] "2010-01-23 13:30:00 UTC"
as.Date(as.data.frame(df)[2,1])
# [1] "2010-01-23"

So the problem is that tibble is forcing you to be explicit in that you want to drop from a frame to a scalar/vector.

This is normally a good thing, frankly. When dealing with a "normal" (non-tibble) frame and you want to look at a group of columns, as.data.frame(df[,1:2]), R always returns a data.frame. Unfortunately, if you define the columns programmatically and it returns a single column, then [ by default reduces it from a frame to a vector: as.data.frame(df)[,1]. You can prevent this auto-coercion with drop=, ala as.data.frame(df[,1,drop=FALSE]). Many (including myself) consider this to be a mistake: df[,cols] should be relied on to always return the same type of object, regardless if it is 20 columns or just 1 column. (I recognize that there are reasons why it does this, and I'm not berating the original R developers.)

So the problem causing your error is that tibble is requiring you to be explicit when subsetting your tbl_df into a single cell. If you want to work on a single cell, use df$datum[2] or df[2,1][[1]] to force it. If you want to work on a whole column, then df$datum . And all of those work directly with as.Date, since it knows how to deal with vectors of POSIXt (natively) and numeric/integer (along with origin=). Unfortunately, df[,1] of a tibble will not return a vector, so as.Date does not know what to do with it.

Bottom line:

as.Date(df$datum[2])
# [1] "2010-01-23"
as.Date(df[2,1][[1]])
# [1] "2010-01-23"
as.Date(df$datum)
# [1] "2010-01-01" "2010-01-23" "2010-02-16"