18
votes

I have 20 years worth of weather data, but I'm only interested in the patterns per year. I don't care how June 1995 is different from June 2011, for example. Instead, I want to have 20 values for June 1, 20 values for June 2, etc.

My question: How do I drop the year portion of a date object and keep the month AND day, while also maintaining the sequential properties of dates? My ultimate goal is a long list of repeated mm/dd values, each corresponding to a value of the outcome variable. I'll treat the mm/dd values like factors, but in the correct order.

# Given this:
as.Date(c("2014-06-01","1993-06-01", "2013-06-03", "1999-01-31"), "%Y-%m-%d")
# I want to get this:
"06-01" "06-01" "06-03" "01-31"
# That will sort like this
"01-31" "06-01" "06-01" "06-03"

Little hacks like using sub() to drop the year and convert the dash to a decimal point don't work, because then the 1st of the month is the same as the 10th of the month. I also tried turning the dates into character strings, removing the year, and then turning the result back into a date... that just made everything year 2014.

2
There may be some subtleties. Is the 29th Feb in a leap year before, after, or the same as the 1st March in a non-leap year? You might want to use the yday function from the lubridate package which returns the index of a day in a year. Other functions in lubridate may be helpful too, worth checking out. - Spacedman
Interesting, that might do what I need it to. If I have m/d anyway, though, leap years won't matter, because 2/29 will be a distinct level. - Nancy
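
To make the leap-year point in the comments concrete, here is a small sketch comparing the day-of-year index with the "mm-dd" label. It uses base R's "%j" format code rather than lubridate, so nothing needs installing; lubridate's yday() should give the same numbers. The helper name doy and the sample dates are made up for illustration.

```r
# Day-of-year index in base R; "%j" is the zero-padded day of year (001-366).
# doy is a hypothetical helper name, not part of any package.
doy <- function(d) as.integer(format(d, "%j"))

# Made-up dates around the end of February in a non-leap and a leap year.
d <- as.Date(c("2011-02-28", "2011-03-01", "2012-02-29", "2012-03-01"))

data.frame(date = d, md = format(d, "%m-%d"), doy = doy(d))
# "03-01" is day 60 in 2011 but day 61 in 2012, and "02-29" in 2012
# shares index 60 with "03-01" in 2011. The "mm-dd" labels stay distinct.
```

So the day-of-year index shifts by one after February in leap years, while an "mm-dd" label keeps each calendar day separate, which is the behaviour Nancy describes.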

2 Answers

22
votes

Does this work?

temp <- as.Date(c("2014-06-01","1993-06-01", "2013-06-03", "1999-01-31"), "%Y-%m-%d")

x <- format(temp, format="%m-%d")

x
[1] "06-01" "06-01" "06-03" "01-31"


sort(x)
[1] "01-31" "06-01" "06-01" "06-03"
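
As a sketch of how the "mm-dd" strings could then be used for the per-day roll-up described in the question: the data frame below is invented for illustration (the names weather, temp, md and daily are assumptions, not from the question), and aggregate() is only one of several ways to group.

```r
# Hypothetical data: three years of readings for two calendar days.
weather <- data.frame(
  date = as.Date(c("1995-06-01", "2011-06-01", "2014-06-01",
                   "1995-01-31", "2011-01-31", "2014-01-31")),
  temp = c(20, 22, 24, 1, 3, 5)
)

# Zero-padded "mm-dd" strings sort in calendar order.
weather$md <- format(weather$date, "%m-%d")

# One row per calendar day, averaged over all years.
daily <- aggregate(temp ~ md, data = weather, FUN = mean)
daily
```

Because the month and day are zero-padded, plain character sorting already puts "01-31" before "06-01", so the grouped result comes out in calendar order.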
1
votes

jalapic's answer, just before mine, transforms the date column into a character vector (format() returns the object passed to it as a character vector for pretty printing).

According to the OP, one reason for getting rid of the year, perhaps the key one, is to roll up by day and month, regardless of year. To me, that suggests a time series is not the right data type for this column; instead, you are better off with an ordered factor, which will preserve the "sequential properties of dates" as the OP requires.

this is pretty much the

Granted, a factor does not understand dates or numbers, but it does understand unique values, so in this instance at least it should behave as the OP wants.

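> d = "2014-06-01"
> d = as.Date(d)

# Split on the leading four-digit year plus dash and keep what follows ("mm-dd").
# The year alternation needs a group, (19|20), not a character class.
fnx = function(x) {
         unlist(strsplit(as.character(x), '(19|20)[0-9]{2}-', fixed=FALSE))[2]
     }

> fnx("2012-01-25")
    [1] "01-25"

# column_of_date_objs stands for your own vector of Date objects
> dm1 = sapply(column_of_date_objs, fnx)

# factor(), not as.factor(), accepts ordered=TRUE; levels default to the
# sorted unique values, which for "mm-dd" strings is calendar order
> new_col = factor(dm1, ordered=TRUE)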
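
A variation on the ordered-factor idea, offered as a sketch rather than as part of the answer above: spell out all 366 possible "mm-dd" levels up front, so that days missing from the data still have a slot and "02-29" sits in the right place. The year 2000 is an arbitrary leap year used only to generate the labels, and the names all_days and md are made up here.

```r
# All 366 possible "mm-dd" labels, generated from a leap year so "02-29" is included.
# 2000 is an arbitrary choice; any leap year would do.
all_days <- format(seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day"),
                   "%m-%d")

# The sample dates from the question.
dates <- as.Date(c("2014-06-01", "1993-06-01", "2013-06-03", "1999-01-31"))

# Ordered factor whose levels follow the calendar.
md <- factor(format(dates, "%m-%d"), levels = all_days, ordered = TRUE)

sort(md)         # 01-31 06-01 06-01 06-03
md[4] < md[1]    # TRUE: "01-31" comes before "06-01"
as.integer(md)   # 153 153 155 31, positions in the leap-year calendar
```

With explicit levels, table(md) or a plot against md shows every calendar day in order, including days that have no observations.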