3
votes

I have daily weather data with columns for the day of the month, the month, the year, and the measurements. I need to add another column for the day of the year, i.e. 1 to 365 (or 366 in leap years).

I'm not much of a programmer. I am familiar with seq(), e.g. seq(1, 365), but that stops at 365. I need the number to increase sequentially and start over every year, with leap years running to 366. In this example, all weather data begin on Jan. 1st. Any ideas, suggestions, or pointers are much appreciated.

Edit: Example data

example.data <- structure(list(V1 = 1:6, V2 = c(1L, 1L, 1L, 1L, 1L, 1L), 
    V3 = c(1950L, 1950L, 1950L, 1950L, 1950L, 1950L), 
    V4 = c(NA, NA, NA, NA, NA, NA), 
    V5 = c(0, 0, 0, 0, 0, 0)),
    .Names = c("V1", "V2", "V3", "V4", "V5"), row.names = c(NA, 6L), class = "data.frame")
3
To get your data in a format that would be useful for those answering the question, try running dput(head(dat)), where dat is the name of your data frame. (Aaron left Stack Overflow)

3 Answers

4
votes

R has a Date class, which is a good first step; you can get it by pasting your columns into "Y-M-D" format and then calling as.Date. An even better option is the POSIXlt class, which holds exactly the information you want in its yday field, along with a lot of other potentially useful information. So I convert the Date to POSIXlt and take the day of the year; since yday starts at zero, I then add 1.

dat <- data.frame(d=1:6,
                  m=rep(c(1,2,12), 2),
                  y=rep(c(1950, 1951), each=3))
dat$Date <- as.Date(with(dat, paste(y, m, d, sep="-")))
dat$doy <- as.POSIXlt(dat$Date)$yday + 1
dat
##   d  m    y       Date doy
## 1 1  1 1950 1950-01-01   1
## 2 2  2 1950 1950-02-02  33
## 3 3 12 1950 1950-12-03 337
## 4 4  1 1951 1951-01-04   4
## 5 5  2 1951 1951-02-05  36
## 6 6 12 1951 1951-12-06 340

The advantage of this is that it works correctly even if the order of your rows is changed or a particular day is missing. It's almost never a good idea to have your analysis depend on the order of the data.
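
As a rough check of that claim, the sketch below builds one row per day for 1951 and 1952 (a leap year), shuffles the rows, and computes the day of the year the same way as above. The names days, check, and shuffled are made up for illustration.

```r
# Hypothetical data: every day of 1951 and 1952 (1952 is a leap year)
days <- seq(as.Date("1951-01-01"), as.Date("1952-12-31"), by = "day")
lt <- as.POSIXlt(days)
check <- data.frame(d = lt$mday, m = lt$mon + 1, y = lt$year + 1900)

# Shuffle the rows so the result cannot depend on row order
set.seed(1)
shuffled <- check[sample(nrow(check)), ]

# Same approach as above: build a Date, then take yday (zero-based) + 1
shuffled$Date <- as.Date(with(shuffled, paste(y, m, d, sep = "-")))
shuffled$doy <- as.POSIXlt(shuffled$Date)$yday + 1

# Largest day of year per calendar year: 365 for 1951, 366 for 1952
tapply(shuffled$doy, shuffled$y, max)
```

Because each row's value comes from its own date, the maximum is still 365 for 1951 and 366 for 1952 even after shuffling.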

4
votes

Try this code, assuming your "year" column is named "V3":

[image not preserved]

Edit: More seriously, pasting a picture of your data is a bad idea; see here for how to include your data so that it is easier for people to help. Including dput(head(data)) is almost always best.

For your problem, read in your data:

z <- read.csv("test.data.txt", sep="\t", header = FALSE)

Then use dplyr to group by year and number the rows within each year with seq_along():

library(dplyr)
mydat <- z %>% group_by(V3) %>%
               mutate(day = seq_along(V3))

We can verify we got some 366s:

sum(mydat$day == 366)
sum(mydat$day == 365)
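
If dplyr is not available, a similar per-year counter can be sketched in base R with ave(). This is an alternative to the pipeline above, not the same code. The data frame z2 is made-up stand-in data using the V1/V2/V3 column layout from the question, and like the dplyr version it assumes the rows are sorted by date with no missing days.

```r
# Stand-in data: one row per day for 1951-1952, columns named as in the question
dates <- seq(as.Date("1951-01-01"), as.Date("1952-12-31"), by = "day")
z2 <- data.frame(V1 = as.integer(format(dates, "%d")),
                 V2 = as.integer(format(dates, "%m")),
                 V3 = as.integer(format(dates, "%Y")))

# Number the rows within each year; ave() applies FUN to each group of V3
z2$day <- ave(seq_along(z2$V3), z2$V3, FUN = seq_along)

# Only the leap year reaches day 366; both years reach day 365
sum(z2$day == 366)
sum(z2$day == 365)
```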

4
votes

Assuming your data frame is named df and has year, month, and day columns named Y, m, and d, you could construct a date field:

df$date <- as.Date(paste(df$Y, df$m, df$d, sep="-"), "%Y-%m-%d")

And then extract the day of the year from that date with the %j format code:

df$day_of_year <- as.numeric(strftime(df$date, "%j"))
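
Applied to the column layout in the question (V1 = day, V2 = month, V3 = year, which is an assumption based on the example data), the same idea might look like the sketch below. The data frame wx is made up, and the sketch calls format() on the Date instead of strftime(); both accept the %j code.

```r
# Made-up rows using the question's column layout: V1 = day, V2 = month, V3 = year
wx <- data.frame(V1 = c(1L, 28L, 1L, 31L, 31L),
                 V2 = c(1L, 2L, 3L, 12L, 12L),
                 V3 = c(1950L, 1950L, 1952L, 1950L, 1952L))

# Build a Date from year-month-day, then format it with %j (day of year, 001-366)
wx$date <- as.Date(paste(wx$V3, wx$V2, wx$V1, sep = "-"), "%Y-%m-%d")
wx$day_of_year <- as.numeric(format(wx$date, "%j"))

# 1952 is a leap year, so 1952-03-01 is day 61 and 1952-12-31 is day 366
wx$day_of_year
```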