0
votes

I have a data from of dates and values. I am trying to get the fourth highest value per year using dplyr and order or multiple aggregate statements. I want the date that the fourth highest value occurred on as well as the value in a data frame for all years.

Here is my script:

    timeozone <- import(i, date="DATES", date.format = "%Y-%m-%d %H", header=TRUE, na.strings="NA")
    colnames(timeozone) <- c("column","date", "O3")
    timeozone %>%
      mutate(month = format(date, "%m"), day = format(date, "%d"), year = format(date, "%Y")) %>%
      group_by(month, day, year) %>%
      summarise(fourth = O3[order(O3, decreasing = TRUE)[4] ])

I am not sure what is wrong with what I've got above. Any help would be appreciated.

Data:

Dates Values

11/12/2000 14

11/13/2000 16

11/14/2000 17

11/15/2000 21

11/13/2001 31

11/14/2001 21

11/15/2001 62

11/16/2001 14

2
Try dplyr::nth().tchakravarty
In your edit, why are you grouping by month and day? I thought you wanted the fourth largest value per year. From the sample data you posted, there is no 03 column (is that value?) and it appears there is only one value per day -- there can be no fourth highest if that is the case. Try grouping by only yearMark Peterson

2 Answers

1
votes

Another option with base (and using the iris data again) would be to split the variable by the group, then order it and extract the fourth element. For example

data(iris)
petals <- split(iris$Petal.Length, iris$Species)
sapply(petals, function(x) x[order(x)][4])

or, actually, even more succinctly with tapply

tapply(iris$Petal.Length, iris$Species, function(x) x[order(x)][4])

Edit

Using the sample data above, you could extract the full row (or just the date, if you wanted), as follows.

date <- c("11/12/00", "11/13/00", "11/14/00", "11/15/00", "11/13/01", 
"11/14/01", "11/15/01", "11/16/01")

value <- c(14, 16, 17, 21, 31, 21, 62, 14)

date_splt <- strsplit(date, "/")
year <- sapply(date_splt, "[", 3)

d <- data.frame(date, value, year)

d_splt <- split(d, d$year)
lapply(d_splt, function(x) x[order(x$value), ][4, ])
1
votes

Since you didn't provide reproducible data, here is an example using iris. You will need to group by your years instead of by Species but the same ideas apply.

You can do it relatively directly with dplyr if you are not wedded to aggregate:

iris %>%
  group_by(Species) %>%
  summarise(fourth = Petal.Length[order(Petal.Length, decreasing = TRUE)[4] ])

gives:

     Species fourth
1     setosa    1.7
2 versicolor    4.9
3  virginica    6.6

You can confirm that the values are correct using:

by(iris$Petal.Length, iris$Species, sort)

Using nth, following the suggestion of @tchakravarty :

iris %>%
  group_by(Species) %>%
  summarise(fourth = nth(sort(Petal.Length), -4L))

Gives the same value as above.