How do I filter an R data frame by year?

In the reproducible example below, I am trying to keep only the rows whose date (column b) falls in 2021. Thank you!

library(tidyverse)

a <- c(10,20,30)
b <- as.Date(c('31-11-20', '15-11-21', '31-11-22'))
my_df <- data.frame(a,b)

I have tried the following three attempts, but none of them successfully filters by the year 2021.

my_df_new <- my_df %>%
  filter(between(b, as.Date('01-01-21'), as.Date('31-12-21')))

my_df_new <- my_df %>%
  filter(between(b, as.Date('2021-01-01'), as.Date('2021-12-31')))

my_df_new <- my_df[my_df$b > "31-12-20" & my_df$b < "1-01-22", ]
None of your dates in my_df are in this millennium. Have you looked at your data and recognized that "31-11-20" is being parsed into "0031-11-20"? You really need to fix that before you think about how to filter it. - r2evans
... but if you use as.Date(..., format="%d-%m-%y"), your middle code works. - r2evans
Your example dates are funky. I provided a solution below that works with them, but the main issue is that e.g. 31-11-20 is being treated as November 20, 0031 when you want it to be treated as November 31, 2020 presumably. - socialscientist

1 Answer

Your example dates require some extra work because (a) they are not real dates (November has only 30 days, not 31) and (b) you don't specify a format when converting them to dates.
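Both problems can be seen with a minimal base R sketch. It assumes the strings are meant as day-month-two-digit-year; the names raw, default_parse and explicit_parse are illustrative and not part of the original example.

```r
raw <- c("31-11-20", "15-11-21", "31-11-22")

# (b) No format given: as.Date() tries "%Y-%m-%d" first, so the
# leading two digits are read as the year (31, 15, 31).
default_parse <- as.Date(raw)
as.integer(format(default_parse, "%Y"))

# Explicit format: day, month, two-digit year (%y, not %Y).
explicit_parse <- as.Date(raw, format = "%d-%m-%y")

# (a) 31 November does not exist, so the first and third values
# cannot be parsed and come back as NA.
is.na(explicit_parse)
#> [1]  TRUE FALSE  TRUE
```

With the explicit format, only the second string becomes a usable date (15 November 2021); the other two need correcting in the source data.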

library(dplyr)

# Example data
a <- c(10,20,30)
b <- as.Date(c('31-11-20', '15-11-21', '31-11-22'))
my_df <- data.frame(a,b)

# format(b, "%Y") extracts whatever as.Date() parsed as the year.
# No format was given above, so the first two digits were read as the year.
my_df %>% 
  mutate(year = format(b, "%Y"))
#>    a          b year
#> 1 10 0031-11-20 0031
#> 2 20 0015-11-21 0015
#> 3 30 0031-11-22 0031

# Notice that year is not 20, 21, 22. The two-digit year actually ended up
# in the day field, because no format was specified when the date variable
# was created. So, we'll extract the day and save it as year.
new_df <- my_df %>% 
  mutate(year = format(b, "%d"))  %>%
  print()
#>    a          b year
#> 1 10 0031-11-20   20
#> 2 20 0015-11-21   21
#> 3 30 0031-11-22   22

# Now filter to only 2021 (year is a character column, so compare to "21")
new_df %>%
  filter(year == "21")
#>    a          b year
#> 1 20 0015-11-21   21
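If the source data can be corrected, a more direct route is to parse the strings with an explicit format, as suggested in the comments, and then filter on the real year. The sketch below assumes the intended dates were 30 November (a hypothetical correction of the impossible 31st); df_2021 and df_2021_range are illustrative names.

```r
library(dplyr)

# Hypothetical corrected data: 30 November instead of the non-existent 31st,
# parsed as day-month-two-digit-year.
my_df <- data.frame(
  a = c(10, 20, 30),
  b = as.Date(c("30-11-20", "15-11-21", "30-11-22"), format = "%d-%m-%y")
)

# Option 1: compare the four-digit year extracted from the Date column.
df_2021 <- my_df %>%
  filter(format(b, "%Y") == "2021")

# Option 2: range test against real Date boundaries.
df_2021_range <- my_df %>%
  filter(between(b, as.Date("2021-01-01"), as.Date("2021-12-31")))
```

Both options keep only the row with a = 20, and the b column holds genuine 2021 dates rather than year-15 values.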