2
votes

My data contains this column:

table(data$year)
2011 2012 2013 2014 2015 2016 2017 2018 2019 
   2   28   17   36   26   29   37   33   10

is.numeric(data$year)
[1] TRUE

I want to create a new column with mutate and case_when, using the following code:

data <- data %>%
  mutate(periode_2a = case_when(
    year >= 2011 && year <= 2013  ~ "2011-2013",
    year >= 2014 && year <= 2015 ~ "2014-2015",
    year >= 2016 && year <= 2017 ~ "2016-2017",
    TRUE ~ "2018-2019"
  ))

The goal is, I think, obvious: I want to group the years into categories.

This is what I get:

table(data$periode_2a)

2011-2013 
      218 

I have tried some other variants:

> data <- data %>%
+   mutate(periode_2a = case_when(
+     year == 2011:2013 ~ "2011-2013",
+     year == 2014:2015 ~ "2014-2015",
+     year == 2016:2017 ~ "2016-2017",
+     TRUE ~ "2018-2019"
+   ))

or

> data <- data %>%
+   mutate(periode_2a = case_when(
+     year == "2011"|"2012"|"2013" ~ "2011-2013",
+     year == "2014"|"2015" ~ "2014-2015",
+     year == "2016"|"2017" ~ "2016-2017",
+     TRUE ~ "2018-2019"
+   ))

without success.

What did I do wrong?

Thanks to all.

In the first code block, replace && with &, and in the second, use %in% instead of ==. - akrun
Thanks, works perfectly! - Aytan

2 Answers

1
votes

We can use %in% when the right-hand side is a vector of length greater than 1:

library(dplyr)
data %>%
  mutate(periode_2a = case_when(
    year  %in% 2011:2013 ~ "2011-2013",
    year %in% 2014:2015 ~ "2014-2015",
    year %in% 2016:2017 ~ "2016-2017",
    TRUE ~ "2018-2019"
   ))

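The == operator compares elementwise, so it behaves as expected only when both sides have the same length, or when the right-hand side has length 1 (it is then recycled). When the right-hand side has more than one element, it is recycled along year, so each year is compared with only the one value at its position in the repeated vector rather than with every value. As for &&, it returns a single TRUE/FALSE instead of one value per row (before R 4.3.0 it looked only at the first element of each side; from 4.3.0 on it raises an error for longer inputs), so case_when cannot evaluate the condition row by row. Use the vectorized & instead.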
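
The difference between ==, %in% and & can be seen on a small vector. This is only a sketch in base R: the vector year below is a made-up stand-in for data$year, not the asker's data.

```r
# Hypothetical toy vector standing in for data$year
year <- c(2013, 2012, 2011, 2014, 2016, 2018)

# == recycles 2011:2013 along year, so the comparisons are
# 2013 == 2011, 2012 == 2012, 2011 == 2013, 2014 == 2011, ...
eq_result <- year == 2011:2013

# %in% asks, for each year, whether it appears anywhere in 2011:2013
in_result <- year %in% 2011:2013

# & is vectorized and returns one TRUE/FALSE per element
and_result <- year >= 2011 & year <= 2013

print(eq_result)
print(in_result)
print(and_result)
```

Only the second element matches with ==, whereas %in% and & both flag the first three years.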

1
votes

Instead of using multiple conditions in case_when you can use cut with labels.

Since you did not provide a reproducible example, I will use the mpg column of the built-in mtcars dataset.

mtcars$mpg
#[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3
#[14] 15.2 10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3
#[27] 26.0 30.4 15.8 19.7 15.0 21.4

Define the interval boundaries in breaks, then construct the labels from them.

breaks <- c(0, 15, 20, 25, 50)
labels <- paste(head(breaks, -1), tail(breaks, -1), sep = "-")
cut(mtcars$mpg, breaks, labels)

#[1] 20-25 20-25 20-25 20-25 15-20 15-20 0-15  20-25 20-25 15-20 15-20
#[12] 15-20 15-20 15-20 0-15  0-15  0-15  25-50 25-50 25-50 20-25 15-20
#[23] 15-20 0-15  15-20 25-50 25-50 25-50 15-20 15-20 0-15  20-25
#Levels: 0-15 15-20 20-25 25-50

This is helpful when you have a large number of years in your data, where writing a condition for each one would be tedious.
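
Applied to the years from the question, the same idea might look like the sketch below. The vector year is a made-up stand-in for data$year, and the labels are written by hand on the assumption that the third period was meant to be "2016-2017". Because cut uses intervals that are open on the left and closed on the right by default, the first break is set to 2010 so that 2011 falls into the first interval.

```r
# Hypothetical toy vector standing in for data$year
year <- c(2011, 2013, 2014, 2015, 2016, 2017, 2018, 2019)

# Intervals: (2010, 2013], (2013, 2015], (2015, 2017], (2017, 2019]
breaks <- c(2010, 2013, 2015, 2017, 2019)
labels <- c("2011-2013", "2014-2015", "2016-2017", "2018-2019")

# cut returns a factor with one level per interval
periode_2a <- cut(year, breaks = breaks, labels = labels)

print(table(periode_2a))
```

With this toy input each period contains two years. Inside a dplyr pipeline the same call could go in mutate(periode_2a = cut(year, breaks, labels)).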