1
votes

I'm trying to group the variables by their max values and then count the number of groups that have their max values in a certain period.

The data set looks like this:

Year Car Value
1991 A 21
1992 A 19
1993 A 20
1992 B 42
1993 B 17
1991 C 31
1992 C 50
1993 C 23

What I want to do is to find the max values per car and then count how many cars reached their maximum values per year.

So essentially, a table like this

Year Count
1991 1
1992 2
1993 0

I was able to identify the max values per group using dplyr but cant figure out how to implement the count. Can someone please help? I've also tried top_n but that just gives me the max value per month which isnt what I want!

3
In your sample data set all cars have their max values in 1992 but the expected result does not reflect this. Please, can you clarify - thank you.Uwe
Another question: What is the expected answer in case a car hits the maximum value in in more than one year, e.g., Car A has a Value of 21 in 1991 and 1992?Uwe
@Uwe : Great question! In that case I would want both the years to be counted.Shruti Mishra

3 Answers

2
votes

Very close to Ben's code! Without n_distinct

df1 <- df %>% 
  group_by(Car) %>%  
  mutate(mx = max(Value)) %>% 
  ungroup() %>% 
  group_by(Year) %>% 
  summarise(count=sum(Value >= mx))

enter image description here

1
votes

The example output does not match the example data, but this might be what you are looking for.

First, group_by Car and determine the maximum value for that particular Car. Then, you can group_by Year, and count up using n_distinct the number of unique Cars, where the Value is equal to the maximum value.

library(dplyr)

df %>%
  group_by(Car) %>%
  mutate(Max_Value = max(Value)) %>%
  group_by(Year) %>%
  summarise(Count = n_distinct(Car[Value == Max_Value]))

Output

   Year Count
  <int> <int>
1  1991     0
2  1992     3
3  1993     0
1
votes

Another dplyr way :

Keep only the max Value rows in every Year and count Year.

library(dplyr)

df %>%
  mutate(Year = factor(Year)) %>%
  group_by(Car) %>%
  slice(which.max(Value)) %>%
  ungroup %>%
  count(Year, .drop = FALSE)

#  Year      n
#* <fct> <int>
#1 1991      0
#2 1992      3
#3 1993      0