0
votes

I have a tbl_df that I'm trying to create unique columns based on a variety of filters. I did some reading on using ifelse, and other mutate functions but wasn't able to figure this one out on my own. The df is named Test and is listed below.

name   team salary season position
<chr> <int>  <int>  <int> <chr>   
AP        6     15   2017 OF      
AN       11      8   2016 SP      
AS        1      8   2014 SP      
AR        3     11   2018 SS      
AB        2     10   2015 3B      
AC        8      7   2017 RP      
Abe      11     10   2016 OF      
AM        7     12   2014 RP      
Ari      11     48   2018 1B      
BH       13     29   2015 OF           

I'm trying to create a variety of columns based on the results of specific filters. The code I have so far is as follows:

summary <- test %>%
  group_by(team, season)

The mutate columns I'm trying to create are:

  1. Hitting: Sum all values from Salary for rows with a position that does not equal SP or RP, and seasons 2016-2018
  2. Pitching: Sum all values from Salary for rows with a position that equals SP or RP, and seasons 2016-2018
  3. Relievers: Count all rows with a position equal to RP
  4. Over_40: Count all rows with a salary over 40
  5. Over_40_H: Count all rows with a salary over 40 and position not equal to SP or RP

I want all of these columns and results to be grouped by team, and season (as shown above)

1

1 Answers

1
votes

You may have struggled to work this out because it looks like you're trying to summarise your data rather than mutate it. As such, you can use ifelse() within summarise() – rather than mutate() – to do this.

library(tidyverse)

test %>%
  group_by(team, season) %>% 
  summarise(
    players = n(),
    hitting = sum(ifelse(!position %in% c("SP", "RP"), salary, 0)),
    pitching = sum(ifelse(position %in% c("SP", "RP"), salary, 0)),
    relievers = sum(ifelse(position == "RP", 1, 0)),
    over_40 = sum(ifelse(salary > 40, 1, 0)),
    over_40_h = sum(ifelse(salary > 40 & !position %in% c("SP", "RP"), 1, 0))
  ) %>% 
  mutate(
    hitting = ifelse(season < 2016 | season > 2018, NA, hitting),
    pitching = ifelse(season < 2016 | season > 2018, NA, pitching)
  ) %>% 
  arrange(team, season)

Note that:

  • I've added a column showing the number of players because it provides a basic check that the grouping has worked.
  • The mutate() command is included to remove the sum of pitcher and hitter salaries for years that are not between 2016 and 2018, as you specified in the question. However, since you've grouped the data by season anyway it may be equally easy to simply ignore the salaries for years you're not interested in.
  • I've arranged the data by team and season at the end, but this is mainly to make the results more readable.
  • Since you're using dplyr, I've changed the column names to follow the tidyverse style guide.