combine/merge rows with dplyr

Question

The data that looks like

Month    Location    Money
1          Miami      12
1          Cal        15
2          Miami       5
2          Cal         3
...
12         Miami       6
12         Cal          8

I want to transform it so it looks like

Month     Location      Money
Spring      Miami        sum(from month=1,2,3)
spring      Cal          sum (from month= 1,2,3)
summer...
summer...
fall...
fall...
winter...
winter...

I dont' know how to ask the question directly (merging rows, aggregating rows?) but googling it only returns dplyr::group_by and summarize which collapses the rows based on a single value of the row. I want to collpase/summarise the data based on multiple row values. Is there an easy way? Any help would be appreciated Thanks!

Can you share all months data and the months you want to assign to summer, fall, winter etc? — Karthik S
spring is when month= 1,2,3 / summer when month = 4,5,6...etc. it seems like the question I posted is hard to understand. What I am aiming to do is tidy my data that has three columns - 1. month (1, 2, ... 11, 12, 1, 2, ... 11, 12) and 2. location (Miami, Cal, Miami, Cal... repeated 12times each) and 3. Money (any number) — needhelp
Can you post your entire data using dput() function? It will easy to recreate your dataframe at our end. — Karthik S
I want to summarize the data by month and Location, but not the way dplyr::summarize does, which pairs each value in month to each value in location, creating 24 rows. Instead, I want to pair each location with seasons (listed above), creating a total of 8 rows. — needhelp
Can you share some example (or simulated) data so we can see what you are working with and what you are trying to do. — Brian Fisher

Brian Fisher Brian Fisher · Accepted Answer · 2020-12-08T17:47:32

It sounds like you want to

assign season to each record,
group_by season,
summarize.

If this is where you are going, you can either create a new column, Or you can do it directly. You could also create a separate table with month and season and left_join to your data.

library(dplyr)
## simulate data
df = tibble(
      month = rep(1:12, each = 4),
      location = rep(c("Cal", "Miami"), times = 24),
      money = as.integer(runif(48, 10, 100 ))
)

head(df)
# # A tibble: 6 x 3
# month location money
# <int> <chr>    <int>
# 1     1 Cal         69
# 2     1 Miami       84
# 3     1 Cal         38
# 4     1 Miami       44
# 5     2 Cal         33
# 6     2 Miami       64

## Create season based on month in groups of 3
df %>%
      mutate(season = (month-1) %/% 3 +1) %>%
      group_by(season, location) %>%
      summarize(Monthly_Total = sum(money))
# # A tibble: 8 x 3
# # Groups:   season [4]
# season location Monthly_Total
# <dbl> <chr>            <int>
# 1      1 Cal                360
# 2      1 Miami              265
# 3      2 Cal                392
# 4      2 Miami              380
# 5      3 Cal                348
# 6      3 Miami              278
# 7      4 Cal                358
# 8      4 Miami              411

Using the same data you can skip the column creation and include it in group_by:


df %>%
      group_by(season = (month-1) %/% 3 +1, location) %>%
      summarize(Monthly_Total = sum(money))
## results identical to above.

It may make more sense to just create a season table:

seasons = tibble(
      month = 1:12,
      season = rep(c("Spring", "Summer", "Winter", "Fall"), each = 3)
)

df %>%
      left_join(seasons) %>%
      group_by(season, location) %>%
      summarize(Monthly_Total = sum(money))
## again identical to above

The latter has the advantage of being more transparent.

combine/merge rows with dplyr

2 Answers