1
votes

The data that looks like

Month    Location    Money
1          Miami      12
1          Cal        15
2          Miami       5
2          Cal         3
...
12         Miami       6
12         Cal          8

I want to transform it so it looks like

Month     Location      Money
Spring      Miami        sum(from month=1,2,3)
spring      Cal          sum (from month= 1,2,3)
summer...
summer...
fall...
fall...
winter...
winter...

I dont' know how to ask the question directly (merging rows, aggregating rows?) but googling it only returns dplyr::group_by and summarize which collapses the rows based on a single value of the row. I want to collpase/summarise the data based on multiple row values. Is there an easy way? Any help would be appreciated Thanks!

2
Can you share all months data and the months you want to assign to summer, fall, winter etc? - Karthik S
spring is when month= 1,2,3 / summer when month = 4,5,6...etc. it seems like the question I posted is hard to understand. What I am aiming to do is tidy my data that has three columns - 1. month (1, 2, ... 11, 12, 1, 2, ... 11, 12) and 2. location (Miami, Cal, Miami, Cal... repeated 12times each) and 3. Money (any number) - needhelp
Can you post your entire data using dput() function? It will easy to recreate your dataframe at our end. - Karthik S
I want to summarize the data by month and Location, but not the way dplyr::summarize does, which pairs each value in month to each value in location, creating 24 rows. Instead, I want to pair each location with seasons (listed above), creating a total of 8 rows. - needhelp
Can you share some example (or simulated) data so we can see what you are working with and what you are trying to do. - Brian Fisher

2 Answers

1
votes

It sounds like you want to

  1. assign season to each record,
  2. group_by season,
  3. summarize.

If this is where you are going, you can either create a new column, Or you can do it directly. You could also create a separate table with month and season and left_join to your data.

library(dplyr)
## simulate data
df = tibble(
      month = rep(1:12, each = 4),
      location = rep(c("Cal", "Miami"), times = 24),
      money = as.integer(runif(48, 10, 100 ))
)

head(df)
# # A tibble: 6 x 3
# month location money
# <int> <chr>    <int>
# 1     1 Cal         69
# 2     1 Miami       84
# 3     1 Cal         38
# 4     1 Miami       44
# 5     2 Cal         33
# 6     2 Miami       64

## Create season based on month in groups of 3
df %>%
      mutate(season = (month-1) %/% 3 +1) %>%
      group_by(season, location) %>%
      summarize(Monthly_Total = sum(money))
# # A tibble: 8 x 3
# # Groups:   season [4]
# season location Monthly_Total
# <dbl> <chr>            <int>
# 1      1 Cal                360
# 2      1 Miami              265
# 3      2 Cal                392
# 4      2 Miami              380
# 5      3 Cal                348
# 6      3 Miami              278
# 7      4 Cal                358
# 8      4 Miami              411

Using the same data you can skip the column creation and include it in group_by:


df %>%
      group_by(season = (month-1) %/% 3 +1, location) %>%
      summarize(Monthly_Total = sum(money))
## results identical to above.

It may make more sense to just create a season table:

seasons = tibble(
      month = 1:12,
      season = rep(c("Spring", "Summer", "Winter", "Fall"), each = 3)
)

df %>%
      left_join(seasons) %>%
      group_by(season, location) %>%
      summarize(Monthly_Total = sum(money))
## again identical to above

The latter has the advantage of being more transparent.

0
votes

You could aggregate after transforming the Month variable:

aggregate(Money ~ Month + Location, transform(data, Month = (Month - 1) %/% 3), sum)