Using dplyr to create new groups inside a column

Question

This is my dataframe:

mydf<-structure(list(DS_FAIXA_ETARIA = c("Inválido", "16 anos", "17 anos", 
"18 anos", "19 anos", "20 anos", "21 a 24 anos", "25 a 29 anos", 
"30 a 34 anos", "35 a 39 anos"), n = c(5202L, 48253L, 67401L, 
79398L, 88233L, 90738L, 149634L, 198848L, 238406L, 265509L)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

I would like to combine the following observations into a single group called "16 a 20 anos":

"16 anos", "17 anos", 
"18 anos", "19 anos", "20 anos"

In other words, I would like to "merge" rows 2-6 and sum their values in the n column, so that a single row represents the sum of rows 2-6.

Is it possible to do this using group_by and then summarise(sum(DS_FAIXA_ETARIA)) verbs from dplyr?

This would be the output that I want:

mydf<-structure(list(DS_FAIXA_ETARIA = c("Inválido", "16 a 20 anos", "21 a 24 anos", 
"25 a 29 anos", "30 a 34 anos", "35 a 39 anos"), n = c(5202L, 374023L, 149634L, 
198848L, 238406L, 265509L)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Many thanks

I was thinking of something like this: mydf %>% group_by(something that would represent row1, row2-row6, ..., last row) %>% summarise(new_n = sum(n)). Is that clear? Do you agree that it is a simple summarise question? - Laura

TarJae · Accepted Answer · 2021-04-22T18:32:31

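This should do the job. First, compute the sum with summarise. Then append it to the original data frame with add_row, keep the last five rows with slice_tail, and sort with arrange. Note that slice_tail(n = 5) keeps only the last five rows, so the "Inválido" row is not part of this result.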

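library(dplyr)

# Sum n over rows 2-6 (the "16 anos" to "20 anos" rows)
df1 <- mydf %>% 
  summarise(`16 a 20 anos` = sum(n[2:6]))

# Append the merged row, keep the last 5 rows, then sort by label
mydf %>% 
  add_row(DS_FAIXA_ETARIA = names(df1), n = df1$`16 a 20 anos`[1]) %>% 
  slice_tail(n = 5) %>% 
  arrange(DS_FAIXA_ETARIA)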

Output:

  DS_FAIXA_ETARIA      n
  <chr>            <int>
1 16 a 20 anos    374023
2 21 a 24 anos    149634
3 25 a 29 anos    198848
4 30 a 34 anos    238406
5 35 a 39 anos    265509
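
The output above has five rows and does not include the "Inválido" row that the question's desired result keeps. As an alternative, here is a minimal sketch of the group_by + summarise approach the question asks about. It is not part of the accepted answer: the helper vector `young` and the name `result` are made up for illustration, and the factor step is just one way to keep the rows in their original order.

```r
library(dplyr)

# Data from the question, rebuilt with tibble() for readability
mydf <- tibble(
  DS_FAIXA_ETARIA = c("Inválido", "16 anos", "17 anos", "18 anos", "19 anos",
                      "20 anos", "21 a 24 anos", "25 a 29 anos",
                      "30 a 34 anos", "35 a 39 anos"),
  n = c(5202L, 48253L, 67401L, 79398L, 88233L, 90738L,
        149634L, 198848L, 238406L, 265509L)
)

# Hypothetical helper: the labels to merge into one group
young <- c("16 anos", "17 anos", "18 anos", "19 anos", "20 anos")

result <- mydf %>%
  mutate(
    # Relabel the five single-year rows with the new group name
    DS_FAIXA_ETARIA = if_else(DS_FAIXA_ETARIA %in% young,
                              "16 a 20 anos", DS_FAIXA_ETARIA),
    # Factor levels in order of first appearance, so summarise()
    # keeps the original row order instead of sorting alphabetically
    DS_FAIXA_ETARIA = factor(DS_FAIXA_ETARIA, levels = unique(DS_FAIXA_ETARIA))
  ) %>%
  group_by(DS_FAIXA_ETARIA) %>%
  summarise(n = sum(n))

result
```

Under these assumptions, `result` has six rows, with "Inválido" first and "16 a 20 anos" second with n = 374023, which matches the desired output in the question. If a character column is preferred over a factor, as.character() can be applied to DS_FAIXA_ETARIA afterwards.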
