0
votes

Let say I have data(df) with two columns. Part of it is given below.

  familyGroup Sex
 1    601       2
 2    601       1
 3    601       2
 4    601       1
 5    601       1
 6    601       2
 7    602       2
 8    602       1
 9    602       2
10    602       1
11    602       1
12    602       1

Under Sex column <1 represent male> and <2 represent female>. I want the sum of count of male and female for a familyGroup and report in two new columns (say under male and female)

I was using dplyr from tidyverse package Did some trial with group_by and summary and mutate

I tried the following code using dplyr. However the output is not as expected.

df %>%
  group_by(familyGroup, Sex) %>%
  summarise(male = sum(Sex==1), female = sum(Sex == 2)) %>%
  select(familyGroup, male, female)
# A tibble: 74 x 3
# Groups:   familyGroup [40]
  familyGroup  male female
   <fct>       <int>  <int>
 1     601      3      0
 2     601      0      3
 3     602      4      0
 4     602      0      2

The expected output look like the following

familyGroup   male  female
        601    3     3
        602    4     2
3

3 Answers

2
votes

It really seems like this would be a good use case for base functions.

df <- data.frame("familyGroup" = c(rep(601, 6), rep(602, 6)),
                 "Sex" = c(2,1,2,1,1,2,2,1,2,1,1,1))
df$Sex <- ifelse(df$Sex == 1, "Male", "Female")
table(df$familyGroup, df$Sex)

      Female Male
  601      3    3
  602      2    4

If you want a more complex table either ftable() or xtabs() are useful.

2
votes

You can do this with dcast from data.table package like this:

library(data.table)
library(dplyr)
dt <- read.table(text = " familyGroup Sex
 1    601       2
 2    601       1
 3    601       2
 4    601       1
 5    601       1
 6    601       2
 7    602       2
 8    602       1
 9    602       2
10    602       1
11    602       1
12    602       1")

dt <- data.table(dt)

dt[, 
    sexLabel := dplyr::case_when(
      Sex == 1 ~ "male",
      Sex == 2 ~ "female",
      TRUE ~ "(unknown)"
    )
]

res <- dcast(
  formula = familyGroup ~ sexLabel,  
  data = dt, 
  fun.aggregate = length,
  value.var = "Sex"
)
res
#       familyGroup female male
# 1:         601      3    3
# 2:         602      2    4
1
votes

We can count , familyGroup and Sex value. Change 1 to "male" and 2 to "female" and spread it to wide format.

library(dplyr)

df %>% 
  count(familyGroup, Sex) %>%
  mutate(Sex = ifelse(Sex == 1, "male", "female")) %>%
  tidyr::spread(Sex, n)

#  familyGroup female  male
#        <int>  <int> <int>
#1         601      3     3
#2         602      2     4