I'm working on a project that examines the relationship between a family's income and its number of children. To keep it simple, suppose my data look like this:

df <- data.frame(children = sample(0:9, 100, replace=TRUE),
                 income = floor(rnorm(100, 30000, 10000)))

I split income into four groups at its first quartile, median, and third quartile:

income.br <- with(df, c(-Inf, stats(income)[5], stats(income)[6],
                  stats(income)[7], Inf))

and cross-tabulate the number of children against the income groups:

x <- with(df, table(children, cut(income, breaks = income.br)))

Now I need to calculate the mean number of children in each income group. Here's what I did:

apply(x * as.numeric(levels(factor(df$children))), 2, sum) / apply(x, 2, sum)

It looks clumsy, so I was wondering whether there's a better way to do this (a one-way ANOVA, maybe?). Thanks!

The example is not reproducible. What is stats()? - kohske
It's a function in the {fields} package that calculates summary statistics, similar to summary(). - Rock

1 Answer

This is probably what you want:

> with(df, tapply(children, cut(income, c(-Inf, quantile(income)[2:4], Inf)), mean))
    (-Inf,2.35e+04] (2.35e+04,2.96e+04] (2.96e+04,3.82e+04]     (3.82e+04, Inf] 
               5.32                4.40                4.36                3.84
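
As a sanity check, here is a minimal sketch suggesting that the tapply() call gives the same group means as the table-based calculation in the question. It uses base R only, so the {fields} package is not needed. The set.seed(1) call and the names group, means.tapply and means.table are assumptions added for illustration and are not part of the original post; because the data are random, the numbers will differ from the output shown above.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(children = sample(0:9, 100, replace = TRUE),
                 income = floor(rnorm(100, 30000, 10000)))

# Cut income at its quartiles; quantile()[2:4] is the 25%, 50% and 75% points
df$group <- cut(df$income, c(-Inf, quantile(df$income)[2:4], Inf))

# Group means computed directly from the raw data
means.tapply <- with(df, tapply(children, group, mean))

# The same means recovered from the contingency table, as in the question:
# weight each row's counts by its number of children, then divide by group size
x <- with(df, table(children, group))
means.table <- colSums(x * as.numeric(rownames(x))) / colSums(x)

# Compare the two approaches
all.equal(as.numeric(means.tapply), as.numeric(means.table))

# Alternative that returns the means as a data frame
aggregate(children ~ group, data = df, FUN = mean)
```

Working from the raw data frame avoids building the table at all, which is why tapply() (or aggregate()) reads more simply than the apply() version.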