Problem: how to generate a new dataset from an existing one. It is essentially a reshape from long to wide, but a bit more complicated.
I have a non-trivial amount of data; a simplified version is below:
id <- c(1,2,3,4,5)
job <- c(11,12,11,12,13)
sex <- c(0,1,0,1,0)
country <- c(1,2,3,2,1)
data <- data.frame(id, job, sex, country)
Desired data: I'd like a dataset of the jobs and their occupants, with one row per job and one count column per level of each variable. For example, for job == 11 there are 2 people with sex == 0, 1 born in country == 1 and 1 born in country == 3.
So the new dataset would look like this:
  jobs jobs_sex0 jobs_sex1 jobs_country1 jobs_country2 jobs_country3
1   11         2         0             1             0             1
2   12         0         2             0             2             0
3   13         1         0             1             0             0
I have an intuition that this can be achieved with tapply, but I am not sure how.
I have tried the following, but neither call produces the desired table:
tapply(job[sex==1], sex[sex==1], sum)
aggregate(job, list(sex), fun=sum)
Edit: I don't think this question is a duplicate of Transpose / reshape dataframe without "timevar" from long to wide format, because my problem is that I need to reshape several factor variables that have different numbers of levels. Applying the answer from the supposed duplicate does not work.
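To make the target concrete, here is a minimal sketch of one way the counts might be built in base R. It uses table() and cbind() rather than tapply, so it is a swapped-in technique and not necessarily the best one. Each variable is cross-tabulated against job separately, which is why differing numbers of levels should not matter. The helper name count_by_job and the vars vector are hypothetical names introduced only for this illustration.

```r
# Example data from the question
id <- c(1, 2, 3, 4, 5)
job <- c(11, 12, 11, 12, 13)
sex <- c(0, 1, 0, 1, 0)
country <- c(1, 2, 3, 2, 1)
data <- data.frame(id, job, sex, country)

# Hypothetical helper: cross-tabulate job against one variable and
# name the resulting columns "jobs_<variable><level>"
count_by_job <- function(df, var) {
  tab <- table(df$job, df[[var]])      # rows = jobs, columns = levels of var
  m <- as.data.frame.matrix(tab)       # keep the wide (matrix) layout
  names(m) <- paste0("jobs_", var, colnames(tab))
  m
}

# Variables to count; each may have a different number of levels
vars <- c("sex", "country")

# One block of count columns per variable, bound side by side
counts <- do.call(cbind, lapply(vars, count_by_job, df = data))

# Move the job codes from the row names into a regular column
result <- data.frame(jobs = as.numeric(rownames(counts)), counts,
                     row.names = NULL)
print(result)
```

With the example data, the row for job 11 should show two people of sex 0 and one person each from countries 1 and 3. Adding another variable would only require appending its name to vars, assuming it is a column of data.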