
I have the following snapshot of my dataset:

df<-data.frame( c(2014, 2015, 2016, 2014, 2015, 2016, 2014, 2015, 2016), c(1,1,1,1,1,1,2,2,2), c(1,1,0,0,0,0,0,0,0), c("q1", "q1", "q1", "q2","q2","q2", "q3", "q3", "q3"))
colnames(df)<-c("year", "male.cohort", "male.work", "householdid")

(In my real dataset I have monthly data, but the idea is the same.)

Using this data, I would like to plot two lines (one for male.cohort equal to 1 and one for male.cohort equal to 2) that represent the fraction of males working at each point in time (in this case 2014, 2015 and 2016). I tried the following code, but it does not give me the result I am looking for:

test <- as.data.frame(unlist(tapply(df$male.work,INDEX = df[,c("year","male.cohort")], function(x){sum(x)/length(x)})))
colnames(test) <- "frac"
test$year <- rownames(test)
ggplot()+geom_line(data = test, aes(x=year,y=frac)) 

I think I am using tapply incorrectly: it gives the right fraction of each cohort working in each year, but the result does not show which cohort each value belongs to, so I cannot draw one line per cohort.
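To see where the cohort information goes, it helps to look at the shape of the tapply result. The sketch below (which uses mean() in place of sum(x)/length(x); the two are equivalent for a 0/1 column without NAs) suggests that tapply with two grouping variables returns a year by cohort matrix, so the cohort lives in the column names rather than the row names, and unlist() then throws that away. Converting the matrix with as.data.frame.table() keeps one row per (year, male.cohort) pair. The object name frac_mat is just an illustrative choice.

```r
# Example data from the question, with the column names set directly
df <- data.frame(
  year        = c(2014, 2015, 2016, 2014, 2015, 2016, 2014, 2015, 2016),
  male.cohort = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  male.work   = c(1, 1, 0, 0, 0, 0, 0, 0, 0),
  householdid = c("q1", "q1", "q1", "q2", "q2", "q2", "q3", "q3", "q3")
)

# Two grouping variables -> a 3 x 2 matrix (rows = year, columns = male.cohort)
frac_mat <- tapply(df$male.work, INDEX = df[, c("year", "male.cohort")], FUN = mean)
frac_mat

# Long format: one row per (year, male.cohort) pair, fraction in column "frac"
test <- as.data.frame.table(frac_mat, responseName = "frac")
test
```

The resulting test data frame can then go into the original ggplot() call, with colour = male.cohort and group = male.cohort added to aes() so that each cohort gets its own line.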

I would appreciate any help.


1 Answer


You can try the following:

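library(dplyr)
library(ggplot2)

df %>%
  # treat year and cohort as discrete so each cohort gets its own colour
  mutate(across(c(year, male.cohort), factor)) %>%
  group_by(year, male.cohort) %>%
  # fraction working = share of 1s in male.work
  summarise(work_perc = mean(male.work), .groups = "drop") %>%
  ggplot(aes(year, work_perc, color = male.cohort, group = male.cohort)) +
  geom_line()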
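If you would rather stay in base R for the aggregation step, aggregate() with a formula should return the same long-format table (one row per year and cohort) directly. A minimal sketch on the example data from the question; the object names frac and p are only illustrative, and the plot is drawn only when ggplot2 is installed:

```r
# Example data from the question
df <- data.frame(
  year        = c(2014, 2015, 2016, 2014, 2015, 2016, 2014, 2015, 2016),
  male.cohort = c(1, 1, 1, 1, 1, 1, 2, 2, 2),
  male.work   = c(1, 1, 0, 0, 0, 0, 0, 0, 0),
  householdid = c("q1", "q1", "q1", "q2", "q2", "q2", "q3", "q3", "q3")
)

# Mean of the 0/1 indicator = fraction working, per year and cohort
frac <- aggregate(male.work ~ year + male.cohort, data = df, FUN = mean)
frac

# One line per cohort; factor() makes the colour scale discrete
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(frac, aes(x = year, y = male.work, colour = factor(male.cohort))) +
    geom_line() +
    labs(y = "fraction of males working", colour = "male.cohort")
  print(p)
}
```

Because year stays numeric here, the x axis is continuous; with monthly data, a Date column should behave the same way and avoids creating one factor level per month.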