I'm currently working in data.table in R with the following data set:
id age_start age_end cases
1 2 2 1000
1 3 3 500
1 4 4 300
1 2 4 1800
2 2 2 8000
2 3 3 200
2 4 4 100
- In the given data set I only want values of cases where the age_start == 2 and the age_end ==4.
- In each ID where the age_start !=2 and the age_end !=4, I need to sum or aggregate the rows to create a group of age_start==2 and age_end ==4. In these cases I'd need to sum up the cases of age_start==2 & age_end==2, age_start==3 & age_end==3, as well as age_start==4 & age_end==4 into one new row of age_start==2 and age_end==4.
- After these are summed up into one row, I want to drop the rows that I used to make the new age_start==2 and age_start==4 row (i.e. the age values 2-2, 3-3, and 4-4) as they are no longer needed
Ideally the data set would look like this when I finish these steps:
id age_start age_end cases
1 2 4 1800
2 2 4 8300
Any suggestions on how to accomplish this in data.table are greatly appreciated!