I'm somewhat still new to R. I have a data frame of school absences with columns for dates, individual school ID numbers, day of the week, and counts for that day's number of absent students. Now, my frame has three categories of ID's: elementary school(200), middle school(300), and high school(500). Each school has its own dates and its individual absence counts for those dates, and all schools of each school type combine their totals for that date together.
The head of the data, with an example of totals for all elementary schools:
Count Date ID DOW
500 1/4/2011 200 1
689 1/5/2011 200 2
345 1/6/2011 200 3
100 1/7/2011 200 4
421 1/8/2011 200 5
1211 1/11/2011 200 1
And the tail end with data for one specific elementary school:
Count Date ID DOW
3 1/4/2011 225 3
3 1/5/2011 225 4
3 1/6/2011 225 5
2 1/7/2011 225 1
2 1/8/2011 225 2
2 1/11/2011 225 3
My problem is that I have several schools in my data frame that I want to remove from the time series analysis, but their individual daily counts are added up in their school type's total counts for that day. I'm trying to find a way to loop over the frame and remove those school's rows and to subtract their counts from their respective day's total count. So after the analysis, the head data would look more like:
Count Date ID DOW
497 1/4/2011 200 1
686 1/5/2011 200 2
342 1/6/2011 200 3
98 1/7/2011 200 4
419 1/8/2011 200 5
1209 1/11/2011 200 1
I've looked at using apply()
and sweep()
and dplyr
, but found nothing that combines looping and subtracting inside of the same frame.
EDIT: To clarify, this data frame is much larger then my example. It has several dozen schools, each with their own school-specific IDs and daily absence counts but roughly the same dates in session. I have roughly 90,000 rows of observations, so a process that automatically iterates and checks for each date and school ID was what I was looking for.
So, if using the original subset of data above, where the 200 ID is the total of all elementary school absences for that day, then an example of the individual school's observations would be like:
Count Date ID DOW
3 1/4/2011 225 3
8 1/5/2011 225 4
12 1/6/2011 225 5
2 1/7/2011 225 1
6 1/8/2011 225 2
7 1/11/2011 225 3
..... etc for all dates until end of calendar year for that school
6 1/4/2011 230 3
3 1/5/2011 230 4
2 1/6/2011 230 5
0 1/7/2011 230 1
11 1/8/2011 230 2
4 1/11/2011 230 3
..... etc for all dates until end of calendar year for that school, and repeat for next school school.
etc for all dates...
and the expected output as we have to rely on testing the code with the example data – akrun