Here my example data.frame:
df = read.table(text = 'ID Day Count Count_group
18 1933 6 15
33 1933 6 15
37 1933 6 15
18 1933 6 15
16 1933 6 15
11 1933 6 15
111 1932 5 9
34 1932 5 9
60 1932 5 9
88 1932 5 9
18 1932 5 9
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 1
6 1800 6 12
37 1800 6 12
98 1800 6 12
52 1800 6 12
18 1800 6 12
76 1800 6 12
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 2
677 1798 2 2
778 888 4 8
111 888 4 8
88 888 4 8
10 888 4 8
37 887 2 4
26 887 2 4
8 886 1 2
56 885 1 1
22 120 2 6
34 120 2 6
88 119 1 6
99 118 2 5
12 118 2 5
90 117 1 3
22 115 2 2
99 115 2 2', header = TRUE)
Count
column shows the number of ID
observations within a Day
;
Count_group
shows the number of ID
observations within a Day
and its previous 4 days.
I need to expand df
in order to have all the days within each Count_group
episodes.
Expected output:
ID Day Count Count_group
18 1933 6 15
33 1933 6 15
37 1933 6 15
18 1933 6 15
16 1933 6 15
11 1933 6 15
111 1932 5 15
34 1932 5 15
60 1932 5 15
88 1932 5 15
18 1932 5 15
33 1931 3 15
13 1931 3 15
56 1931 3 15
23 1930 1 15
6 1800 6 12
37 1800 6 12
98 1800 6 12
52 1800 6 12
18 1800 6 12
76 1800 6 12
55 1799 4 12
6 1799 4 12
52 1799 4 12
133 1799 4 12
112 1798 2 12
677 1798 2 12
111 1932 5 9
34 1932 5 9
60 1932 5 9
88 1932 5 9
18 1932 5 9
33 1931 3 9
13 1931 3 9
56 1931 3 9
23 1930 1 9
778 888 4 8
111 888 4 8
88 888 4 8
10 888 4 8
37 887 2 8
26 887 2 8
8 886 1 8
56 885 1 8
55 1799 4 6
6 1799 4 6
52 1799 4 6
133 1799 4 6
112 1798 2 6
677 1798 2 6
22 120 2 6
34 120 2 6
88 119 1 6
88 119 1 6
99 118 2 6
12 118 2 6
99 118 2 6
12 118 2 6
90 117 1 6
90 117 1 6
22 115 2 6
99 115 2 6
99 118 2 5
12 118 2 5
90 117 1 5
22 115 2 5
99 115 2 5
33 1931 3 4
13 1931 3 4
56 1931 3 4
23 1930 1 4
37 887 2 4
26 887 2 4
8 886 1 4
56 885 1 4
90 117 1 3
22 115 2 3
99 115 2 3
112 1798 2 2
677 1798 2 2
8 886 1 2
56 885 1 2
22 115 2 2
99 115 2 2
23 1930 1 1
56 885 1 1
Explanation of output:
1) Day 1933 has got 6 IDs on this precise day (Count col) and 15 IDs in total from Day 1933 to Day 1929 (Count_group col). The value 15 comes from 6 (Day 1933) + 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929). So in the output I added all the remaining days within Count_group = 15 episode.
2) Next Day in descending order is 1932. With 5 IDs on this precise day and 9 IDs in total from Day 1932 to Day 1928. The value 9 comes from 5 (1932) + 3 (1931) + 1 (1930) + 0 (1929) + 0 (1928). And in the output (row 28) you will see the Day 1932 complete (5-day) episode with 9 rows in total.
3) Next Day is 1931..etc, etc..
The output data.frame is ranked by Count_group and Day, both decreasing = TRUE.
I am trying to create a code that works not only for a 5-day window (as above) but for any time window of n days.
Do you have any suggestion?
Thanks
tidyr::complete()
. Maybe see this question, or this one. – austensen