I am trying to summarize groups in a large data frame using a function which works when nrow length of a group = 3, but produces NA's for all groups in a given column when the function is applied to any groups that have an nrow length = 1 or 2.
Below is a snapshot of the dataset I am working with. This data frame is the output of the group_by function which I have used to group samples by zones (0's or 1's) within an ID class 'ID'. The resultant groups are in the 'grp' column. For this dataset I am only interested in groups which have a Zone value of 1 (i.e. Groups 2, 4 and 7). Groups with Zone values of 1 are either 1 or 3 rows long.
> df
ID From To Count Mean_Value Zone
1 ID1 0 48 15 0.04927333 0
2 ID1 0 48 18 0.06273333 0
3 ID1 45 48 3 1.02333333 1
4 ID1 48 49 1 0.10300000 1
5 ID1 49 51 2 0.71500000 1
6 ID1 51 58 7 0.10285714 0
7 ID1 58 59 1 0.55000000 1
8 ID1 59 62 3 0.26333333 0
9 ID2 0 36 9 0.05211111 0
10 ID2 33 34 1 0.70000000 1
11 ID2 34 50 9 0.11666667 0
> grp_test <- group_by(df, grp = data.table::rleid(ID, Zone))
> grp_test
# Groups: grp [8]
ID From To Count Mean_Value Zone grp
<fct> <int> <int> <int> <dbl> <int> <int>
1 ID1 0 48 15 0.0493 0 1
2 ID1 0 48 18 0.0627 0 1
3 ID1 45 48 3 1.02 1 2
4 ID1 48 52 1 0.103 1 2
5 ID1 49 51 2 0.715 1 2
6 ID1 51 58 7 0.103 0 3
7 ID1 58 59 1 0.55 1 4
8 ID1 59 62 3 0.263 0 5
9 ID2 0 36 9 0.0521 0 6
10 ID2 33 34 1 0.7 1 7
11 ID2 34 50 9 0.117 0 8
I am now trying to summarize the groups to obtain the mean of each group (grp) with a Zone value of 1 by taking into account the 'Count' and 'Mean_Value' for each row, when computing the mean for a group which has 3 rows. I cannot simply use the 'mean' function as the mean value in each row could have varying counts. In the example above the first row from grp 2 comprises 3 counts with a mean of 1.02, the second row from grp 67 has a count of 1 with a mean value of 0.103 and the third row has a count of 2 and mean value of 0.715. The mean value for this group should be 0.767167. Below is the code used to calculate the mean when a group has length nrow = 3. The problem is that this function only works on groups with nrow length 3, and provides NA values for the entire column when there are any groups which have nrow lengths of 1 or 2. How can I summarize groups with nrow length = 3 as well as groups with nrow lengths of 1 or 2?
df_summarise <- summarise(grp_test, ID = first(ID),
From = first(From),
To = last(To),
Number_of_Intervals = n(),
Average = (((first(Count) * first(Mean_Value)) + (nth(Count, 2) * nth(Mean_Value, 2)) + (last(Count) * last(Mean_Value))) / (sum(Count))),
Zone = first(Zone))
The expected outcome is below
ID From To Count Mean_Value Zone grp
1 ID1 45 51 6 0.7671667 1 2
2 ID1 58 59 1 0.5500000 1 4
3 ID2 33 34 1 0.7000000 1 7