I'm working on an R dataframe
with columns
GROUP_COL | TIME| VALUE
. Time is in order, value is numerical and group col is a categorical variable I want to group the data by. My goal is to
- first group by the
GROUP_COL
variable - then, order by
TIME
- and then calculate a weighted mean for the values in each group using the formula
value = 0.1 * previous_value + 0.9 * value
for each row. If there is no previous value, leave the value as it is. - this weighted value should be stored in a separate column
WEIGHTED
.
What I tried so far is: Usng `dplyr, I created a vector of previous values using lag()
weighted_avg_with_previous <- function(.data, lag_weight=0.1) {
# get previous values
lag_val <- lag(.data$VALUE, n = 1L, default = 0, order_by = .data$TIME)
# give each value a weight 0.9 for current value and 0.1 for previous value
weighted = (1 -lag_weight) * .data$VALUE + lag_weight * lag_val
return (weighted)
}
data <- data %>%
group_by(SALES_RESPONSIBILITY, PRODUCT_AREA, CURRENCY, FORECAST_TYPE) %>%
arrange(HORIZON, .by_group=TRUE) %>%
mutate(WEIGHTED_VALUE = weighted_avg_with_previous(0.1))
However, the mutate
statement throws an error. How can I get my weighted_avg_with_previous
functions to run on the single groups?
Example:
GROUP | TIME| VALUE | WEIGHTED VALUE
_____________________________________
A | 1 | 1 | 1
A | 2 | 2 | 1.9
A | 3 | 3 | 2.9
A | 4 | 4 | 3.9
B | 1 | 3 | 3
B | 2 | 7 | 6.6
B | 3 | -4 | -3.3
...
Best, Julia