1
votes

I'm working on an R dataframe

with columns GROUP_COL | TIME| VALUE

. Time is in order, value is numerical and group col is a categorical variable I want to group the data by. My goal is to

  • first group by the GROUP_COL variable
  • then, order by TIME
  • and then calculate a weighted mean for the values in each group using the formula value = 0.1 * previous_value + 0.9 * value for each row. If there is no previous value, leave the value as it is.
  • this weighted value should be stored in a separate column WEIGHTED.

What I tried so far is: Usng `dplyr, I created a vector of previous values using lag()

weighted_avg_with_previous <- function(.data, lag_weight=0.1) {
  # get previous values
  lag_val <- lag(.data$VALUE, n = 1L, default = 0, order_by = .data$TIME)

  # give each value a weight 0.9 for current value and 0.1 for previous value
  weighted = (1 -lag_weight) * .data$VALUE + lag_weight * lag_val
  return (weighted)
}

data <- data %>% 
  group_by(SALES_RESPONSIBILITY, PRODUCT_AREA, CURRENCY, FORECAST_TYPE) %>% 
  arrange(HORIZON, .by_group=TRUE) %>% 
  mutate(WEIGHTED_VALUE = weighted_avg_with_previous(0.1))

However, the mutate statement throws an error. How can I get my weighted_avg_with_previous functions to run on the single groups?

Example:

    GROUP | TIME| VALUE | WEIGHTED VALUE
    _____________________________________
     A    |  1  |   1   |     1
     A    |  2  |   2   |     1.9
     A    |  3  |   3   |     2.9
     A    |  4  |   4   |     3.9
     B    |  1  |   3   |     3
     B    |  2  |   7   |     6.6
     B    |  3  |   -4  |     -3.3
     ...

Best, Julia

1

1 Answers

3
votes
library(tidyverse)    
df <- structure(list(GROUP = c("A", "A", "A", "A", "B", "B", "B"),
    TIME = c(1L, 2L, 3L, 4L, 1L, 2L, 3L), VALUE = c(1L, 2L, 3L,
    4L, 3L, 7L, -4L)), row.names = c(NA, -7L),  class = c("tbl_df",
"tbl", "data.frame"))


 df %>%
      group_by(GROUP) %>%
      mutate(previous.value = lag(VALUE)) %>%
      mutate(weighted.value = ifelse(is.na(previous.value),VALUE, 0.1*previous.value + 0.9*VALUE)) %>%
      select(-previous.value)

The first mutate() statement creates a new variable for lagged value and the second one creates weighted.value which equals either 0.1*previous.value + 0.9*value, or value if previous.value is null.

Output:

# A tibble: 7 x 4
# Groups:   GROUP [2]
  GROUP  TIME VALUE weighted.value
  <chr> <int> <int>          <dbl>
1 A         1     1            1
2 A         2     2            1.9
3 A         3     3            2.9
4 A         4     4            3.9
5 B         1     3            3
6 B         2     7            6.6
7 B         3    -4           -2.9