0
votes

Does anyone know if it is possible to calculate a weighted mean in R when values are missing, and when values are missing, the weights for the existing values are scaled upward proportionately?

To convey this clearly, I created a hypothetical scenario. This describes the root of the question, where the scalar needs to be adjusted for each row, depending on which values are missing.

Image: Weighted Mean Calculation

File: Weighted Mean Calculation in Excel

2
It's definitely possible to do in R. Try having a go yourself and posting some example code here where you run into problems. - Scransom
Thanks qqq. There are many similar samples of code in related questions, link, but it seems like most want to mutate, or replace with the mean, or replace with zero, when there is an N/A. Without being a burden and asking the same question, I thought it might be easier to show the explicit difference with my case, where I want to re-scale the remaining variables. I hadn't seen that elsewhere. And it might just be an obvious, short answer, by using na.rm. - milaske

2 Answers

1
votes

Using weighted.mean from the base stats package with the argument na.rm = TRUE should get you the result you need. Here is a tidyverse way this could be done:

library(tidyverse)
scores <- tribble(
 ~student, ~test1, ~test2, ~test3,
   "Mark",     90,     91,     92,
   "Mike",     NA,     79,     98,
   "Nick",     81,     NA,     83)

weights <- tribble(
  ~test,   ~weight, 
  "test1",     0.2, 
  "test2",     0.4,
  "test3",     0.4)

scores %>% 
  gather(test, score, -student) %>%
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#>   student   result
#>     <chr>    <dbl>
#> 1    Mark 91.20000
#> 2    Mike 88.50000
#> 3    Nick 82.33333
0
votes

The best way to post an example dataset is to use dput(head(dat, 20)), where dat is the name of a dataset. Graphic images are a really bad choice for that.
DATA.

dat <-
structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA), 
    Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

w <-
structure(list(Test1 = c(18, NA, 27), Test2 = c(36.4, 39.5, NA
), Test3 = c(36.8, 49, 55.3)), .Names = c("Test1", "Test2", "Test3"
), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

CODE.
You can use function weighted.mean in base package statsand sapply for this. Note that if your datasets of notes and weights are R objects of class matrix you will not need unlist.

sapply(seq_len(nrow(dat)), function(i){
    weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE)
})