What is the right way to multiply a named vector by a dataframe?

3

votes

Similar to this SO question, what is the right way to multiply a named vector by a dataframe, such that each column is multiplied by the element of the vector with the matching name?

df <- data.frame(A=1:5, B=2:6)
v <- c(2, 0)
names(v) <- c("B", "A")

I would like the following output:

  A  B
1 0  4
2 0  6
3 0  8
4 0 10
5 0 12

None of the suggested solutions from the other question match column names with the names of the vector. For example,

dt <- data.table(df)
for (i in seq_along(dt))
    dt[, i := dt[[i]] * v[i], with = F]

dt
    A B
1:  2 0
2:  4 0
3:  6 0
4:  8 0
5: 10 0

I can do it by reordering v, but I wonder whether there's a better way to do this:

v <- v[colnames(df)]

r dataframe

3

votes

We can loop through the names using lapply, then cbind:

res <- do.call(cbind, 
               lapply(names(df), function(i){
                 df[i] * v[i]
               }))


class(res)
# [1] "data.frame"
res
#   A  B
# 1 0  4
# 2 0  6
# 3 0  8
# 4 0 10
# 5 0 12
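One caveat with name-based lookup: if a column of df has no entry in v, v[i] is NA and that column silently becomes NA. Below is a minimal sketch of the same lapply approach with a guard added; the check and the name missing_names are my additions, not part of the answer above.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(B = 2, A = 0)

# Hypothetical guard: stop early if any column has no multiplier in v.
missing_names <- setdiff(names(df), names(v))
if (length(missing_names) > 0) {
  stop("no multiplier for: ", paste(missing_names, collapse = ", "))
}

# v[[i]] looks the multiplier up by name, so the order of v does not matter.
res <- do.call(cbind, lapply(names(df), function(i) df[i] * v[[i]]))
```

Using v[[i]] rather than v[i] also makes a missing name an error instead of an NA.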

3

votes

How about this:

r <- mapply('*', df, v[names(df)])
# or equivalently: mapply(function(x,y) x*y, df, v[names(df)])

#     A  B
#[1,] 0  4
#[2,] 0  6
#[3,] 0  8
#[4,] 0 10
#[5,] 0 12

v[names(df)] returns the vector elements in the same order as the columns of df, so each column is paired with the element of the same name.

If you want r as a data frame, just do as.data.frame(r).

This is from ?mapply

mapply is a multivariate version of sapply. mapply applies FUN to the first elements of each ... argument, the second elements, the third elements, and so on. Arguments are recycled if necessary.

FUN is * in our setting.
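To make the pairing concrete, here is a small sketch of how mapply walks its arguments in parallel. The names pairs, cols and res are only illustrative, and SIMPLIFY = FALSE is a variation I am adding so that the result stays a list of columns rather than a matrix.

```r
# mapply pairs its arguments element by element: 1*4, 2*5, 3*6.
pairs <- mapply(function(x, y) x * y, 1:3, 4:6)

df <- data.frame(A = 1:5, B = 2:6)
v <- c(B = 2, A = 0)

# A data frame is a list of columns, so each column is paired with one
# element of the reordered vector. SIMPLIFY = FALSE keeps a named list.
cols <- mapply(`*`, df, v[names(df)], SIMPLIFY = FALSE)
res <- as.data.frame(cols)
```

Keeping the list form avoids the intermediate matrix, which may matter if the columns are not all of the same type after multiplication.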

1

votes

You can do the following (transpose the data.frame, multiply by the ordered vector, and then transpose again):

  as.data.frame(t(t(df)*v[colnames(df)]))
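The double transpose is needed because R recycles a vector down the columns of a matrix, so multiplying directly would pair the vector with rows instead of columns. The sketch below shows that step by step and compares it with sweep(), a base R alternative that is not part of the answer above; the names w, m, res_t and res_sweep are only illustrative.

```r
df <- data.frame(A = 1:5, B = 2:6)
v <- c(B = 2, A = 0)

w <- v[colnames(df)]   # reorder the multipliers to match the columns: A, B
m <- as.matrix(df)

# t(m) has one row per original column, so recycling w down its columns
# lines up w[1] with A and w[2] with B; the outer t() restores the shape.
res_t <- t(t(m) * w)

# sweep() applies w along margin 2 (columns) without the transposes.
res_sweep <- sweep(m, 2, w, `*`)
```

Both results are matrices; wrap them in as.data.frame() if a data frame is needed.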

Here are some benchmarks on a larger data frame: (f1 is @zx8754's function and f2 is @m0h3n's function)

df <- data.frame(A=1:5000, B=2:5001)
v <- c(2, 0)
names(v) <- c("B", "A")

library(microbenchmark)

f1 <- function(){
  do.call(cbind, 
          lapply(names(df), function(i){
            df[i] * v[i]
          }))
}

f2 <- function(){
  as.data.frame(mapply('*', df, v[names(df)]))
}

f3 <- function(){
  as.data.frame(t(t(df)*v[colnames(df)]))
}

microbenchmark(f1(), f2(), f3())

Unit: microseconds
 expr      min        lq      mean    median        uq      max neval cld
 f1()  594.394  663.9595  711.3634  690.8815  748.8425 1022.605   100  b 
 f2() 2428.762 2618.7460 2701.1528 2669.4355 2730.8070 3904.354   100   c
 f3()  251.776  361.7550  401.8032  381.8825  418.6225  793.604   100 a

0

votes

If you have more variables in the dataframe than elements in the vector you may want to use an extended version of @jav's answer:

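library(magrittr)  # %>% and %$%
library(dplyr)     # select(), one_of(), bind_cols()
# Assumed here: vars is a character vector of the columns to scale,
# and multiplier is a named numeric vector covering those names.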
df %>% 
  select(one_of(vars)) %$% 
  as.data.frame(t(t(.)*multiplier[vars])) %>% 
  bind_cols(df %>% select(-one_of(vars)))

Alternatively, you can use the map2_df function from the purrr package to do the legwork (I am shamelessly borrowing @akrun's answer to my, as it turns out, similar question).

library(purrr)
df %>% 
  select(one_of(vars)) %>% 
  map2_df(multiplier[vars], ~ .x * .y)  %>%
  bind_cols(df %>% select(-one_of(vars)))

If you are keen on keeping the original order of the variables, just add %>% select(one_of(names(df))) to either one.
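For comparison, here is a base R sketch of the same idea that leaves the untouched columns in place, so no reordering step is needed. It is not either of the pipelines above; the example data, the extra column C, and the use of intersect() to build vars are assumptions made for illustration.

```r
df <- data.frame(A = 1:3, B = 2:4, C = c("x", "y", "z"))
multiplier <- c(B = 2, A = 0)

# Only scale the columns that actually have a multiplier.
vars <- intersect(names(df), names(multiplier))

out <- df
# Map() pairs each selected column with the multiplier of the same name
# and returns a named list, which is assigned back column by column.
out[vars] <- Map(`*`, df[vars], multiplier[vars])
```

Because the assignment replaces columns in place, out keeps the column order of df and column C passes through unchanged.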

Performance-wise these two seem to be pretty much on par:

f4 <- function(){
  df %>% 
    select(one_of(vars)) %$% 
    as.data.frame(t(t(.)*multiplier[vars])) %>% 
    bind_cols(df %>% select(-one_of(vars))) 
}

f5 <- function(){
  df %>% 
    select(one_of(vars)) %>% 
    map2_df(multiplier[vars], ~ .x * .y)  %>%
    bind_cols(df %>% select(-one_of(vars))) 
}

microbenchmark(f4(), f5())

Unit: milliseconds
 expr      min       lq     mean   median       uq      max neval
 f4() 1.142170 1.178752 1.320680 1.197293 1.227915 2.858073   100
 f5() 1.155081 1.180077 1.248928 1.206396 1.227915 2.647517   100
